Cluster design documents
------------------------

$Id: 00README.txt,v 1.2 1999/12/16 21:15:22 sct Exp $

OVERVIEW OF THESE DOCUMENTS

This set of documents is not a comprehensive design spec for
clustering.  Rather, it is a set of miscellaneous documents including
both discussion documents and work-in-progress design drafts, not for a
whole clustering system, but for a core set of APIs intended to provide
a comprehensive and robust infrastructure on top of which true
clustering services can be layered.

So, you won't find any proposals for IP takeover or for clustered
filesystems here.  You _will_ find proposals for APIs which will let the
IP failover manager communicate the state of the running IP interfaces
to the rest of the cluster, or to allow other services to be started and
stopped as appropriate if an IP address is migrated from one node to
another.

The motivation for this work is primarily that although there are many
distinct clustering projects under way for Linux, there is no
general-purpose framework to provide solutions for some of the hard
problems such as quorum management, massive scalability and management
frameworks.

A successful outcome would be a set of core cluster APIs which are both
simple enough that arbitrary other (existing or future) cluster services
can take advantage of them easily; and powerful enough that there is
real benefit to be had from using them.

In this directory you will find the following documents:

  goals.txt:
      Outlines in a little more detail some of the goals of the work
      envisaged.

  principles.txt:
      A few design principles and justifications applicable to the
      whole project.

  structure.txt:
      Describes the component layers necessary to achieve the initial
      objectives.

  hierarchy.txt:
      Describes some of the implications of having hierarchical
      clusters.

Then we have documents describing individual components in some detail:

  api.txt:
      General requirements for the cluster APIs, including at least
      some details of inter-process communication between cluster
      service processes on a single node.

  recovery.txt:
      A manager for the local cluster processes on a node.  This must
      deal with both the initial startup of processes, and the
      coordinated restart after a cluster transition.

  communications.txt:
      The cluster communications layer: getting nodes to talk to each
      other.

  integration.txt:
      The cluster integration layer: binding nodes into a coherent
      cluster.

  discovery.txt:
      The discovery algorithm used for cluster reforming in the
      integration layer.

  barrier.txt:
      The barrier API used for cluster-wide synchronisation of
      arbitrary services.

  quorum.txt:
      Quorum management: how to tell if it is safe to access shared
      cluster data.

  NEW.txt:
      Stuff still to be integrated into the above.