Cluster Design Principles
=========================

$Id: principles.txt,v 1.2 1999/12/16 21:15:25 sct Exp $

Remember the Unix philosophy: Do one thing, but do it well.

There are two basic principles which pervade this design:

* Modularity

  Needless to say, modular code is easier to maintain than tightly
  integrated monolithic structures.  The entire design of the
  clustering is intended to take into account the requirements of each
  module when considering other modules, but the mechanisms used to
  implement any specific module are never exposed to other modules
  directly except through specified, general purpose APIs.

  This results in certain design features which are alien to most HA
  clustering implementations.  For example, Quorum is never considered
  by the cluster integration layer.  Quorum is merely another resource
  which comes and goes in the cluster as nodes join and leave.
  Obviously it is a critically important resource, and must be the
  first resource recovered after a cluster transition, but the impact
  of quorum management on the rest of the cluster layers is minimal.

* State progression

  It is possible to produce very complex state transition diagrams
  when producing code which operates in the highly concurrent
  environment in which cluster code is expected to run.  There is a
  guiding design principle which substantially simplifies many of
  these state transitions:

  ++ All components of the system which need to construct global
     (externally influenced) state and which have to deal with error
     conditions must maintain a strict priority ordering of states.
     Only after all neighbouring components have acknowledged
     transition to the same state are we allowed to begin controlled
     progression to the next state (ie. there is a barrier between
     each state progression).
     Error conditions (at least, errors which are expected to trigger
     cluster state transitions) ALWAYS trigger an immediate abort of
     the construction of the current state: we move instantly to a
     state lower in the state hierarchy on error, and resume
     construction of the higher state from there.

  This principle is obeyed in many places.  In the cluster
  communications code, any (unrecovered) communications error between
  two nodes triggers an immediate loss of link UP status, and we do
  not allow the link state to come back UP until we are sure that (a)
  the other endpoint has also left UP state, and (b) all other
  communication channels between the two nodes have also been purged
  of messages from the old UP state.

  In the cluster integration layers, we have various stages we must go
  through to build the new cluster: discovery, election, verification
  and commit.  Any error in any of these stages triggers an immediate
  drop to a previous stage.

  The important property to obey here is that whenever we fall back to
  such a lower state, we must have a mechanism in place which ensures
  that all of our neighbours will also return to that state before
  continuing: we must reestablish agreement with our neighbours that
  the fallback has occurred before we can start to progress the state
  machine again.
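
  The rule above can be illustrated with a small simulation.  This is
  only a sketch under stated assumptions, not code from the cluster
  implementation: the names Stage, Cluster, fail and step are
  hypothetical, and a single in-process object stands in for the
  acknowledgements that real nodes would exchange over the network.
  The four stage names are the ones listed above for the cluster
  integration layers.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Strictly ordered stages; a smaller value is lower in the hierarchy."""
    DISCOVERY = 0
    ELECTION = 1
    VERIFICATION = 2
    COMMIT = 3


class Cluster:
    """Hypothetical stand-in for a set of neighbouring components.

    A real system would exchange acknowledgements between nodes; here a
    shared dictionary plays that role so the rule is easy to follow.
    """

    def __init__(self, names):
        # Every component starts at the lowest stage.
        self.stages = {name: Stage.DISCOVERY for name in names}

    def fail(self, name, fallback=Stage.DISCOVERY):
        """An error immediately aborts construction of the current stage.

        The failing component drops to `fallback`, never upwards.
        """
        self.stages[name] = min(self.stages[name], fallback)

    def step(self):
        """Run one round and return the stage everyone agrees on afterwards."""
        lowest = min(self.stages.values())
        if any(stage != lowest for stage in self.stages.values()):
            # Somebody fell back: all neighbours return to that stage first.
            # No forward progress is made in this round.
            for name in self.stages:
                self.stages[name] = lowest
            return lowest
        if lowest < Stage.COMMIT:
            # Barrier satisfied: everyone is at the same stage, so the
            # whole group may progress to the next one together.
            following = Stage(lowest + 1)
            for name in self.stages:
                self.stages[name] = following
            return following
        return lowest


if __name__ == "__main__":
    cluster = Cluster(["a", "b", "c"])
    cluster.step()                        # all agree: DISCOVERY -> ELECTION
    cluster.step()                        # ELECTION -> VERIFICATION
    cluster.fail("b")                     # error on b: b drops to DISCOVERY
    print(cluster.step().name)            # agreement is re-established first
    print(cluster.step().name)            # only then does progress resume
```

  The design choice worth noting is that step() never mixes recovery
  and progress in one round: a round either brings every component
  back to the lowest stage or, once all of them agree, moves them
  forward by exactly one stage.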