Cluster Design Principles
=========================
$Id: principles.txt,v 1.2 1999/12/16 21:15:25 sct Exp $

Remember the Unix philosophy:

	Do one thing, but do it well.


There are two basic principles which pervade this design:

* Modularity

  Modular code is easier to maintain than a tightly integrated
  monolithic structure.  The clustering design takes the requirements
  of each module into account when considering other modules, but the
  mechanisms used to implement any specific module are never exposed
  to other modules except through specified, general-purpose APIs.

  This results in certain design features which are alien to most HA
  clustering implementations.  For example, quorum is never considered
  by the cluster integration layer.  Quorum is merely another resource
  which comes and goes in the cluster as nodes join and leave.
  Obviously it is a critically important resource, and must be the
  first resource recovered after a cluster transition, but the impact
  of quorum management on the rest of the cluster layers is minimal.
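
  As a rough illustration of quorum treated as just another resource,
  the sketch below uses hypothetical names throughout (Resource,
  Quorum, Filesystem, cluster_transition); none of them come from the
  actual design.  It also assumes a simple-majority quorum rule, which
  the text above does not specify.  The integration layer only walks
  an ordered list of resources and never looks at quorum itself.

```python
from typing import FrozenSet, List


class Resource:
    """Hypothetical general-purpose interface seen by the integration layer."""

    name = "resource"

    def recover(self, members: FrozenSet[str]) -> None:
        raise NotImplementedError


class Quorum(Resource):
    """Quorum modelled as an ordinary resource."""

    name = "quorum"

    def __init__(self, expected_nodes: int) -> None:
        self.expected_nodes = expected_nodes
        self.held = False

    def recover(self, members: FrozenSet[str]) -> None:
        # Assumption: quorum is held by a strict majority of expected nodes.
        self.held = len(members) > self.expected_nodes // 2


class Filesystem(Resource):
    """A dependent resource; it sees quorum only through its public state."""

    name = "filesystem"

    def __init__(self, quorum: Quorum) -> None:
        self.quorum = quorum
        self.mounted = False

    def recover(self, members: FrozenSet[str]) -> None:
        self.mounted = self.quorum.held


def cluster_transition(resources: List[Resource],
                       members: FrozenSet[str]) -> List[str]:
    """Integration layer: recover resources in list order, nothing more."""
    order = []
    for res in resources:
        res.recover(members)
        order.append(res.name)
    return order


quorum = Quorum(expected_nodes=3)
fs = Filesystem(quorum)
resources = [quorum, fs]   # quorum is listed first, so it is recovered first
order = cluster_transition(resources, frozenset({"a", "b"}))
```

  Placing quorum at the head of the list is the only special treatment
  it receives in this sketch; cluster_transition() contains no
  quorum-specific logic.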

* State progression

  Code which operates in the highly concurrent environment in which
  cluster code is expected to run can easily produce very complex
  state transition diagrams.  There is a guiding design principle
  which substantially simplifies many of these state transitions:

 ++ All components of the system which need to construct global
    (externally influenced) state and which have to deal with error
    conditions must maintain a strict priority ordering of states.  Only
    after all neighbouring components have acknowledged transition to
    the same state are we allowed to begin controlled progression to the
    next state (i.e. there is a barrier between each state progression).
    Error conditions (at least, errors which are expected to trigger
    cluster state transitions) ALWAYS trigger an immediate abort of the
    construction of the current state: we move instantly to a state
    lower in the state hierarchy on error, and resume construction of
    the higher state from there.

  This principle is obeyed in many places.  In the cluster
  communications code, any (unrecovered) communications error between
  two nodes triggers an immediate loss of link UP status, and we do not
  allow the link state to come back UP until we are sure that (a) the
  other endpoint has also left UP state, and (b) all other
  communication channels between the two nodes have also been purged
  of messages from the old UP state.  In the cluster integration
  layers, we have various stages we must go through to build the new
  cluster: discovery, election, verification and commit.  Any error in
  any of these stages triggers an immediate drop to a previous stage.
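
  The link-state rule can be sketched roughly as follows.  This is a
  minimal model under stated assumptions, not the real communications
  code: the Link class, its method names and the channel names are
  hypothetical, and the generation counter used to reject stale
  messages is one possible mechanism that the text above does not
  prescribe.

```python
from enum import Enum


class LinkState(Enum):
    DOWN = 0
    UP = 1


class Link:
    """One endpoint's view of a link to a peer node (hypothetical model)."""

    def __init__(self, channels):
        self.state = LinkState.DOWN
        self.generation = 0           # assumed: incremented for each UP period
        self.peer_left_up = True      # the peer starts out DOWN as well
        self.channels = set(channels)
        self.dirty = set()            # channels that may hold old-UP messages

    def on_error(self):
        """Any unrecovered error: lose UP status immediately."""
        if self.state is LinkState.UP:
            self.state = LinkState.DOWN
            self.peer_left_up = False
            self.dirty = set(self.channels)

    def on_peer_down_ack(self):
        """Condition (a): the peer confirmed it has also left UP state."""
        self.peer_left_up = True

    def on_channel_purged(self, channel):
        """Condition (b): this channel holds no messages from the old UP."""
        self.dirty.discard(channel)

    def try_up(self):
        """Return to UP only when conditions (a) and (b) both hold."""
        if self.state is LinkState.UP:
            return True
        if self.peer_left_up and not self.dirty:
            self.generation += 1
            self.state = LinkState.UP
            return True
        return False

    def accept(self, msg_generation):
        """Accept traffic only while UP and only from the current UP period."""
        return self.state is LinkState.UP and msg_generation == self.generation


link = Link(channels=["eth0", "serial"])
link.try_up()                    # initial bring-up
link.on_error()                  # immediate loss of UP status
blocked = link.try_up()          # refused: neither (a) nor (b) holds yet
link.on_peer_down_ack()
link.on_channel_purged("eth0")
still_blocked = link.try_up()    # refused: "serial" has not been purged
link.on_channel_purged("serial")
recovered = link.try_up()        # both conditions now hold
```

  The point of the sketch is that on_error() never waits for anything,
  whereas try_up() refuses until both conditions have been met.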

  The important property to obey here is that whenever we fall back to
  such a lower state, we must have a mechanism in place which ensures
  that all of our neighbours will also return to that state before
  continuing: we must re-establish agreement with our neighbours that
  the fallback has occurred before we can start to progress the state
  machine again.
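
  The barrier and fallback rules might be modelled along these lines.
  The sketch reuses the four stage names given earlier; everything
  else (the Node class, its methods, the "unconfirmed" set) is
  hypothetical.  It also makes two simplifying assumptions: nodes read
  each other's stage directly instead of exchanging messages, and an
  error drops the node all the way back to discovery unless a
  different lower stage is given.

```python
from enum import IntEnum


class Stage(IntEnum):
    DISCOVERY = 0
    ELECTION = 1
    VERIFICATION = 2
    COMMIT = 3


class Node:
    def __init__(self, name):
        self.name = name
        self.stage = Stage.DISCOVERY
        self.neighbours = []
        self.unconfirmed = set()   # neighbours yet to confirm our fallback

    def try_advance(self):
        """Move up one stage, but only across a barrier."""
        if self.unconfirmed or self.stage == Stage.COMMIT:
            return False
        # Barrier: every neighbour must have reached our current stage.
        if any(n.stage < self.stage for n in self.neighbours):
            return False
        self.stage = Stage(self.stage + 1)
        return True

    def fail(self, to=Stage.DISCOVERY):
        """Error: abort the current stage at once and drop to a lower one."""
        self.stage = min(self.stage, to)
        self.unconfirmed = {n.name for n in self.neighbours}

    def on_fallback(self, sender, to):
        """A neighbour fell back: follow it down, then confirm to it."""
        if self.stage > to:
            self.stage = to
        sender.unconfirmed.discard(self.name)


a, b, c = Node("a"), Node("b"), Node("c")
for node in (a, b, c):
    node.neighbours = [n for n in (a, b, c) if n is not node]

for node in (a, b, c):
    node.try_advance()             # all three reach ELECTION
a.try_advance()                    # a reaches VERIFICATION
held = a.try_advance()             # refused: b and c are still in ELECTION

b.fail()                           # b drops straight to DISCOVERY
b_blocked = b.try_advance()        # refused: nobody has confirmed yet
a_blocked = a.try_advance()        # refused: b is now below a
a.on_fallback(b, b.stage)          # a follows b down and confirms
c.on_fallback(b, b.stage)          # c follows b down and confirms
resumed = b.try_advance()          # agreement re-established: b may climb
```

  In this model a node can never be more than one stage ahead of its
  slowest neighbour, and a node which has fallen back cannot start
  climbing again until every neighbour has confirmed the fallback.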