Recovery
========

last modified:
27 Aug 1999, sct.  First draft.


RECOVERY TYPES
--------------

In most existing cluster systems, recovery is a process initiated
directly after a successful cluster transition, and that is the end of
the story.  That is also more or less true in a flat, peer cluster in
our cluster model, but the existence of either satellite nodes or a
cluster hierarchy complicates things somewhat.

We need to make a clear distinction, then, between different events
which can occur in a cluster or metacluster.  (Refer to the
definitions in hierarchy.txt: they are quite important here.)

* CLUSTER TRANSITION is the event which occurs when a cluster's
  membership list changes.

* CLUSTER RECOVERY is the recovery initiated with respect to that
  membership list.

* PEER RECOVERY is the recovery initiated with respect to _every_ peer
  in the cluster's peerage.

* SATELLITE RECOVERY is the recovery initiated with respect to
  satellites for which this node is responsible.


RECOVERY DAEMON
---------------

Once a cluster initiates recovery, we need to signal the various
cluster daemons to recover in some given order.  Obviously, the task
of starting up the cluster in the first place also has to establish
cluster services in some order, and the order in each case is dictated
by the various dependencies between services.  In other words, startup
of cluster services when a new node joins a cluster is closely related
to recovery: both must follow the same dependency order.

We can also notice that there are certain cluster services, such as
the comms and barrier services, which are relied upon to provide
cluster-wide synchronisation primitives.  Without those primitives in
place, there is no cluster-wide recovery.  There is therefore a
dependency between recovery services on a single node, just as there
is a dependency between nodes.

Therefore, we need to have a local daemon which can order services to
recover in the appropriate order, even in the absence of cluster-wide
synchronisation.  Obviously one of the first services to be recovered
should be the barrier synchronisation service so that later recovery
stages can rely on that for cluster-wide synchronisation.  

This daemon will also be responsible for the startup of the local
node's various cluster daemon processes, as those need to be started
up in the same order in which we deliver recovery orders.  As a
secondary issue, this same master daemon will be responsible for
detecting the death, or failure to respond, of any cluster service
daemon, and for killing and restarting the entire cluster stack if
either is detected.


/cbin/init starts up all internal cluster components, and on failure
(process dies, process fails to respond in a given timeout) will kill
and restart all components from scratch.
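
The kill-and-restart policy above can be sketched roughly as follows.
This is only an illustration, not the actual /cbin/init: the Service
and Supervisor classes, the timestamp-based liveness check, and the
choice to kill in reverse start order are all assumptions made for the
example, and no real processes are spawned.

```python
class Service:
    """Hypothetical stand-in for one cluster daemon process."""

    def __init__(self, path):
        self.path = path
        self.running = False
        self.last_reply = 0.0  # time of the last response we saw

    def start(self, now):
        self.running = True
        self.last_reply = now

    def kill(self):
        self.running = False


class Supervisor:
    """Starts services in order; restarts the whole stack on any failure."""

    def __init__(self, paths, timeout):
        # 'paths' is assumed to be listed in startup (dependency) order.
        self.services = [Service(p) for p in paths]
        self.timeout = timeout
        self.restarts = 0

    def start_all(self, now):
        for svc in self.services:
            svc.start(now)

    def healthy(self, svc, now):
        # A service fails if it has died or has not replied within the timeout.
        return svc.running and (now - svc.last_reply) <= self.timeout

    def check(self, now):
        """Return True if a failure was found and the stack was restarted."""
        if all(self.healthy(s, now) for s in self.services):
            return False
        # Assumption: tear down in reverse order, then start from scratch.
        for svc in reversed(self.services):
            svc.kill()
        self.start_all(now)
        self.restarts += 1
        return True
```

Note that a single dead or unresponsive service causes every service to
be restarted, not just the failed one, matching the "from scratch"
wording above.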

/cetc/inittab:

0:/cbin/ccomms
0:/cbin/integrate
0:/cbin/barrier

1:/cbin/nameserv
2:/cbin/confrelay
3:/cbin/quorum
4:/cbin/cdb
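
As a rough illustration of how such a table might be read, the sketch
below groups the entries by their leading number.  It assumes, without
confirmation from the listing itself, that the number is an ordering
stage and that entries sharing a number belong to the same stage; the
function name parse_inittab is invented for the example.

```python
from collections import defaultdict

# The table from above, verbatim.
INITTAB = """\
0:/cbin/ccomms
0:/cbin/integrate
0:/cbin/barrier

1:/cbin/nameserv
2:/cbin/confrelay
3:/cbin/quorum
4:/cbin/cdb
"""


def parse_inittab(text):
    """Return [(stage, [paths...]), ...] sorted by stage number.

    Assumption: each non-blank line is "<stage>:<path>", and the stage
    number gives the order in which services are started and recovered.
    """
    stages = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate groups visually
        stage, path = line.split(":", 1)
        stages[int(stage)].append(path)
    return [(n, stages[n]) for n in sorted(stages)]


if __name__ == "__main__":
    for stage, paths in parse_inittab(INITTAB):
        print(stage, " ".join(paths))
```

Under that reading, the comms, integrate and barrier services all sit
in stage 0 and come up before anything else.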

-----

* On cluster transition start (or peerage transition):

  init signals each cluster component that we have begun transition.
  (Tag the transition with the new local transition sequence number.)

* On cluster transition end:

  Once all components have ACKed the transition, go through each
  component, one by one, doing:
  + Send a recovery event with the new transition sequence number
  + Wait for a recovery ACK with the right sequence number

  Wait for a second, recovery-complete ACK to arrive from all
  services.

  Complete the recovery barrier.
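
The sequence above can be sketched as a small single-process
simulation.  Everything here is hypothetical: the Component class, its
method names, and the use of return values to stand in for ACK messages
are assumptions for illustration.  In a real system the ACKs would
arrive asynchronously and the final step would complete a cluster-wide
barrier rather than return a flag.

```python
class Component:
    """Hypothetical in-process stand-in for one cluster service."""

    def __init__(self, name, log):
        self.name = name
        self.log = log  # shared list recording the order of events

    def begin_transition(self, seq):
        self.log.append(("transition", self.name, seq))
        return seq  # transition ACK, tagged with the sequence number

    def recover(self, seq):
        self.log.append(("recover", self.name, seq))
        return seq  # first (recovery) ACK

    def recovery_complete(self, seq):
        self.log.append(("complete", self.name, seq))
        return seq  # second (recovery-complete) ACK


def run_transition(components, seq):
    """Drive one transition; 'components' must be in recovery order."""
    # Transition start: signal every component, collect transition ACKs.
    acks = [c.begin_transition(seq) for c in components]
    if any(a != seq for a in acks):
        raise RuntimeError("transition ACK with wrong sequence number")

    # Transition end: recover components one by one, waiting for each
    # recovery ACK before moving on to the next component.
    for c in components:
        if c.recover(seq) != seq:
            raise RuntimeError("recovery ACK with wrong sequence number")

    # Wait for the second, recovery-complete ACK from all services.
    if any(c.recovery_complete(seq) != seq for c in components):
        raise RuntimeError("recovery-complete ACK with wrong sequence number")

    # Stand-in for completing the recovery barrier.
    return True
```

Checking the sequence number on every ACK is what lets init discard
replies left over from an earlier transition.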