Recovery
========

last modified: 27 Aug 1999, sct. First draft.

RECOVERY TYPES
--------------

In most existing cluster systems, recovery is a process initiated
directly after a successful cluster transition, and that is the end of
the story. That is also more or less true in a flat, peer cluster in
our cluster model, but the existence of either satellite nodes or a
cluster hierarchy complicates things somewhat.

We need to make a clear distinction, then, between different events
which can occur in a cluster or metacluster. (Refer to the definitions
in hierarchy.txt: they are quite important here.)

 * CLUSTER TRANSITION is the event which occurs when a cluster's
   membership list changes.

 * CLUSTER RECOVERY is the recovery initiated with respect to that
   membership list.

 * PEER RECOVERY is the recovery initiated with respect to _every_
   peer in the cluster's peerage.

 * SATELLITE RECOVERY is the recovery initiated with respect to
   satellites for which this node is responsible.

RECOVERY DAEMON
---------------

Once a cluster initiates recovery, we need to signal the various
cluster daemons to recover in some given order. Obviously, the task of
starting up the cluster in the first place also has to establish
cluster services in some order, and the order in each case is dictated
by the various dependencies between services. In other words, startup
of cluster services when a new node joins a cluster is related to
recovery.

We can also notice that there are certain cluster services, such as the
comms and barrier services, which are relied upon to provide
cluster-wide synchronisation primitives. Without those primitives in
place, there is no cluster-wide recovery. There is therefore a
dependency between recovery services on a single node as much as there
is a dependency between nodes.

Therefore, we need to have a local daemon which can order services to
recover in the appropriate order, even in the absence of cluster-wide
synchronisation.
Obviously one of the first services to be recovered should be the
barrier synchronisation service, so that later recovery stages can rely
on it for cluster-wide synchronisation.

This daemon will also be responsible for the startup of the local
node's various cluster daemon processes, as those need to be started up
in the same order in which we deliver recovery orders.

As a secondary issue, this same master daemon will be responsible for
detecting the death, or failure to respond, of any cluster service
daemon, and for killing and restarting the entire cluster stack if that
is detected.

/cbin/init starts up all internal cluster components, and on failure
(process dies, process fails to respond in a given timeout) will kill
and restart all components from scratch.

/cetc/inittab:

	0:/cbin/ccomms
	0:/cbin/integrate
	0:/cbin/barrier
	1:/cbin/nameserv
	2:/cbin/confrelay
	3:/cbin/quorum
	4:/cbin/cdb

-----

 * On cluster transition start (or peerage transition):

   init signals each cluster component that we have begun transition.
   (Tag the transition with the new local transition sequence number.)

 * On cluster transition end:

   Once all components have ACKed the transition, go through each
   component, one by one, doing:

    + Send a recovery event with the new transition sequence
    + Wait for a recovery ACK with the right sequence number

   Wait for a second, recovery-complete ACK to arrive from all
   services.

   Complete the recovery barrier.
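
The ordering and handshake described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not the design of /cbin/init: it assumes that the number before the
colon in /cetc/inittab is an ordering level (lower levels first, file
order kept within a level), and the Component class, its method names
and run_transition() are hypothetical stand-ins for real daemons and
real message passing.

```python
# Inittab contents copied from the note; the meaning of the leading
# number (an ordering level) is an assumption of this sketch.
INITTAB = """\
0:/cbin/ccomms
0:/cbin/integrate
0:/cbin/barrier
1:/cbin/nameserv
2:/cbin/confrelay
3:/cbin/quorum
4:/cbin/cdb
"""


def parse_inittab(text):
    """Return (level, path) pairs ordered by level, file order within a level."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        level, path = line.split(":", 1)
        entries.append((int(level), path))
    # sorted() is stable, so entries sharing a level keep their file order.
    return sorted(entries, key=lambda entry: entry[0])


class Component:
    """Hypothetical stand-in for one cluster daemon.

    Each method returns the sequence number it is acknowledging, so the
    caller can reject ACKs that belong to an older transition.
    """

    def __init__(self, path):
        self.path = path

    def transition_begin(self, seq):
        return seq

    def recover(self, seq):
        return seq

    def recovery_complete(self, seq):
        return seq


def expect_ack(ack, seq, path, what):
    """Raise if an ACK does not carry the current transition sequence."""
    if ack != seq:
        raise RuntimeError(f"{path}: {what} ACK has sequence {ack}, expected {seq}")


def run_transition(components, seq):
    """Drive one transition; return the order in which recovery was delivered."""
    # Transition start: tell every component, tagged with the new sequence.
    for c in components:
        expect_ack(c.transition_begin(seq), seq, c.path, "transition")
    # Transition end: recover the components one by one, in inittab order.
    order = []
    for c in components:
        expect_ack(c.recover(seq), seq, c.path, "recovery")
        order.append(c.path)
    # Wait for the second, recovery-complete ACK from every service.
    for c in components:
        expect_ack(c.recovery_complete(seq), seq, c.path, "recovery-complete")
    # The recovery barrier would be completed here.
    return order


components = [Component(path) for _, path in parse_inittab(INITTAB)]
order = run_transition(components, seq=7)
```

Because the components are walked in inittab order, the comms,
integrate and barrier entries are recovered before anything that relies
on them, which is the property the local daemon has to guarantee. A
real daemon would also need the timeout and kill-and-restart behaviour
described earlier, which this sketch leaves out.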