So what happens when we start the clustering software?

Look at:

 * Dependencies and ordering requirements
 * Coping with services which haven't achieved cluster-wide integrity yet
 * Synchronising with startup/resume of other services

Startup sequence includes:

 * First thing: start /cbin/init.

   This manages the controlled startup of local cluster daemons.
   Level 0 daemons are started in the order in which they are specified
   in the control file, and if any of them dies, the entire set is killed
   and restarted.

 * Start cluster comms.

   The cluster communication mechanism starts to broadcast for
   neighbouring nodes, and establishes point-to-point links with those
   it finds, but does not send any data.  It generates node-found
   events which can be intercepted by other daemons.

 * Start integration layer.

   Create an initial cluster of one node in transition state.  The
   normal cluster integration protocol will do cluster breakup (a noop:
   with no neighbours, we trivially still have contact with all
   neighbours!), then cluster merge.  Query the cluster comms for a
   list of known nodes and monitor new node-found events.

   Once the new-cluster transition timeout has elapsed with no new
   transitions, it will try to connect to the barrier server and
   generate a cluster barrier reset on "RECOVERY".

 * Start cluster barrier server.

   On a cluster transition, close all client connections.  At startup,
   we start off with none anyway, of course.

   Once the integration layer says we have a new cluster, check if we
   are the CC.  If so, we set ourselves up as the cluster-wide barrier
   master, else we are a barrier slave.  Either way we start out with
   no barriers after the initial startup.

   Barrier slaves propagate all of their persistent barriers to the
   barrier master.  (Persistent barriers are application barriers
   which have not requested recovery knowledge.)  The master rebuilds
   its barrier database and sends updates to any slaves which seem to
   need them.  Any non-persistent barriers are destroyed.

   A new "RECOVERY" system-privileged barrier is created.

   Once the barrier state is rebuilt, the barrier master opens up for
   new business and instructs all slaves to do likewise.  First thing
   is to tell clients about destroyed barriers, of course.

   At some point we will now get round to the cluster recovery barrier
   request.

*** Need a separate "CINIT" barrier for /cbin/init to register when
*** all local clients have been started and have registered with the
*** barrier services.  Either that, or have the CC communicate with
*** the barrier service to prevent the initial RECOVERY:0 barrier
*** transition until all cluster members have registered the
*** RECOVERY barrier.
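
The barrier rebuild that the barrier master performs after a cluster
transition can be sketched roughly as follows.  This is only an
illustration of the steps described above, not the actual barrier
server: the Barrier record, the rebuild_barrier_db() function and the
shape of the per-node reports are all hypothetical names invented for
the sketch, and treating the new "RECOVERY" barrier as non-persistent
is an assumption.

```python
from dataclasses import dataclass

RECOVERY = "RECOVERY"


@dataclass(frozen=True)
class Barrier:
    """Hypothetical barrier record, invented for this sketch."""
    name: str
    persistent: bool = True   # application has NOT requested recovery knowledge
    system: bool = False      # system-privileged barrier


def rebuild_barrier_db(reports):
    """Rebuild the master's barrier database after a cluster transition.

    reports maps each node name to the list of barriers that node held
    before the transition (slaves propagate theirs to the master).

    Returns (db, destroyed, updates):
      db        -- name -> Barrier, the rebuilt master database
      destroyed -- node -> names of non-persistent barriers dropped there
      updates   -- node -> names the master must send to that node
    """
    db = {}
    destroyed = {}
    for node, held in reports.items():
        # Non-persistent barriers do not survive the transition; remember
        # them so that clients can be told once we reopen for business.
        destroyed[node] = sorted(b.name for b in held if not b.persistent)
        for b in held:
            if b.persistent:
                db[b.name] = b

    # A fresh system-privileged RECOVERY barrier is created every time.
    # (Marking it non-persistent is an assumption of this sketch.)
    db[RECOVERY] = Barrier(RECOVERY, persistent=False, system=True)

    # Each node is sent whatever the rebuilt database holds that the
    # node did not already hold as a persistent barrier.
    updates = {}
    for node, held in reports.items():
        kept = {b.name for b in held if b.persistent}
        updates[node] = sorted(set(db) - kept)
    return db, destroyed, updates
```

Only once all three results have been computed would the master open
up for new business: the destroyed lists drive the "tell clients about
destroyed barriers" step, and the updates lists are what gets pushed to
the slaves.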