The Discovery Algorithm
=======================

A few notes about the discovery algorithm used in cluster breakup to
determine the best fully-connected subset of surviving machines for
forming the new cluster.

We have a value function, and we want to achieve three things:

* Decide on which nodes has the best value function; 

* Propagate all connectivity maps to that node; and

* Propagate that node's cluster reconfiguration decision to the rest
  of the nodes, even those which are only indirectly connected to it.


Consider the network of nodes/value functions:

Example D1:
    Node:	A-------B-------C-------D-------E
    Value:	4	1	3	2	5

The nodes B and D both see themselves as local minima for the value
function and both see a local maximum nearby.  How do we propagate the
appropriate value functions across the network, without using an
algorithm which introduces too much traffic in a still-fully-connected
network?

Whatever the algorithm, we need to have a termination condition.  We
decide to terminate once a node is able to determine that it is the
Elector, ie. that it has the best value function anywhere in the
graph.

We can only make this decision at any node once that node:

* Posesses the value functions for every point in the graph, and

* Is sure that it has seen the entire graph.  To know that, it must
  have seen the connectivity maps for every node in the network, and
  must have checked that every neighbour mentioned in each of those
  maps has also provided a map.

Every node knows, at all times, the state of each of its Links to
neighbours and the names of those neighbours, so no special
communications step is needed at a node to determine the node's own
connectivity map.

Assume that we start the discovery algorithm by, on each node, telling
each of our neighbours what our value function is.  Each node then
sends its local connectivity map to the neighbour with the best value
function.  In a fully connected cluster, the value exchange is O(N**2)
and the map exchange is O(N).  That's OK for now --- the RESET traffic
when we do a cluster abort is O(N**2) anyway, and the value packets
are small, so this is only a small incremental cost over the RESET
traffic.  

Now, we want to prevent any further unnecessary traffic in the common
cases where the network is largely connected.  We decide that only
nodes which are best-so-far --- nodes whose connectivity maps indicate
that it has the best value of any nodes seen so far --- may progress
the discovery algorithm further.

They do so by identifying "edge" nodes in the transitive closure of
the locally-held connectivity maps --- nodes which are mentioned as
neighbours in the known connectivity maps but whose own maps are not
yet known.

A best-so-far node proceeds by sending out routed "ping" messages to
those edge nodes, using the already-known connectivity maps to provide
the route.  The ping messages include the sender's value function and
the list of nodes which the message must pass through on the way to
the edge node.

On receiving such a packet, an edge node (identified because we have
reached the last hop in the route) sends back its local connectivity
graph (plus any other connectivity maps it may happen to have).  Every
other node on the route just increments the hop count in the packet.
We maintain the full route in the packet when we pass it, to allow the
reply also to be routed.

For every such ping packet, each node updates its local copy of the
"highest value seen" value function by maximising it with the sender
of the ping.  Any ping with a lower value function is dropped.  As a
result, if two locally-maximal nodes (A and E in example D1 above)
both start the flood-fill edge expansion algorithm, intermediate nodes
(like C above) will eventually detect the "better" of the two nodes
(E), and will stop A's ping packets from propogating further.  

Each cycle of edge expansion proceeds by the best-so-far node pinging
all edge nodes and waiting for all responses.  The ping dropping
described in the previous paragraph ensures that this process gets
terminated whenever any node detects that some locally-maximal
best-so-far node is actually not the best node in the cluster.

The entire procedure terminates once a best-so-far node has no more
edge nodes, and by this time we know for sure that it has found the
entire connectivity graph.  Other locally-maximal nodes (like A above)
may still be waiting for ping responses, but when the newly-determined
Elector starts the next phase of the cluster transition, those nodes
will break out of that state in response to a reconfigure command from
the Elector.