Namespace Manager
=================

The "Cluster Namespace" is a simple concept: it is a cluster-wide table
of "NAME=VALUE" pairs, much like the environment variables of a standard
Unix process.  The namespace is a dynamic table: names do not survive a
cluster reboot.  Each name in the table is owned by a process in the
cluster, and will be removed if that process dies or its node leaves the
cluster.
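
The ownership rule can be illustrated with a small in-memory sketch.
It is not the namespace service itself: the "Namespace" class, its
method names, and the sample names and values are all hypothetical,
and a real implementation would be distributed rather than a single
Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    value: str
    node: str  # cluster node hosting the owning process
    pid: int   # owning process on that node


class Namespace:
    """In-memory stand-in for the cluster-wide table (hypothetical API)."""

    def __init__(self):
        self.entries = []

    def register(self, name, value, node, pid):
        entry = Entry(name, value, node, pid)
        self.entries.append(entry)
        return entry

    def process_died(self, node, pid):
        # Remove every name owned by the dead process.
        self.entries = [e for e in self.entries
                        if (e.node, e.pid) != (node, pid)]

    def node_left(self, node):
        # Remove every name owned by any process on the departed node.
        self.entries = [e for e in self.entries if e.node != node]

    def names(self):
        return sorted(e.name for e in self.entries)


ns = Namespace()
ns.register("PRINTER/lp0", "postscript", node="node1", pid=100)
ns.register("NFS/home", "/export/home", node="node1", pid=200)
ns.register("PRINTER/lp1", "pcl", node="node2", pid=100)

ns.process_died("node1", 100)
after_death = ns.names()

ns.node_left("node2")
after_leave = ns.names()
```

Only the names owned by the dead process or the departed node go
away; names owned by other processes on the same node are untouched.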

The namespace is intended to provide an API to query and locate all
services around a cluster.  It is not a service management API, in that
it is not intended to provide a general interface for interaction with
services.  

It is, however, intended to be the single clearing-house through which
all queries to locate a service are directed.  For example, all printer
queue servers in a cluster may register their printers in the cluster
namespace under the name "PRINTER/<printer-name>=<printer-type>".  Any
user can query for all "PRINTER/*" names to find the printers in the
cluster, and the query reply will include the cluster node for each
printer returned.  It would also be possible for a printer, having been
registered, to export extra information about itself such as
"PRINTER/<name>/STATUS=idle" if it wished.  Similarly, exported NFS
directories, exported network block devices and so on can all be
registered with such a namespace.
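
A query of this kind might look like the following sketch.  The table
rows, the "query" function and the printer names are made up for
illustration.  The matching uses Python's fnmatch rules, under which
"*" also matches "/"; whether the real namespace would treat "/" as a
separator is not decided here.

```python
from fnmatch import fnmatchcase

# (name, value, node) rows; every name and value here is hypothetical.
TABLE = [
    ("PRINTER/lp0", "postscript", "node1"),
    ("PRINTER/lp1", "pcl", "node2"),
    ("PRINTER/lp0/STATUS", "idle", "node1"),
    ("NFS/home", "/export/home", "node3"),
]


def query(table, pattern):
    """Return the (name, value, node) rows whose name matches the glob."""
    return [row for row in table if fnmatchcase(row[0], pattern)]


printers = query(TABLE, "PRINTER/*")
for name, value, node in printers:
    print(f"{name}={value} on {node}")
```

Each row of the reply carries the node alongside the value, so the
caller learns where the printer lives as well as what it is.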

The fact that the namespace API knows about individual processes is key
to making this work.  An application can export a name in such a way that
the name disappears if the application dies, but it is also possible for
another application to query that name and set up an active dependency
on it.  In this case, the dependent application will receive an
asynchronous notification from the namespace service if anything happens
to the name in question, for example if its value changes, or its owner
dies altogether, or its host moves from one node to another due to failover.
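
The dependency mechanism could be sketched as a watch list per name.
The class, the method names and the event labels ("changed", "moved",
"removed") are assumptions made for this example.  The callbacks here
run synchronously, whereas the real notifications would be delivered
asynchronously.

```python
from collections import defaultdict


class Namespace:
    """Toy namespace with per-name watchers (hypothetical API)."""

    def __init__(self):
        self.values = {}                   # name -> (value, node)
        self.watchers = defaultdict(list)  # name -> list of callbacks

    def watch(self, name, callback):
        # Register a dependency: callback(name, event, detail).
        self.watchers[name].append(callback)

    def _notify(self, name, event, detail):
        for callback in list(self.watchers[name]):
            callback(name, event, detail)

    def set(self, name, value, node):
        old = self.values.get(name)
        self.values[name] = (value, node)
        if old is None:
            self._notify(name, "registered", (value, node))
        elif old[1] != node:
            self._notify(name, "moved", (value, node))
        elif old[0] != value:
            self._notify(name, "changed", (value, node))

    def remove(self, name):
        # Called when the owner dies or withdraws the name.
        if self.values.pop(name, None) is not None:
            self._notify(name, "removed", None)


events = []
ns = Namespace()
ns.set("PRINTER/lp0/STATUS", "idle", "node1")
ns.watch("PRINTER/lp0/STATUS", lambda n, e, d: events.append((e, d)))

ns.set("PRINTER/lp0/STATUS", "busy", "node1")  # value changes
ns.set("PRINTER/lp0/STATUS", "busy", "node2")  # host moves
ns.remove("PRINTER/lp0/STATUS")                # owner dies
```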

There is another critical property of the namespace layer: a namespace
registration may be made either shared or exclusive.  A shared name
assignment simply means that multiple instances of the name may be
present.  For example, in the printer case, any number of hosts might
offer a printer named "PRINTER/DEFAULT", and a print request to the
default printer may appear on any of those printers.

An exclusive name assignment will only be granted to one process in the
cluster at once.  However, that does not mean that only one node can
request the name.  If two or more processes request assignment of the
same name exclusively, then the first will be granted the name, and the
others can stall until the name becomes available.  
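
The difference between the two kinds of registration can be shown
with a minimal sketch.  All names below are hypothetical.  The sketch
hands a released name to the longest-waiting requester, which is an
assumption of the example, and it keeps shared and exclusive names in
separate tables without addressing what happens if one name is
requested both ways.

```python
from collections import defaultdict, deque


class Namespace:
    """Toy model of shared and exclusive names (hypothetical API)."""

    def __init__(self):
        self.shared = defaultdict(list)    # name -> all current holders
        self.owner = {}                    # name -> single exclusive holder
        self.waiting = defaultdict(deque)  # name -> queued requesters

    def register_shared(self, name, proc):
        # Any number of processes may hold a shared name at once.
        self.shared[name].append(proc)

    def request_exclusive(self, name, proc):
        """Return True if granted immediately, False if queued."""
        if name not in self.owner:
            self.owner[name] = proc
            return True
        self.waiting[name].append(proc)
        return False

    def release(self, name):
        """Owner died or gave the name up; return the new owner, if any."""
        del self.owner[name]
        if self.waiting[name]:
            self.owner[name] = self.waiting[name].popleft()
        return self.owner.get(name)


ns = Namespace()
ns.register_shared("PRINTER/DEFAULT", "node1:100")
ns.register_shared("PRINTER/DEFAULT", "node2:100")

first = ns.request_exclusive("DB/MASTER", "node1:200")
second = ns.request_exclusive("DB/MASTER", "node2:200")
new_owner = ns.release("DB/MASTER")
```

Both shared registrations succeed at once, while the second exclusive
request is queued and only granted when the first holder releases the
name.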

This provides a flexible mechanism for managing failover.  A service can
try to register the same name on each node, and the namespace will ensure
that it is granted only on one node.  The request can include a
preference value, in which case the node with the highest preference for
that name will be granted it.  However, if that node dies, the existing
queued request for the name on another node will be granted, and the
service on that node will be able to continue.
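
A rough sketch of preference-based failover for a single exclusive
name follows.  The "ExclusiveName" class and its methods are
hypothetical.  The sketch assumes that a name, once granted, is not
taken away when a higher-preference request arrives later; the text
above does not settle that point.

```python
class ExclusiveName:
    """One exclusive name with preference-ordered requests (hypothetical)."""

    def __init__(self):
        self.owner = None   # (node, preference) currently holding the name
        self.requests = []  # queued (node, preference) requests

    def request(self, node, preference=0):
        self.requests.append((node, preference))

    def grant(self):
        """If the name is free, grant it to the highest-preference request."""
        if self.owner is None and self.requests:
            best = max(self.requests, key=lambda r: r[1])
            self.requests.remove(best)
            self.owner = best
        return self.owner[0] if self.owner else None

    def node_died(self, node):
        # Forget the dead node's queued requests, free the name if that
        # node held it, then grant the name to a surviving request.
        self.requests = [r for r in self.requests if r[0] != node]
        if self.owner and self.owner[0] == node:
            self.owner = None
        return self.grant()


name = ExclusiveName()
name.request("node1", preference=10)
name.request("node2", preference=50)
name.request("node3", preference=20)

first = name.grant()              # highest preference wins
second = name.node_died("node2")  # owner dies, next best takes over
third = name.node_died("node1")   # a queued node dies, owner unchanged
```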

Any other services, on that node or on any other, which depend on the
old name can request a callback so that they can deal with the change
in service if such failover occurs.  Client requirements for failover
are therefore as manageable as server requirements.

In practice, I expect that there will be a local failover service on
each node which uses a simple scripting configuration to allow the user
to set up failover groups of multiple services started in a particular
order.  In such cases, an exclusive name "FAILGROUP/<name>" can be used
to make the failover of each failover group atomic across the cluster.