Namespace Manager
=================

The "Cluster Namespace" is a simple concept: it is a cluster-wide table of "NAME=VALUE" pairs, much like the environment variables of a standard Unix process. The namespace is a dynamic table: names do not survive a cluster reboot. Each name in the table is owned by a process in the cluster, and is removed if that process dies or its node leaves the cluster.

The namespace is intended to provide an API to query and locate all services around a cluster. It is not a service management API: it is not intended to provide a general interface for interacting with services. It is, however, intended to be the single clearing-house through which all queries to locate a service are directed.

For example, all printer queue servers in a cluster may register their printers in the cluster namespace under the name "PRINTER/=". Any user can query for all "PRINTER/*" names to find the printers in the cluster, and the query reply will include the cluster node for each printer returned. A printer, once registered, could also export extra information about itself, such as "PRINTER//STATUS=idle", if it wished. Similarly, exported NFS directories, exported network block devices and so on can all be registered in the namespace.

The fact that the namespace API knows about individual processes is key to making this work. An application can export a name in such a way that the name disappears if the application dies, and another application can query that name and set up an active dependency on it. The dependent application then receives an asynchronous notification from the namespace service if anything happens to the name in question: for example, if its value changes, if it disappears altogether, or if the service holding it moves from one node to another due to failover.

There is another critical property of the namespace layer: a namespace registration may be made either shared or exclusive.
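The ownership, query and dependency rules described above can be illustrated with a small, single-process Python sketch. It is not the namespace service's API: the `Namespace` class, the `Owner` record, the method names (`register`, `query`, `watch`, `process_died`, `node_left`), the callback signature and the example name "PRINTER/lp0" are all hypothetical. It shows three properties: every entry is tied to an owning process on a node, a glob query such as "PRINTER/*" returns the node alongside each value, and entries vanish, with dependents notified, when their owner dies or their node leaves the cluster. Keying entries by `(name, owner)` lets several processes hold the same name, as a shared registration allows.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass(frozen=True)
class Owner:
    node: str  # cluster node the owning process runs on
    pid: int   # the owning process on that node


# Hypothetical dependency callback: (event, name, new value or None).
Callback = Callable[[str, str, Optional[str]], None]


class Namespace:
    """Single-process toy model of the cluster namespace (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, Owner], str] = {}  # (name, owner) -> value
        self._watchers: dict[str, list[Callback]] = {}    # name -> dependents

    def register(self, owner: Owner, name: str, value: str) -> None:
        """Add or update this owner's instance of a name, then notify dependents."""
        self._entries[(name, owner)] = value
        self._notify(name, "changed", value)

    def query(self, pattern: str) -> list[tuple[str, str, str]]:
        """Return sorted (name, node, value) triples for names matching a glob."""
        return sorted(
            (name, owner.node, value)
            for (name, owner), value in self._entries.items()
            if fnmatchcase(name, pattern)
        )

    def watch(self, name: str, callback: Callback) -> None:
        """Set up an active dependency on a name."""
        self._watchers.setdefault(name, []).append(callback)

    def process_died(self, owner: Owner) -> None:
        self._drop(lambda o: o == owner)

    def node_left(self, node: str) -> None:
        self._drop(lambda o: o.node == node)

    def _drop(self, is_doomed: Callable[[Owner], bool]) -> None:
        # Remove every entry whose owner is gone and tell its dependents.
        for name, owner in [key for key in self._entries if is_doomed(key[1])]:
            del self._entries[(name, owner)]
            self._notify(name, "removed", None)

    def _notify(self, name: str, event: str, value: Optional[str]) -> None:
        for callback in self._watchers.get(name, []):
            callback(event, name, value)


# Usage: two hypothetical print servers on different nodes.
ns = Namespace()
node1_lpd = Owner("node1", 101)
node2_lpd = Owner("node2", 202)
events = []
ns.watch("PRINTER/lp1", lambda event, name, value: events.append((event, name)))
ns.register(node1_lpd, "PRINTER/lp0", "queue-a")
ns.register(node2_lpd, "PRINTER/lp1", "queue-b")
print(ns.query("PRINTER/*"))
# [('PRINTER/lp0', 'node1', 'queue-a'), ('PRINTER/lp1', 'node2', 'queue-b')]
ns.node_left("node2")
print(ns.query("PRINTER/*"))  # [('PRINTER/lp0', 'node1', 'queue-a')]
print(events)  # [('changed', 'PRINTER/lp1'), ('removed', 'PRINTER/lp1')]
```

The glob match uses `fnmatchcase`, whose `*` also matches "/", so a query for "PRINTER/*" in this sketch would also return sub-entries such as a per-printer status name.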
A shared name assignment simply means that multiple instances of the name may be present. In the printer case, for example, any number of hosts might offer a printer named "PRINTER/DEFAULT", and a print request to the default printer may appear on any of those printers.

An exclusive name assignment will be granted to only one process in the cluster at a time. That does not mean, however, that only one node can request the name. If two or more processes request the same name exclusively, the first is granted the name, and the others can wait until it becomes available.

This provides a flexible mechanism for managing failover. A service can try to register the same name on every node, and the namespace will ensure that it is granted on only one node. The request can include a preference value, in which case the node with the highest preference for that name is granted it. If that node dies, the request already queued for the name on another node is granted, and the service on that node can continue. Other services, on that node or any other, that depended on the old name can request a callback so that they can deal with the change in service if such a failover occurs. Client requirements for failover are therefore as manageable as server requirements.

In practice, I expect that there will be a local failover service on each node which uses a simple scripting configuration to let the user set up failover groups of multiple services started in a particular order. In such cases, an exclusive name "FAILGROUP/" can be used to make the failover of each failover group atomic across the cluster.
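Exclusive registration with preferences and failover can be sketched in the same toy style. `ExclusiveName`, its methods, the watcher signature and the name "FAILGROUP/web" are hypothetical, and the sketch assumes two policies the text leaves open: a current holder is never pre-empted by a later, higher-preference request, and waiting requests with equal preference are granted in arrival order. Dependent services register a callback with `watch()` and are told the old and new holder whenever the name changes hands.

```python
import heapq
import itertools
from typing import Callable, Optional

# Hypothetical dependency callback: (name, old_holder, new_holder); holders are node names.
Watcher = Callable[[str, Optional[str], Optional[str]], None]


class ExclusiveName:
    """Toy model of one exclusive namespace name (illustrative only).

    Assumptions: a current holder is never pre-empted; waiting requests are
    granted highest preference first, ties broken by arrival order.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.holder: Optional[str] = None
        self._queue: list[tuple[int, int, str]] = []  # heap of (-preference, arrival, node)
        self._arrival = itertools.count()
        self._watchers: list[Watcher] = []

    def watch(self, callback: Watcher) -> None:
        """Register a dependent service to be told whenever the holder changes."""
        self._watchers.append(callback)

    def request(self, node: str, preference: int = 0) -> None:
        """Queue an exclusive request; it is granted at once if the name is free."""
        heapq.heappush(self._queue, (-preference, next(self._arrival), node))
        if self.holder is None:
            self._change_holder(self._pop_next())

    def node_died(self, node: str) -> None:
        """Forget the dead node's queued requests; fail over if it held the name."""
        self._queue = [entry for entry in self._queue if entry[2] != node]
        heapq.heapify(self._queue)
        if self.holder == node:
            self._change_holder(self._pop_next())

    def _pop_next(self) -> Optional[str]:
        # Highest preference (smallest -preference), then earliest arrival.
        return heapq.heappop(self._queue)[2] if self._queue else None

    def _change_holder(self, new: Optional[str]) -> None:
        old, self.holder = self.holder, new
        for callback in self._watchers:
            callback(self.name, old, new)


# Usage: one failover group requested on three nodes.
fg = ExclusiveName("FAILGROUP/web")
log = []
fg.watch(lambda name, old, new: log.append((old, new)))
fg.request("node1", preference=5)   # name is free: granted to node1
fg.request("node2", preference=10)  # queued: node1 is not pre-empted
fg.request("node3", preference=1)   # queued
fg.node_died("node1")               # fails over to node2, the highest queued preference
print(fg.holder)  # node2
print(log)  # [(None, 'node1'), ('node1', 'node2')]
```

A heap keyed on `(-preference, arrival)` makes the next candidate cheap to find; a real cluster service would additionally need every node to agree on that ordering, which this single-process sketch does not attempt.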