Presenter/Topics

9:15-9:30 Welcome 

Mike Wangsmo, symposium organizer, stated his motive: to get 
persons who have demonstrated interest and expertise in Linux High 
Availability/Clustering together in one room, so that they may begin to
design and develop a GPL infrastructure for a comprehensive Linux solution.

9:30-noon Core Services 

Stephen Tweedie presented his understanding of where we are now, where we need
to go, and how to get there.

Currently there is a great deal of HA/Cluster development activity, but it is
fragmented and focused on short-term goals -- on rapidly developing products or
product components that solve the easy problems. If we continue in this way,
we will end up with a disparate collection of incompatible, partial solutions.

We need to start now designing and building a framework capable of solving the
hard as well as the easy problems. This is doable, but it requires a long-term
view.   

The hardest problem is cluster partitioning, the situation in which a node
becomes isolated and independently updates a database that another node is also
independently updating. This problem affects shared-everything and
shared-nothing approaches equally. To prevent partitioning, we need to be able to
guarantee that every cluster node is informed when a node fails, and we need
a technique to prevent surviving cluster nodes from acting independently.
Quorum is such a technique -- cluster nodes vote, and none may act unless it
has a majority. To break ties, the disk (or another device) may be given an
extra vote. We need an API that provides Quorum as an independent cluster
service.
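
The voting rule described above can be sketched as follows. This is only an
illustration, not an API proposed at the symposium: the names Voter and
has_quorum, the node names, and the choice of one vote per node are all
assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    """A cluster member that holds votes (hypothetical type for this sketch)."""
    name: str
    votes: int = 1


def has_quorum(reachable, membership):
    """Return True if the reachable voters hold a strict majority of all votes.

    membership is the full configured set of voters; reachable is the subset
    that one side of a partition can currently see (including itself).
    """
    total = sum(v.votes for v in membership)
    held = sum(v.votes for v in reachable if v in membership)
    return 2 * held > total


# Four nodes plus a quorum disk that carries one extra tie-breaking vote.
n1, n2, n3, n4 = (Voter(f"node{i}") for i in range(1, 5))
disk = Voter("quorum-disk")
membership = [n1, n2, n3, n4, disk]

# A 2/2 split: only the side that still holds the disk may act.
side_a = [n1, n2, disk]
side_b = [n3, n4]
print(has_quorum(side_a, membership), has_quorum(side_b, membership))
```

Because a strict majority is required, at most one side of any partition can
hold quorum, so two isolated groups can never both proceed with updates.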

Another hard problem, scalability, can be solved by a hierarchical design,
which will work for Massively Parallel Processing (MPP) as well as for
heterogeneous/HA clusters. In this design, clusters of nodes are arranged in
a hierarchy, with each cluster having a leader. In this arrangement, it is
possible for any node among thousands of nodes to be informed quickly of the
failure of any other node. Also, the arrangement allows for a cluster transition
to be isolated to cluster peers at a level (usually a localized geographical
area) -- no need to involve other levels in a recovery, or in the re-election
of a new peer leader.    
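
A minimal sketch of how a transition might stay local to one level of the
hierarchy is given below. It is an assumption-laden illustration, not the
design presented: the Cluster class, the node names, and the election rule
(the lowest surviving name becomes leader) are all hypothetical.

```python
class Cluster:
    """One level of the hierarchy: a group of peer nodes plus sub-clusters."""

    def __init__(self, name, nodes, children=()):
        self.name = name
        self.nodes = set(nodes)
        self.children = list(children)
        self.leader = self._elect()

    def _elect(self):
        # Assumed election rule for this sketch: lowest node name wins.
        return min(self.nodes) if self.nodes else None

    def owner_of(self, node):
        """Return the cluster whose peer group contains node, or None."""
        if node in self.nodes:
            return self
        for child in self.children:
            found = child.owner_of(node)
            if found is not None:
                return found
        return None

    def fail(self, node):
        """Handle a node failure; return names of clusters that transitioned."""
        owner = self.owner_of(node)
        if owner is None:
            return []
        owner.nodes.discard(node)
        if owner.leader == node:
            owner.leader = owner._elect()  # re-election among peers only
        return [owner.name]


scotland = Cluster("scotland", ["edi1", "edi2", "edi3"])
uk = Cluster("uk", ["lon1", "lon2"], [scotland])
top = Cluster("redhat", ["rdu1"], [uk])

# The Scotland leader fails: only the Scotland peers re-elect a leader.
print(top.fail("edi1"), scotland.leader, uk.leader, top.leader)
```

In this sketch the failure of a node, even a leader, changes state only in
the peer cluster that contained it; the other levels keep their membership
and leaders untouched.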

A number of objections were raised, the main one concerning the motive for
such an arrangement: what is the advantage of, say, binding a Scotland
cluster into a UK cluster and the UK cluster into a Red Hat cluster, over
just leaving them as independent clusters?  Answer: for distributed
heterogeneous clusters, ease of administration and administrative
flexibility; for MPP clusters, a hierarchical arrangement makes node
recovery possible. (It was not clear whether these answers were acceptable to
all present.)


1:00-4:00 Storage and DLM

Peter Braam


4:00-5:00 Open discussion