To put these tasks into some kind of priority order,
I've divided them into three not-altogether-arbitrary categories.
|Backup and Recovery||FAKE, heartbeat||Not integrated, and not enough. The core cluster management subsystem is missing. Without a journalling filesystem, we can't have shared disks in a practical sense. CODA might be enough for the short term.|
|Configurability||bazaar :-)||I expect that Linux could eventually lead the pack here, because of the nature of the development model, and the fact that Linux runs on many hardware platforms. I expect people to implement HA systems consisting of two PCs with a couple of serial cables. Really need a good resource model.|
|HA Administration||Customized scripts (bash, Perl, etc.)||This is actually pretty much the same as saying that we don't have any, but our source is open. I think this is pretty important.|
|Hardware and Software RAID||SCSI-based RAID, and the md driver.||Large-scale hardware RAID solutions tend to be vendor-specific. Also, Linux will have CODA, which is analogous to a RAID facility. We're actually in reasonable shape here, but I think we lack integration with large-scale RAID devices.|
|In-system Failure Recovery||ifconfig, FAKE||We have some basic tools to allow this to take place, but no management infrastructure to decide what to do and when to do it. We have little or nothing to allow us to fail over disks to other controllers. Device drivers could be a help with this item.|
|In-System Failure Avoidance||lm78 voltage and temperature monitors||We're missing the basic infrastructure to hook it into.|
|In-System Service Processor Features||watchdog driver and hardware devices||Not enough for some cases. Should hook into the heartbeat driver. (oooh!)|
|Single-System Image||CODA and NIS automount maps do this to some extent||The usual meaning of this not-very-well-defined phrase includes more than having the filesystem maps look nearly the same. I'm not sure what all it is meant to cover.|
|Disaster recovery||CODA could be a help here. Multicasting. Routing code (?)||Need multicast or other specialized heartbeat systems, along with fancy routing technologies. This sounds like a big deal.|
As I read this, the thing that is critically missing is the centerpiece
of the HA system -- the core HA system manager. It is the item to
which all the other pieces connect.
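
To make the idea of a central manager a bit more concrete, here is a minimal sketch in Python. It is not a design for the real thing: the `ClusterManager` class, its method names, the timeout policy, and the node names are all hypothetical, made up for illustration. It only shows the shape of the problem: heartbeats come in, the manager decides when a node has gone silent for too long, and every registered subsystem (IP takeover, disk failover, and so on) is told about it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical callback type: called with the name of the node judged dead.
FailureHandler = Callable[[str], None]


@dataclass
class ClusterManager:
    """Toy core HA manager: tracks heartbeats and notifies registered subsystems."""

    timeout: float  # seconds of silence before a node is declared dead (assumed policy)
    last_seen: Dict[str, float] = field(default_factory=dict)
    dead: Set[str] = field(default_factory=set)
    handlers: List[FailureHandler] = field(default_factory=list)

    def register(self, handler: FailureHandler) -> None:
        """Connect another piece (IP takeover, disk failover, ...) to the manager."""
        self.handlers.append(handler)

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` was heard from at time `now`."""
        self.last_seen[node] = now
        self.dead.discard(node)

    def check(self, now: float) -> List[str]:
        """Declare silent nodes dead, notify every handler once, return the new failures."""
        newly_dead = []
        for node, seen in self.last_seen.items():
            if node not in self.dead and now - seen > self.timeout:
                self.dead.add(node)
                newly_dead.append(node)
                for handler in self.handlers:
                    handler(node)
        return newly_dead


# Usage: two hypothetical subsystems hook into the manager.
events = []
mgr = ClusterManager(timeout=3.0)
mgr.register(lambda node: events.append(("takeover-ip", node)))
mgr.register(lambda node: events.append(("remount-disks", node)))

mgr.heartbeat("node-a", now=0.0)
mgr.heartbeat("node-b", now=0.0)
mgr.heartbeat("node-a", now=2.0)  # node-b stays silent
failed = mgr.check(now=4.0)       # only node-b has exceeded the timeout
```

Time is passed in explicitly rather than read from the clock so the decision logic can be exercised without waiting. A real manager would also have to cope with split-brain, with resource ownership, and with its own failure, none of which this sketch attempts.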