Linux-HA Heartbeat management

This page documents the Linux-HA heartbeat management process as implemented by the "heartbeat" program.  In this design, the heartbeat communication protocol is used to verify that systems are up, and to coordinate system configuration changes across the cluster.  This document describes a very general method which will support integration of the heartbeat process together with recovery and reconfiguration actions like IP address takeover. 

Note:  This page is somewhat out of date, but the message formats are still useful.

Linux-HA Heartbeat message format

This note documents the format of Linux-HA heartbeat messages. Heartbeat messages begin with the string ">>>\n" and end with the string "<<<\n".  Between these delimiters are name/value pairs in this format: "name=value\n".  There are no conventions for encoding newlines or NULs into field values, so field values cannot contain them.  Certain fields are present in every message.  Some fields have very short names for sending on the wire; these are also given longer, more readable names for use in this document.
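As a rough illustration of the framing just described, the following Python sketch encodes and decodes one message.  The function names encode_message and decode_message are hypothetical and are not taken from the heartbeat source.  The sketch assumes that a value may itself contain "=", so each line is split on the first "=" only.

```python
START = ">>>\n"
END = "<<<\n"

def encode_message(fields):
    """Serialize a dict of string fields into the delimited name=value form."""
    lines = []
    for name, value in fields.items():
        # The format has no escaping, so these characters cannot be carried.
        if "=" in name or any(c in name + value for c in "\n\0"):
            raise ValueError(f"cannot encode field {name!r}")
        lines.append(f"{name}={value}\n")
    return START + "".join(lines) + END

def decode_message(text):
    """Parse one complete message back into a dict of string fields."""
    if len(text) < len(START) + len(END):
        raise ValueError("message too short")
    if not (text.startswith(START) and text.endswith(END)):
        raise ValueError("missing >>> / <<< delimiters")
    body = text[len(START):-len(END)]
    if body and not body.endswith("\n"):
        raise ValueError("last field is not newline-terminated")
    fields = {}
    for line in body.split("\n")[:-1]:
        # Assumption: split on the first "=" so values may contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value pair: {line!r}")
        fields[name] = value
    return fields
```

For example, encode_message({"t": "status", "src": "node1"}) yields ">>>\nt=status\nsrc=node1\n<<<\n", and decode_message reverses it.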

Common Heartbeat Message fields

The first several fields of a heartbeat message are as described below.
 
Short Name  Long Name  Description
t           type       The type of the message.
src         node       The node originating this message.
ts          nodetime   The time on the originating node when the message was sent.  It is a UNIX epoch timestamp, in hexadecimal.
seq         -          The sequence number of the message, in hexadecimal.
ld          loadavg    The load average as taken from /proc/loadavg.  We currently send the entire line.
ttl         -          The time to live for this packet.
dest        dest       If this packet is intended for only one node in the cluster, this field will be present and will indicate which node the message is intended for.  This is an optional field.
-           hbtime     The time at which the current message was received.  Note that, by definition, this field never appears in a message sent over the wire.  It is a pseudo-field added by the receiving system.
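To make the common fields concrete, here is a minimal Python sketch of how a receiver might interpret them: it maps short wire names to long names, decodes the hexadecimal ts and seq values, and adds the hbtime pseudo-field.  The function name interpret_common_fields is hypothetical.  The sketch assumes the hexadecimal values carry no "0x" prefix and that hbtime is kept as whole seconds; neither detail is stated above.

```python
import time

# Short wire name -> long name, per the table of common fields.
# seq and ttl have no long name, so they keep their wire name.
LONG_NAMES = {
    "t": "type",
    "src": "node",
    "ts": "nodetime",
    "ld": "loadavg",
    "dest": "dest",
}

def interpret_common_fields(wire_fields, now=None):
    """Return a dict keyed by long names, with ts/seq decoded from hex."""
    out = {}
    for short, value in wire_fields.items():
        out[LONG_NAMES.get(short, short)] = value
    for name in ("nodetime", "seq"):
        if name in out:
            # Assumption: plain hex digits without a "0x" prefix.
            out[name] = int(out[name], 16)
    # hbtime never travels on the wire; the receiver stamps it locally.
    out["hbtime"] = int(time.time()) if now is None else now
    return out
```

The ttl value is left as a string because its encoding is not specified here.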

"status" (T_STATUS) message fields

The status message is the central message of the HA heartbeat architecture.  Each node in the cluster has only one status at any given time, and the heartbeat subsystem tracks the node status specially.  It will set the status of a node to dead if it doesn't receive any status message from a given node within the allowed interval.  Status messages have the following additional fields:
 
Short Name  Long Name  Description
st          status     The status of the reporting node.  This is currently either "dead", "up", or "active".  When a system first comes up, it sets its status to "up".  Once it knows its communication services are fully working, the status is set to "active".  Status messages for "dead" nodes are artificially created by the system which detects the node as being dead.  The message is made to appear as though it were coming from the dead machine ;-).
info        reason     The reason why a node is marked dead.  This is an optional field.  It currently appears only when the status is set to dead.

API clients normally only see T_STATUS messages for changes in status. The millions of boring "the status is the same" messages which heartbeat handles are filtered out and not presented to API clients.
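The bookkeeping described in this section can be sketched roughly as follows: remember when each node was last heard from, synthesize a "dead" status message on behalf of a silent node, and pass a status message on to clients only when the status actually changed.  The class StatusTracker, the deadtime parameter, and the "timeout" reason string are hypothetical names chosen for this sketch, not heartbeat's own.

```python
class StatusTracker:
    def __init__(self, deadtime):
        self.deadtime = deadtime  # allowed silence, in seconds (assumed unit)
        self.status = {}          # node -> last known status
        self.last_heard = {}      # node -> time of last real status message

    def on_status(self, msg, now):
        """Record a status message; return it only if the status changed."""
        node = msg["src"]
        if msg["st"] != "dead":
            self.last_heard[node] = now
        changed = self.status.get(node) != msg["st"]
        self.status[node] = msg["st"]
        return msg if changed else None

    def check_dead(self, now):
        """Create 'dead' status messages for nodes that have gone silent."""
        events = []
        for node, heard in self.last_heard.items():
            if self.status.get(node) == "dead":
                continue
            if now - heard > self.deadtime:
                # Made to look as though it came from the dead node itself.
                fake = {"t": "status", "src": node, "st": "dead",
                        "info": "timeout"}  # reason text is an assumption
                event = self.on_status(fake, now)
                if event is not None:
                    events.append(event)
        return events
```

A caller would feed every received status message to on_status and call check_dead periodically; only the non-None results would be forwarded to API clients.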

"ip-request" message fields

The "ip-request" message is issued whenever a node wants to take over resources owned by another node.  This occurs whenever a node comes back online and wants to get its resources back from the backup resource owner.  The receiving node (the one that has the resource) sends an "ip-request-resp" message in return.
 
Short Name  Long Name  Description
ipaddr      ipaddr     The name of the resource group being requested.  Note that this is not restricted to being an IP address, unlike what you might think from reading the name.

"ip-request-resp" message fields

The "ip-request-resp" message is issued in response to an "ip-request" message by the node which owned the resources being requested in the "ip-request" message.
 
Short Name  Long Name  Description
ipaddr      ipaddr     The name of the resource group being requested.  Note that this is not restricted to being an IP address, unlike what you might think from reading the name.
ok          ok         This field has the value OK if the requested resource has been given up.  The code currently assumes that the machine owning the resource will always give it up, hence this field is always set to OK.
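The request/response exchange can be sketched as two small functions.  This is only an illustration under stated assumptions: the function names and the release callback are hypothetical, and the "t" values are assumed to be the quoted message names used in the headings above.

```python
def make_ip_request(me, resource_group):
    """Build the request a returning node sends to reclaim a resource group."""
    return {"t": "ip-request", "src": me, "ipaddr": resource_group}

def handle_ip_request(me, request, release):
    """Give up the requested resource group and build the response.

    `release` is a hypothetical callback that actually relinquishes the
    resource group named in the request.
    """
    release(request["ipaddr"])
    # The current owner always gives the resource up, so "ok" is always OK.
    return {
        "t": "ip-request-resp",
        "src": me,
        "ipaddr": request["ipaddr"],
        "ok": "OK",
    }
```

Note that ipaddr is passed through unchanged, since it names a resource group rather than necessarily an IP address.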

 

"rexmit-request" message fields

The "rexmit-request" message is issued to request the retransmission of missing packets.  It always has a "dest" field, since the request only makes sense for one machine.
 
Short Name  Long Name  Description
firstseq    firstseq   The lowest missing sequence number that corresponds to a missing packet.
lastseq     lastseq    The highest missing sequence number.
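A receiver could derive firstseq and lastseq from a gap in the sequence numbers it has seen, roughly as in the sketch below.  The function name make_rexmit_request is hypothetical, and encoding firstseq and lastseq in hexadecimal (like the seq field) is an assumption; the encoding of these two fields is not stated here.

```python
def make_rexmit_request(me, sender, last_seen_seq, new_seq):
    """Return a rexmit-request covering the gap, or None if nothing is missing.

    last_seen_seq: highest sequence number received so far from `sender`
    new_seq:       sequence number of the packet that just arrived
    """
    if new_seq <= last_seen_seq + 1:
        return None  # in order, duplicate, or old packet: no gap to fill
    return {
        "t": "rexmit-request",
        "src": me,
        "dest": sender,  # always present: only one machine can answer
        # Assumption: hex encoding, matching the seq field.
        "firstseq": format(last_seen_seq + 1, "x"),
        "lastseq": format(new_seq - 1, "x"),
    }
```

For instance, having seen sequence number 9 and then receiving 17, the sketch asks for packets 10 through 16.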

T_SHUTDONE message fields

T_SHUTDONE messages are sent when heartbeat has gracefully shut down and relinquished all its resources. T_SHUTDONE has no unique or modifier fields.

T_IFSTATUS message fields

Note: I don't think this message ever hits the wire(?). An IFSTATUS message is sent when the status of an interface changes, that is, when an interface that was up goes down or one that was down comes up.
 
Short Name  Long Name  Description
st          status     The status of the link as far as we can tell: "dead" (meaning no packets received on this link recently), or "up" (meaning we can hear packets on this link).  This field only relates to the ability to receive packets, not to send them.

T_APIREQ message fields

T_APIREQ messages are messages from heartbeat clients to heartbeat. Each T_APIREQ message represents an API request call. Note: this message never hits the wire. It only comes from local clients to heartbeat. There are many different API requests implemented (see hb_api_core.h).
 
Short Name  Long Name  Description
reqtype     reqtype    The type of the API request (a message subtype).  Possible values include: signon, signoff, setfilter, setsignal, nodelist, nodestatus, iflist, ifstatus.
from_id     from_id    Originating client identification.
to_id       to_id      Destination client identification.
pid         pid        Client process ID.
fmask       fmask      Filter mask for the setfilter API request.
signal      signal     Signal for the setsignal API request.
T_APIRESP message fields

T_APIRESP messages are messages from heartbeat to its clients in response to T_APIREQ messages which they sent. Each T_APIRESP message represents the response to an API request call. Note: this message never hits the wire. It only goes from heartbeat to local clients.
 
Short Name    Long Name     Description
result        result        (F_APIRESULT) API result code.  Possible values are "OK", "fail", "badreq" and "ok/more".
node          node          Node for the nodelist API response.
nodelist-end  nodelist-end  End of nodelist API response field.
ifname        ifname        Interface for the iflist API response.
iflist-end    iflist-end    End of interface list API response field.

T_STONITH message fields

This message is sent to indicate that a STONITH operation completed. It goes out on the wire, even though there's no one except us to hear it ;-).
 
Short Name  Long Name  Description
node        node       The name of the node which was reset.
result      result     "OK" if the STONITH operation succeeded, or "bad" if it didn't.

T_STARTING message fields

This message is used by nice_failback to advise the cluster that this node is starting. Thus we can recognize a rejoin/join and do the right thing (TM). On receiving such a message, the cluster triggers a handshaking protocol which ensures that someone will hold all resources. If nice_failback is not enabled, this message will never be issued. T_STARTING messages have no unique (or modifier) fields.

T_RESOURCES message fields

This message is used by nice_failback to declare what resources are currently being supported by the node. If nice_failback is not enabled, this message will never be issued.
This model assumes that every resource group on the machine is either on the node it "belongs" on, or on the other node. This model doesn't allow for resource groups which are declared to be on a node "n" to be some on node "n" and some on the other node.
 
Short Name  Long Name  Description
rsc_hold    rsc_hold   The rsc_hold field declares what resources the current node holds.  The possible values of the rsc_hold field in the message are:
                         NO_RESOURCES       NO_RSC       0
                         LOCAL_RESOURCES    LOCAL_RSC    1
                         FOREIGN_RESOURCES  FOREIGN_RSC  2
                         ALL_RESOURCES      ALL_RSC      3
isstable    isstable   TRUE/FALSE value for whether we consider ourselves stable or are in the middle of negotiating resources, etc.
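The numeric rsc_hold values happen to combine like bit flags (1 | 2 == 3).  Treating them that way is an assumption of the following sketch, not something this page states.  Under that assumption, and for the two-node model described above, a check that every resource group is held by exactly one node might look like this; the function name all_resources_held_once is hypothetical.

```python
from enum import IntFlag

class RscHold(IntFlag):
    NO_RSC = 0       # holds nothing
    LOCAL_RSC = 1    # holds the resource groups that belong on this node
    FOREIGN_RSC = 2  # holds the resource groups that belong on the other node
    ALL_RSC = 3      # holds both sets

def all_resources_held_once(a, b):
    """True if each node's resource groups are held by exactly one node.

    a, b: the rsc_hold values reported by the two nodes of the cluster.
    """
    # Node A's groups are "local" to A and "foreign" to B, and vice versa.
    holders_of_a = bool(a & RscHold.LOCAL_RSC) + bool(b & RscHold.FOREIGN_RSC)
    holders_of_b = bool(b & RscHold.LOCAL_RSC) + bool(a & RscHold.FOREIGN_RSC)
    return holders_of_a == 1 and holders_of_b == 1
```

For example, one node reporting ALL_RSC and the other NO_RSC passes the check, while both nodes reporting NO_RSC (nobody holds anything) does not.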

Please send comments on this document to me <alanr@unix.sh> or to the linux-ha mailing list.