Faq'n Tips

Hey! This doesn't look like a FAQ! What gives?
What is a cluster?
What is a resource script?
How do I monitor various resources?
Every time my machine releases an IP alias, it loses the whole interface (i.e. eth0)! How do I fix this?
I want a lot of IP addresses as resources (more than 8). What's the best way?
The documentation indicates that a serial line is mandatory, but when I comment it out of the config file and use only two ethernet connections it seems to work fine.
What is the difference between normal and nice failback?
How do I use heartbeat with an ipchains firewall?
How do I tune heartbeat on a heavily loaded system?
When I try to start heartbeat I receive the message "Starting High-Availability services: Heartbeat failure [rc=1]. Failed." and there is nothing in any of the log files and no messages. What is wrong?

How do I run multiple clusters on the same network segment?
How do I get the latest CVS version of heartbeat?
Heartbeat on other OSs.
If nothing helps, what should I do?

Quit your bellyachin'! We needed a "catch-all" document to supply useful information in a way that was easily referenced and would grow without a lot of work. It's closer to a FAQ than anything else.
HA (High Availability) cluster: a cluster that allows a host (or hosts) to become highly available. This means that if one node goes down (or a service on that node goes down), another node can take over the service from the failed machine. http://linux-ha.org

http://www.beowulf.org/

www.linuxvirtualserver.org

Resource scripts are basically (extended) System V init scripts. They have to support stop, start, and status operations. In the future we will also add support for a "monitor" operation for monitoring services as you requested. The IPaddr script implements this new "monitor" operation now (but heartbeat doesn't use that function of it). For more info see Resource HOWTO.
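As an illustration of the start, stop, and status operations, here is a minimal sketch of a resource script body. The service name myservice and the state file are hypothetical stand-ins for a real daemon, and the output strings are an assumption of this sketch rather than something this FAQ prescribes.

```shell
#!/bin/sh
# Skeleton resource script for a hypothetical service "myservice".
# A state file stands in for a real daemon so the sketch runs anywhere.
STATE_FILE="${STATE_FILE:-/tmp/myservice.running}"

myservice() {
    case "$1" in
        start)
            touch "$STATE_FILE" && echo "myservice started"
            ;;
        stop)
            rm -f "$STATE_FILE" && echo "myservice stopped"
            ;;
        status)
            if [ -e "$STATE_FILE" ]; then
                echo "myservice is running"
            else
                echo "myservice is stopped"
                return 3    # LSB init-script code for "not running"
            fi
            ;;
        *)
            echo "usage: myservice {start|stop|status}" >&2
            return 1
            ;;
    esac
}
```

A real script would end by calling myservice "$1", so that it can be invoked with start, stop, or status as its only argument.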
Heartbeat itself was not designed for monitoring various resources. If you need to monitor some resources (for example, the availability of a WWW server) you need some third-party software. A good solution is mon.

2. Get all the required modules listed. You can find them at the nearest mirror or at the CPAN archive (www.cpan.org). I am not very familiar with Perl, so I downloaded them from the CPAN archive as .tar.gz packages and installed them the usual way (perl Makefile.PL && make && make test && make install).

3. Mon is software for monitoring different network resources. It can ping computers, connect to various ports, and monitor WWW, MySQL, etc. When a resource fails, it triggers scripts.

4. Unpack mon in some directory. The best starting point is the README file. Complete documentation is in <dir>/doc, where <dir> is the place where you unpacked the mon package.

5. For a fast start, do the following steps:
copy all subdirs found in <dir> to /usr/lib/mon
create the dir /etc/mon
copy auth.cf from <dir>/etc to /etc/mon

Now mon is ready to work. You need to create your own mon.cf file, which names the resources mon should watch and the actions mon will start when a resource fails and when it becomes available again. All monitoring scripts are in /usr/lib/mon/mon.d/. At the beginning of every script you can find an explanation of how to use it.
All alert scripts are in /usr/lib/mon/alert.d/. Those are the scripts triggered when something goes wrong. If you are using ipvs, you can find scripts for adding and removing servers from the ipvs list on its homepage (www.linuxvirtualserver.org).
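To give an idea of what a mon.cf can look like, here is a small sketch that watches one web server and sends mail when it fails and when it comes back. The host name, the group name wwwservers, and the mail address are placeholders, and the intervals are arbitrary choices; check the mon documentation for the directives your version supports.

```text
# /etc/mon/mon.cf (sketch; host name and mail address are placeholders)
alertdir = /usr/lib/mon/alert.d
mondir   = /usr/lib/mon/mon.d

hostgroup wwwservers www1.example.com

watch wwwservers
    service http
        interval 1m
        monitor http.monitor
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
            upalert mail.alert admin@example.com
            alertevery 1h
```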

This isn't a problem with heartbeat, but rather is caused by various versions of net-tools. Upgrade to the most recent version of net-tools and it will go away. You can test it with ifconfig manually.
Instead of failing over many IP addresses, just fail over one router address. On your router, do the equivalent of "route add -net x.x.x.0/24 gw x.x.x.2", where x.x.x.2 is the cluster IP address controlled by heartbeat. Then make every address within x.x.x.0/24 that you wish to fail over a permanent alias of lo on BOTH cluster nodes. This is done via "ifconfig lo:2 x.x.x.3 netmask 255.255.255.255 -arp", and so on for each further address.
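With many addresses, typing the ifconfig lines by hand gets tedious. The following sketch generates them; it only prints the commands so they can be reviewed first. The function name print_lo_aliases is made up for this example, and the 192.0.2.x addresses are documentation placeholders.

```shell
# Print (do not run) one ifconfig command per address, aliasing each on lo.
# Usage: print_lo_aliases FIRST_ALIAS_NUMBER ADDRESS...
print_lo_aliases() {
    n=$1
    shift
    for addr in "$@"; do
        echo "ifconfig lo:$n $addr netmask 255.255.255.255 -arp"
        n=$((n + 1))
    done
}
```

For example, print_lo_aliases 2 192.0.2.3 192.0.2.4 prints the commands for lo:2 and lo:3. Once the output looks right, it can be piped to sh as root on both cluster nodes.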
It will work; however, if anything makes your ethernet / IP stack fail, you will lose both connections. You should definitely run the cables along different routes, depending on how important your data is.
Normal failback mode:

To make heartbeat work with ipchains, you must accept incoming and outgoing traffic on UDP port 694. Add rules to that effect to your ipchains configuration.
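A minimal sketch of such rules, assuming the heartbeat link is on eth0 and the standard ipchains input and output chains are in use. The function name print_heartbeat_rules is made up for this example, and the rules are deliberately broad (any source, any destination address); tighten them to your peer's address as needed. The function only prints the rules so they can be checked before being run as root.

```shell
# Print (do not run) ipchains rules that accept heartbeat's UDP traffic.
# Usage: print_heartbeat_rules INTERFACE [PORT]
print_heartbeat_rules() {
    iface=$1
    port=${2:-694}
    # Packets arriving for the local heartbeat port.
    echo "ipchains -A input -i $iface -p udp -d 0/0 $port -j ACCEPT"
    # Packets sent to the peer's heartbeat port.
    echo "ipchains -A output -i $iface -p udp -d 0/0 $port -j ACCEPT"
}
```

Running print_heartbeat_rules eth0 prints one input rule and one output rule for port 694.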

Since the default probably isn't reasonable for most Linux systems under heavy load (sorry!), here is a suggestion:

It's probably a permissions problem on authkeys. Heartbeat wants the file to be accessible only by its owner (mode 400, 600 or 700). Depending on where and when it discovers the problem, the message will wind up in different places.
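A quick way to check the file is sketched below. The function name check_authkeys_mode is made up for this example, and it relies on the GNU coreutils form of stat; /etc/ha.d/authkeys is the usual location of the file.

```shell
# Report whether a file has one of the modes named above (400, 600, 700).
# Usage: check_authkeys_mode /etc/ha.d/authkeys
check_authkeys_mode() {
    # stat -c is the GNU coreutils form; other systems spell this differently.
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        400|600|700)
            echo "ok: $1 has mode $mode"
            ;;
        *)
            echo "bad: $1 has mode $mode; try: chmod 600 $1"
            return 1
            ;;
    esac
}
```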

Use multicast and give each cluster its own multicast group. If you need or want to use broadcast, then run each cluster on a different port number.
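As a sketch, the ha.cf entries for two clusters sharing one segment might look like this. The interface name, the group addresses, and the second port number are example values, not recommendations; the mcast fields are device, group, port, ttl, and loop.

```text
# Cluster A, ha.cf on both of its nodes
mcast eth0 225.0.0.1 694 1 0

# Cluster B, ha.cf on both of its nodes
mcast eth0 225.0.0.2 694 1 0

# Broadcast alternative: same medium, different ports
# Cluster A:
#   udpport 694
#   bcast eth0
# Cluster B:
#   udpport 695
#   bcast eth0
```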
There is a CVS repository for Linux-HA. You can find it at cvs.linux-ha.org. Read-only access is via login guest, password guest, module name linux-ha. More details can be found in the announcement email. It is also available through the web using viewcvs at http://cvs.linux-ha.org/viewcvs/viewcvs.cgi/linux-ha/
Heartbeat is currently being ported to use automake.
Please be sure that you have read all the documentation and searched the mailing list archives. If you still can't find a solution, you can post questions to the mailing list. Please include the following:

What OS you are running.
What version (distro/kernel).
How you installed heartbeat (tar.gz, rpm, src.rpm or manual installation).
The parts of your logs that describe the errors. Send them as attachments.

Rev 0.0.5
(c) 2000 Rudy Pawul rpawul@iso-ne.com
(c) 2001 Dusan Djordjevic dj.dule@linux.org.yu