-
Quit your bellyachin'! We needed a "catch-all"
document to supply useful information in a way that was easily referenced
and would grow without a lot of work. It's closer to a FAQ than anything
else.
-
HA (High Availability) cluster - This is a cluster
that allows a host (or hosts) to become highly available. That means that if
one node goes down (or a service on that node goes down), another node can
pick up the service or node and take over from the failed machine. http://linux-ha.org
Computing cluster - This is what a Beowulf cluster is. It allows distributed
computing over off-the-shelf components. In this case it is usually cheap
IA32 machines. http://www.beowulf.org/
Load-balancing cluster - This is what the Linux Virtual Server project
does. In this scenario you have one machine which load-balances requests
to a certain service (Apache, for example) over a farm of servers. www.linuxvirtualserver.org
All of these sites have HOWTOs etc. on them. For a general overview
of clustering under Linux, look at the Clustering HOWTO.
-
Resource scripts are basically (extended) System
V init scripts. They have to support the start, stop, and status operations.
In the future we will also add support for a "monitor" operation for monitoring
services, as you requested. The IPaddr script already implements this new "monitor"
operation (but heartbeat doesn't use it yet). For more
info see the Resource HOWTO.
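As a rough illustration of the shape such a script takes, here is a minimal sketch for a made-up service. The name "myservice", the state file path, and the exact wording printed by status are assumptions for illustration only; check the Resource HOWTO for what heartbeat actually expects.

```shell
#!/bin/sh
# Sketch of a resource script for a hypothetical service "myservice".
# STATE_FILE stands in for whatever really tracks the service (a pid file, say).
STATE_FILE="${STATE_FILE:-/tmp/myservice.state}"

resource_op() {
    case "$1" in
        start)
            # A real script would launch the service here.
            touch "$STATE_FILE"
            ;;
        stop)
            # A real script would shut the service down here.
            rm -f "$STATE_FILE"
            ;;
        status)
            if [ -e "$STATE_FILE" ]; then
                echo "running"
            else
                echo "stopped"
            fi
            ;;
        *)
            echo "Usage: $0 {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# Dispatch on the first argument only when one is given.
if [ "$#" -gt 0 ]; then
    resource_op "$1"
fi
```

Like any init script, it would be invoked with one of the three operations as its single argument.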
-
Heartbeat itself was not designed for monitoring various
resources. If you need to monitor some resources (for example, the availability
of a WWW server), you need third-party software. A good solution is mon.
1. Get mon from
http://kernel.org/software/mon/
2. Get all the required Perl modules listed. You can find them at your nearest mirror
or at the CPAN archive (www.cpan.org). I am not very familiar with Perl,
so I downloaded them from the CPAN archive as .tar.gz packages and installed
them the usual way (perl Makefile.PL && make && make test &&
make install).
3. Mon is software for monitoring different network resources. It can
ping computers, connect to various ports, monitor WWW, MySQL etc. When
a resource fails, it triggers scripts.
4. Unpack mon in some directory. The best starting point is the README file.
Complete documentation is in <dir>/doc, where <dir> is the place where
you unpacked the mon package.
5. For a fast start, do the following steps:
copy all subdirs found in <dir> to /usr/lib/mon
create dir /etc/mon
copy auth.cf from <dir>/etc to /etc/mon
Now mon is prepared to work. You need to create your own mon.cf file,
in which you name the resources mon should watch and the actions mon should
start when a resource fails and when it becomes available again. All monitoring
scripts are in /usr/lib/mon/mon.d/. At the beginning of every script you
can find an explanation of how to use it.
All alert scripts are placed in /usr/lib/mon/alert.d/. Those are the scripts
triggered when something goes wrong. If you are using ipvs, you can find scripts
for adding and removing servers from the ipvs list on its
homepage (www.linuxvirtualserver.org).
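The fast-start steps in item 5 can be sketched as a small shell function. This is only a sketch: the function name and the optional second argument (a staging prefix, handy for trying the copy without root) are made up for illustration and are not part of mon.

```shell
#!/bin/sh
# Sketch of the fast-start steps. $1 is where mon was unpacked (<dir>).
# $2 is a hypothetical staging prefix: leave it empty to install into the
# real /usr/lib/mon and /etc/mon (needs root), or set it to a scratch tree.
install_mon() {
    mon_src="$1"
    prefix="${2:-}"

    mkdir -p "$prefix/usr/lib/mon" "$prefix/etc/mon" || return 1

    # Copy every subdirectory of <dir> (mon.d, alert.d, doc, ...) to /usr/lib/mon.
    for sub in "$mon_src"/*/; do
        [ -d "$sub" ] || continue
        # Strip the trailing slash so cp copies the directory itself.
        cp -R "${sub%/}" "$prefix/usr/lib/mon/" || return 1
    done

    # Copy the sample auth.cf into /etc/mon.
    cp "$mon_src/etc/auth.cf" "$prefix/etc/mon/" || return 1
}
```

For example, "install_mon <dir> /tmp/stage" stages everything under /tmp/stage so the result can be inspected first; writing a mon.cf is still up to you.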
-
This isn't a problem with heartbeat, but rather
is caused by certain versions of net-tools. Upgrade to the most recent
version of net-tools and it will go away. You can check this by running ifconfig
manually.
-
Instead of failing over many IP addresses, just fail
over one router address. On your router, do the equivalent of "route
add -net x.x.x.0/24 gw x.x.x.2", where x.x.x.2 is the cluster IP address
controlled by heartbeat. Then make every address within x.x.x.0/24
that you wish to fail over a permanent alias of the loopback interface (lo) on BOTH cluster nodes.
This is done via "ifconfig lo:2 x.x.x.3 netmask 255.255.255.255 -arp" and so on, one alias per address.
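With more than a handful of addresses, the alias commands are easier to generate than to type. The following sketch only prints the commands and runs nothing. The function name is made up, and starting the alias numbering at 2 simply follows the "lo:2" example above.

```shell
#!/bin/sh
# Sketch: print the ifconfig commands that alias each failover address onto
# the loopback interface. Nothing is executed here; review the output, then
# run it as root on BOTH nodes.
loopback_alias_cmds() {
    n=2
    for addr in "$@"; do
        echo "ifconfig lo:$n $addr netmask 255.255.255.255 -arp"
        n=$((n + 1))
    done
}
```

Calling it with placeholder addresses, such as "loopback_alias_cmds 192.0.2.3 192.0.2.4", prints one line per address using lo:2, lo:3 and so on.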
-
It will work; however, if anything makes your Ethernet
/ IP stack fail, you will lose both connections. You should definitely
run the cables differently, depending on how important your data is...
-
Normal failback mode:
In this mode, one of the two machines is designated as the preferred
provider of a given resource group. If that machine is up, then it will
always be the provider of every resource group for which it is the preferred
provider. Failovers occur when the preferred provider goes out of service,
and again when it comes back (failback). This mode is required if you wish to
run an active-active configuration.
Nice failback mode:
In this mode, there is no natural affinity between a resource group
and a particular node in the cluster (haresources file notwithstanding).
Instead, there is an affinity between a resource group and whatever machine
it is currently running on. Failovers occur *only* when a machine which
is providing a service goes out of service. There is no concept of failback
in this mode. This mode minimizes service interruptions, but cannot run
an active-active configuration.
-
To make heartbeat work with ipchains, you must accept
incoming and outgoing traffic on UDP port 694. Add something like the following
rules, which accept all UDP traffic between the two addresses:
/sbin/ipchains -A output -i ethN -p udp -s <source_IP> -d <dest_IP> -j ACCEPT
/sbin/ipchains -A input -i ethN -p udp -s <source_IP> -d <dest_IP> -j ACCEPT
-
Since the default probably isn't reasonable for
most Linux systems under heavy load (sorry!), here is a suggestion:
Set deadtime to 60 seconds or higher
Set warntime to whatever you *want* your deadtime to be.
Run your system under heavy load for a few weeks.
Look at your logs for the longest time either system went without hearing
a heartbeat.
Set your deadtime to 1.5-2 times that amount. Set warntime to
that amount.
Continue to monitor logs for warnings about long heartbeat times.
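The arithmetic in the last steps can be sketched as follows. The function name and the "tenths" encoding of the factor are made up for illustration (sh has only integer arithmetic). The gap is assumed to be in whole seconds, and the output uses the same deadtime and warntime keywords as heartbeat's ha.cf.

```shell
#!/bin/sh
# Sketch: turn the longest observed gap between heartbeats (whole seconds)
# into suggested deadtime and warntime settings.
suggest_timing() {
    longest_gap="$1"
    factor_x10="${2:-20}"   # 20 means 2.0x, 15 means 1.5x
    # Round up so that deadtime is never below the requested multiple.
    deadtime=$(( (longest_gap * factor_x10 + 9) / 10 ))
    echo "deadtime $deadtime"
    echo "warntime $longest_gap"
}
```

For a longest gap of 12 seconds, "suggest_timing 12" suggests a deadtime of 24 and a warntime of 12; passing 15 as the second argument uses the 1.5x end of the range instead.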
-
It's probably a permissions problem on authkeys.
Heartbeat wants the file to be accessible only by its owner (mode 400, 600 or 700). Depending on
where and when it discovers the problem, the message will wind up in different
places, but it tends to be in one of:
a) stdout/stderr
b) wherever you specified in your setup
c) /var/log/messages
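A quick way to check the mode before digging through logs is sketched below. The function name is made up, and it reads the permission string from ls rather than stat, since stat's options differ between systems.

```shell
#!/bin/sh
# Sketch: succeed only if the given file grants no access to group or others,
# which is what modes 400, 600 and 700 have in common.
authkeys_mode_ok() {
    # Characters 5-10 of "ls -ld" output are the group and other permission bits.
    perms="$(ls -ld "$1" | cut -c5-10)"
    [ "$perms" = "------" ]
}
```

Assuming the file lives at /etc/ha.d/authkeys (a common location, but check your own installation), "authkeys_mode_ok /etc/ha.d/authkeys || chmod 600 /etc/ha.d/authkeys" would tighten the mode only when needed.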
-
Use multicast and give each cluster its own multicast
group. If you need or want to use broadcast, then run each cluster on
a different port number.
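As a sketch of the multicast option, the function below prints one ha.cf "mcast" line per cluster, in the order device, group, port, ttl, loop. The function name, the eth0 default, the 225.0.0.x groups, port 694, ttl 1 and loop 0 are all illustrative assumptions; choose values that suit your network.

```shell
#!/bin/sh
# Sketch: print an ha.cf "mcast" line for cluster number $1, so that each
# cluster ends up in its own multicast group. All values are placeholders.
mcast_line() {
    cluster_number="$1"
    iface="${2:-eth0}"
    echo "mcast $iface 225.0.0.$cluster_number 694 1 0"
}
```

For the broadcast case, the corresponding knob in ha.cf is the udpport directive, set to a different value on each cluster.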
-
There is a CVS repository for Linux-HA.
You can find it at cvs.linux-ha.org. Read-only access
is via login guest, password guest, module name linux-ha. More details
can be found in the announcement email. The repository is also available
through the web using viewcvs at http://cvs.linux-ha.org/viewcvs/viewcvs.cgi/linux-ha/
-
Heartbeat is currently being ported to use automake.
-
Please be sure that you have read all the documentation
and searched the mailing list archives. If you still can't find a solution, you
can post questions to the mailing list. Please include the following:
- What OS you are running.
- What version (distro/kernel).
- How you installed heartbeat (tar.gz, rpm, src.rpm or manual installation).
- The parts of your logs which describe the errors. Send them as attachments.
Please don't send "cleaned up" logs. The real logs have more
information in them than cleaned-up versions. Always include at least
a little irrelevant data before and after the events in question so that
we know nothing was missed. Don't edit the logs unless you really
have some super-secret high-security reason for doing so.
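One way to grab an unedited excerpt with context on both sides is sketched below. The function name and its defaults (the pattern "heartbeat", 20 lines of context) are made up for illustration. The -B and -A options are widely available in grep but are not part of POSIX.

```shell
#!/bin/sh
# Sketch: print every log line matching a pattern, plus some lines of
# context before and after each match, without altering anything.
log_excerpt() {
    logfile="$1"
    pattern="${2:-heartbeat}"
    context="${3:-20}"
    grep -B "$context" -A "$context" -- "$pattern" "$logfile"
}
```

For example, "log_excerpt /var/log/messages > excerpt.txt" produces a file to attach; "uname -a" covers the OS and kernel version.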