Linux – Importance of ha.cf file in a heartbeat/pacemaker environment

best-practices, heartbeat, linux, pacemaker

I'm having a few issues trying to understand ha.cf and how the cluster picks up on updates.

For example, when creating a new cluster, I usually:

  1. Set some default options in ha.cf on node 1 – node x
  2. Start the cluster.
  3. Run crm on any node, configure resources.

Whilst I routinely bring nodes and resources up and down, I have never actually added a new node at a later date.

Just for "fun", I decided to run a new server that only specified one node in the cluster in its ha.cf, and then start heartbeat.

This machine successfully joined the cluster and added itself to every other node in the cluster. Where I get confused is that even if I shut down all nodes and reboot the original two nodes, they both still list the third server as in the cluster but offline, despite the third not being in the original two nodes' ha.cf files.

Even if I edit ha.cf and change some nonsense value, or touch the file, then reboot the server and cluster, it is still there. So my conclusion is that the CIB takes precedence over ha.cf, but what I don't get is why or how.

I'm really looking for best practices – should any machine just have enough in ha.cf to "get it up", then do everything in CRM? Is ha.cf a waste of time, or should I be using it a lot more?

Trying not to be so vague – I'm really just looking for what I should be doing in CRM, and what I should be doing in ha.cf?

Thanks,

Wil

Best Answer

I was really hoping to see a good answer myself.

All I can really do is endorse your experiences: the only real function of heartbeat in these circumstances is to start pacemakerd, the CRM subsystem. This (as you know) maintains its own database of nodes and state, which on my systems is /var/lib/heartbeat/crm/cib.xml. The files in /etc/ha.d inform heartbeat, but not crm.
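To see that split for yourself, something like the following rough sketch may help. It compares the node names declared in ha.cf with the node names recorded in the CIB and reports any that exist only in the CIB. The function names and the sample data are made up for illustration, and it assumes the usual layouts: `node` directives in ha.cf, and `<node uname="...">` entries under `<configuration>/<nodes>` in cib.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real files.
SAMPLE_HA_CF = """\
# minimal ha.cf on one of the original two nodes
node node1 node2
"""

SAMPLE_CIB = """\
<cib>
  <configuration>
    <nodes>
      <node id="1" uname="node1" type="normal"/>
      <node id="2" uname="node2" type="normal"/>
      <node id="3" uname="node3" type="normal"/>
    </nodes>
  </configuration>
</cib>
"""


def nodes_from_ha_cf(text):
    """Collect node names from 'node' directives in ha.cf-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split()
        if parts and parts[0] == "node":
            names.update(parts[1:])
    return names


def nodes_from_cib(xml_text):
    """Collect the uname of every <node> under <configuration>/<nodes>."""
    root = ET.fromstring(xml_text)
    found = root.findall("./configuration/nodes/node")
    return {n.get("uname") for n in found if n.get("uname")}


def cib_only_nodes(ha_cf_text, cib_xml_text):
    """Return nodes the CIB remembers that ha.cf never mentions, sorted."""
    return sorted(nodes_from_cib(cib_xml_text) - nodes_from_ha_cf(ha_cf_text))


if __name__ == "__main__":
    print(cib_only_nodes(SAMPLE_HA_CF, SAMPLE_CIB))
```

On a real system you would feed it the contents of the ha.cf in /etc/ha.d and of cib.xml instead of the sample strings. A node that shows up in the result is one that the CRM remembers regardless of what ha.cf says, which matches the behaviour described in the question.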

I am running a number of failover pairs doing various things, most of which have been up for over 500 days and some of which are close to 1000 days, and most of which have survived any number of failovers and failbacks; so I can only assume I'm doing something right. My practice is not to actually lie in ha.cf, but to put almost nothing in there other than what is required to get HA to start up CRM.

I'm sorry I don't have anything more concrete to point you at.
