Linux pacemaker – prevent split brain

Since moving to CentOS 7, we have switched from a regular heartbeat setup to Pacemaker.

Mainly we have IP resources that are active on one node and switch to a second node if a failover happens. Also we execute some scripts in case of a failover. Nothing special.

To make resources always start on the primary node, I use

pcs constraint location Cluster_IP prefers server1=master-server

I also use

pcs resource defaults resource-stickiness=INFINITY

to prevent resources moving back after failover.

This works fine for me if the master fails (a hardware failure, for example).

Since it's not a problem for me if the failover takes some time, I would like to implement some kind of delay in case of a short split brain.

Before doing anything, the slave should wait ~2 minutes before it takes over, in case the master becomes reachable again within those ~2 minutes.

What would be the best way to do this?

Best Answer

I've never set a token timeout in Corosync to anything above 10 seconds, but you could try setting the token value in your corosync.conf to 120000 (120 seconds, expressed in milliseconds). token should be defined in the totem{} section of your corosync.conf; see man corosync.conf for more detail.

That should prevent Corosync from declaring a node dead for 120s when the network flakes out.
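
As a rough sketch, the relevant part of corosync.conf might look like the fragment below. Only the token line comes from the suggestion above; version: 2 is the mandatory config version, and whatever else your totem{} section already contains (cluster_name, transport and so on) should stay as it is.

```
totem {
    # Config file version; required, and currently always 2.
    version: 2

    # Token timeout in milliseconds (120000 ms = 120 s).
    # Untested at this size; the suggestion above is the only source for it.
    token: 120000
}
```

If I read man corosync.conf correctly, consensus defaults to 1.2 * token when it is not set explicitly, so the surviving node would form a new membership somewhat later than the 120 s token timeout. The file should be identical on both nodes, and Corosync has to be restarted or reloaded before the new value takes effect.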
