Since moving to CentOS 7, we have switched from a regular Heartbeat setup to Pacemaker.
Mainly we have IP resources that are active on one node and switch to a second node if a failover happens. We also execute some scripts in case of a failover. Nothing special.
To make resources always start on the primary node, I use
pcs constraint location Cluster_IP prefers server1=master-server
I also use
pcs resource defaults resource-stickiness=INFINITY
to prevent resources moving back after failover.
This works fine for me if the master fails (a hardware failure, for example).
Since it's not a problem for me if the failover takes some time, I would like to implement some kind of delay in case of a short split brain.
Before doing anything, the slave should wait ~2 minutes before it takes over, in case the master becomes reachable again within those ~2 minutes.
What would be the best way to do this?
Best Answer
I've never set a token timeout in Corosync to anything above 10 seconds, but you could try increasing/setting the token value in your corosync.conf to 120000 (120 seconds in milliseconds). The token value should be defined in the totem{} section of your corosync.conf; see man corosync.conf for more detail.
That should prevent Corosync from declaring a node dead for 120s when the network flakes out.
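As a rough sketch of what that could look like, here is the relevant fragment of corosync.conf. Only the token line comes from the suggestion above; the placeholder comment stands in for whatever your totem section already contains, which I'm assuming you leave untouched. I haven't tested a value this high myself.

```
totem {
    # ... keep your existing totem settings here unchanged ...

    # Token timeout in milliseconds (120000 ms = 120 s)
    token: 120000
}
```

The file should be identical on both nodes, and Corosync most likely needs to be restarted or reloaded before the new value takes effect.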