Linux – Pacemaker not detecting node disconnected

Tags: centos7, cluster, linux, pacemaker

I have set up three CentOS 7 KVM VMs on a CentOS 7 host for the purpose of testing various aspects of the clustering configuration before we deploy a production system.

The nodes are called clua, club and cluc. There are a few resources configured:

  • A fence_virsh STONITH resource clone set
  • dlm, clvmd and GFS2 FileSystem resource clone sets

I have been testing various failure scenarios. The one that has been causing problems is the one where I make the nodes lose contact with each other by running ifdown on the interfaces of two of the three nodes.

In the test shown here, I brought down the interfaces on clua and cluc, leaving club alone. I have confirmed that I cannot ping between the nodes in this state.
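For reference, the failure was injected roughly as follows (a sketch: `eth0` is an assumed interface name, so check `ip addr` on your own VMs first):

```shell
# Run on clua and cluc only -- club is left untouched.
# eth0 is an assumed interface name for these test VMs.
ifdown eth0            # the network-scripts way on CentOS 7
# or, equivalently:
ip link set eth0 down

# Verify the partition from any node:
ping -c 3 club         # should now show 100% packet loss
```

Note that ifdown removes the interface cleanly, which corosync may observe differently from a pulled cable; `iptables -A INPUT -j DROP` style rules are sometimes a more realistic way to simulate a network split.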

On club, it does more or less what I expect:

root@itkclub ~ # pcs status
Cluster name: tclu
Stack: corosync
Current DC: club (version 1.1.15-11.el7_3.4-e174ec8) - partition WITHOUT quorum
Last updated: Thu Apr  6 16:23:28 2017          Last change: Thu Apr  6 16:18:33 2017 by root via cibadmin on clua

3 nodes and 12 resources configured

Node clua: UNCLEAN (offline)
Node cluc: UNCLEAN (offline)
Online: [ club ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     dlm        (ocf::pacemaker:controld):      Started clua (UNCLEAN)
     dlm        (ocf::pacemaker:controld):      Started cluc (UNCLEAN)
     Started: [ club ]
 Clone Set: clvmd-clone [clvmd]
     clvmd      (ocf::heartbeat:clvm):  Started clua (UNCLEAN)
     clvmd      (ocf::heartbeat:clvm):  Started cluc (UNCLEAN)
     Started: [ club ]
 Clone Set: varopt_fs-clone [varopt_fs]
     varopt_fs  (ocf::heartbeat:Filesystem):    Started clua (UNCLEAN)
     varopt_fs  (ocf::heartbeat:Filesystem):    Started cluc (UNCLEAN)
     Started: [ club ]
 Clone Set: virsh-fencing-clone [virsh-fencing]
     virsh-fencing      (stonith:fence_virsh):  Started clua (UNCLEAN)
     virsh-fencing      (stonith:fence_virsh):  Started cluc (UNCLEAN)
     Started: [ club ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

But on both of the other nodes (the ones whose interfaces I brought down), Pacemaker does not seem to detect that anything is wrong:

root@itkcluc ~ # pcs status
Cluster name: tclu
Stack: corosync
Current DC: club (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Thu Apr  6 16:26:01 2017          Last change: Thu Apr  6 16:18:33 2017 by root via cibadmin on clua

3 nodes and 12 resources configured

Online: [ clua club cluc ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ clua club cluc ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ clua club cluc ]
 Clone Set: varopt_fs-clone [varopt_fs]
     Started: [ clua club cluc ]
 Clone Set: virsh-fencing-clone [virsh-fencing]
     Started: [ clua club cluc ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

root@itkcluc ~ # ping club
PING club (192.168.1.12) 56(84) bytes of data.
^C
--- club ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
root@itkcluc ~ # ping clua
PING clua (192.168.1.2) 56(84) bytes of data.
^C
--- clua ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms

Why has pacemaker on clua and cluc not detected that it cannot talk to any of the other nodes?

Once it gets into a state like this, what is the correct recovery procedure?

Best Answer

None of the remaining nodes have quorum; therefore no STONITH actions can be taken, and therefore no cluster actions are considered "safe".

What did you set the cluster's no-quorum-policy property to? Is it freeze, perchance? You cannot use stop, which is the default, because a node without quorum will not be able to stop its resources: GFS2 requires quorum to unmount, or otherwise access, its data.
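Checking and setting this property is straightforward with pcs (a sketch; `freeze` is the value usually recommended for GFS2/DLM clusters):

```shell
# Show the current value; if unset, the default "stop" applies:
pcs property list --all | grep no-quorum-policy

# For clusters with GFS2/DLM resources, freeze the partition
# instead of trying to stop resources that need quorum:
pcs property set no-quorum-policy=freeze
```

With freeze, a partition that loses quorum keeps its resources running but frozen until quorum returns or the node is fenced.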

Also, club is the DC (Designated Controller) in your example; it's keeping track of the cluster's resources. The other nodes would have to achieve quorum before they could elect a new DC.

In a three-node cluster, the chance of the NICs on two nodes failing at exactly the same time is very low. However, if you are still concerned for whatever reason, you can add more nodes to the cluster to act solely as quorum nodes (use -inf: location constraints to keep resources off of them) until that risk becomes acceptably small.
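A sketch of adding such a quorum-only node, assuming a hypothetical new node named clud and the clone names from the question:

```shell
# Add the new node to the cluster (run from an existing member):
pcs cluster node add clud

# Pin every resource away from the quorum-only node with
# -INFINITY ("avoids") location constraints, one per clone:
pcs constraint location dlm-clone avoids clud
pcs constraint location clvmd-clone avoids clud
pcs constraint location varopt_fs-clone avoids clud
pcs constraint location virsh-fencing-clone avoids clud
```

The node then contributes a corosync vote but never runs resources, so losing any one "real" node still leaves the majority partition quorate.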

To get out of this I would simply reset all three boxes "manually": echo b > /proc/sysrq-trigger
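The sysrq trigger forces an immediate reboot without syncing or unmounting, which is what you want here since GFS2 cannot unmount cleanly without quorum anyway. A minimal sketch (on some systems the magic SysRq interface must be enabled first):

```shell
# Enable all magic SysRq functions if they are disabled:
echo 1 > /proc/sys/kernel/sysrq

# 'b' = reboot immediately, no sync, no unmount. Run on each node:
echo b > /proc/sysrq-trigger
```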
