DRBD corosync cluster: second node is trying to become primary all the time


We are facing a problem with a DRBD/corosync cluster.

On the node that is currently primary, all resources (MySQL services, DRBD) are working fine, but the second node keeps trying to become primary.


The error logs from the second node are shown below:

lrmd: [25272]: info: RA output: (mysql-drbd:0:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config
Oct  1 16:39:39 node2 lrmd: [25272]: info: RA output: (mysql-drbd:0:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config
Oct  1 16:39:39 node2 lrmd: [25272]: info: RA output: (mysql-drbd:0:promote:stderr) Command 'drbdsetup 0 primary' terminated with exit code 11
Oct  1 16:39:39 node2 drbd[25416]: ERROR: mysql-disk: Called drbdadm -c /etc/drbd.conf primary mysql-disk
Oct  1 16:39:39 node2 drbd[25416]: ERROR: mysql-disk: Called drbdadm -c /etc/drbd.conf primary mysql-disk
Oct  1 16:39:39 node2 drbd[25416]: ERROR: mysql-disk: Exit code 11
Oct  1 16:39:39 node2 drbd[25416]: ERROR: mysql-disk: Exit code 11
Oct  1 16:39:39 node2 drbd[25416]: ERROR: mysql-disk: Command output:
Oct  1 16:39:39 node2 drbd[25416]: ERROR: mysql-disk: Command output:

In corosync, the Master/Slave status is inconsistent between the two nodes. See the crm status output below.

Node1

[root@node1 ~]# crm status
============
Last updated: Thu Oct  2 09:01:30 2014
Stack: openais
Current DC: node1 - partition WITHOUT quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ node1 ]
OFFLINE: [ node2 ]

 mysql-vip      (ocf::heartbeat:IPaddr2):       Started node1
 Master/Slave Set: mysql-drbd-ms
     Masters: [ node1 ]
     Stopped: [ mysql-drbd:1 ]
 mysql-fs       (ocf::heartbeat:Filesystem):    Started node1
 mysql-server   (ocf::heartbeat:mysql): Started node1


Node2

[root@node2 ~]# crm status
============
Last updated: Thu Oct  2 09:03:04 2014
Stack: openais
Current DC: node2 - partition WITHOUT quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ node2 ]
OFFLINE: [ node1 ]

 Master/Slave Set: mysql-drbd-ms
     mysql-drbd:0       (ocf::linbit:drbd):     Slave node2 (unmanaged) FAILED
     Stopped: [ mysql-drbd:1 ]

Failed actions:
    mysql-drbd:0_promote_0 (node=node2, call=7, rc=-2, status=Timed Out): unknown exec error
    mysql-drbd:0_stop_0 (node=node2, call=13, rc=6, status=complete): not configured

The DRBD status itself looks fine on both nodes:

Node1 (primary):

[root@node1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
m:res         cs         ro                 ds                 p  mounted  fstype
0:mysql-disk  Connected  Primary/Secondary  UpToDate/UpToDate  C

Node2 (secondary):

[root@node2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
m:res         cs         ro                 ds                 p  mounted  fstype
0:mysql-disk  Connected  Secondary/Primary  UpToDate/UpToDate  C

Best Answer

This happens because you don't have cluster fencing (STONITH) configured, and your cluster is now in split-brain.

Each node sees only itself as online, so you have a cluster with two DCs, and each one is trying to promote DRBD and start the resources on its own.
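
Once communication between the nodes is restored, the usual next step is to configure a fencing device for each node and re-enable STONITH before letting the cluster manage DRBD again. Below is a minimal sketch using the crm shell with the external/ipmi STONITH plugin; the plugin choice, IP addresses and credentials are placeholder assumptions and need to be replaced with whatever out-of-band management your hardware actually provides.

    # Hypothetical fencing devices; IPs and credentials are examples only
    crm configure primitive st-node1 stonith:external/ipmi \
        params hostname=node1 ipaddr=192.0.2.11 userid=admin passwd=secret \
        op monitor interval=60s
    crm configure primitive st-node2 stonith:external/ipmi \
        params hostname=node2 ipaddr=192.0.2.12 userid=admin passwd=secret \
        op monitor interval=60s

    # Make sure a node never runs the device that fences itself
    crm configure location l-st-node1 st-node1 -inf: node1
    crm configure location l-st-node2 st-node2 -inf: node2

    # Re-enable fencing cluster-wide
    crm configure property stonith-enabled=true

With fencing in place, the next time the nodes lose sight of each other the surviving DC can power off or reset its peer instead of both partitions trying to promote DRBD, which is exactly what produces the "Multiple primaries not allowed by config" errors in the node2 logs.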