MySQL DRBD – Fixing Cluster Nodes Not Configured (StandAlone)

drbd, MySQL

I have an HA cluster with two nodes: node 1 is the primary and node 2 is its mirror. I have a problem with the mysql resource, since my nodes are not synchronized.

drbd-overview

Primary node:
0:home Connected Primary/Secondary UpToDate/UpToDate C r-----
1:storage Connected Secondary/Primary UpToDate/UpToDate C r-----
2:mysql StandAlone Secondary/Unknown UpToDate/Outdated r-----

Secondary node:
0:home Connected Secondary/Primary UpToDate/UpToDate C r-----
1:storage Connected Primary/Secondary UpToDate/UpToDate C r-----
2:mysql StandAlone Primary/Unknown UpToDate/Outdated r-----

Reviewing the messages file, I found the following:

Apr 19 18:20:36 clsstd2 kernel: block drbd2: self C1480E287A8CAFAB:C7B94724E2658B94:5CAE57DEB3EDC4EE:F5887A918B55FB1A bits:114390101 flags:0
Apr 19 18:20:36 clsstd2 kernel: block drbd2: peer 719D326BDE8272E2:0000000000000000:C7BA4724E2658B94:C7B94724E2658B95 bits:0 flags:1
Apr 19 18:20:36 clsstd2 kernel: block drbd2: uuid_compare()=-1000 by rule 100
Apr 19 18:20:37 clsstd2 kernel: block drbd2: Unrelated data, aborting!
Apr 19 18:20:37 clsstd2 kernel: block drbd2: conn (WFReportParams -> Disconnecting)
Apr 19 18:20:37 clsstd2 kernel: block drbd2: error receiving ReportState, l: 4!
Apr 19 18:20:38 clsstd2 kernel: block drbd2: asender terminated
Apr 19 18:20:38 clsstd2 kernel: block drbd2: Terminating asender thread
Apr 19 18:20:38 clsstd2 kernel: block drbd2: Connection closed
Apr 19 18:20:38 clsstd2 kernel: block drbd2: conn (Disconnecting -> StandAlone)
Apr 19 18:20:39 clsstd2 kernel: block drbd2: receiver terminated
Apr 19 18:20:39 clsstd2 kernel: block drbd2: Terminating receiver thread
Apr-19 18:20:39 clsstd2 auditd[3960]: Audit daemon rotating log files

I don't understand what the problem is or how to solve it. Checking both nodes, I noticed that the /var/lib/mysql directory on node 2 is missing the ibdata1 file, while it does exist on node 1.

Best Answer

The problem is that you hit the "famous" DRBD split-brain condition, and both DRBD nodes went to the StandAlone state. It's difficult to say whether the DB on your primary node is valid or corrupted.
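
Before choosing a route, it's worth confirming on both nodes that the mysql resource really is stuck in StandAlone and what the kernel logged when the connection dropped. A quick check, assuming the resource is named mysql as your drbd-overview output shows:

# drbdadm cstate mysql                                     # should report StandAlone on both nodes
# cat /proc/drbd                                           # full state: role, disk state, connection
# grep -iE "split-brain|unrelated data" /var/log/messages  # what DRBD said when it dropped the link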

You have two routes to choose from:

  1. Try to re-sync the DRBD nodes, designating one of them as the holder of the more recent version of the data; note that the node you designate doesn't necessarily have it. The full sequence is consolidated in the sketch after this list.

This is what you run on the second node (the split-brain victim, whose local changes will be discarded):

# drbdadm secondary <resource>                       # demote (stop services using the device first)
# drbdadm disconnect <resource>
# drbdadm -- --discard-my-data connect <resource>    # reconnect, discarding this node's changes

And this is what you run on your surviving node, the one you believe has the most recent version of the data:

# drbdadm connect <resource>

If that doesn't help, you can trash the second node's data and imitate an automatic rebuild by executing this command on it:

# drbdadm invalidate <resource>

  2. Purge the data on both nodes with the last command from route (1) and recover your DB from backups.
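
For reference, here is route (1) end to end as one sequence. This is only a sketch: it assumes the resource is named mysql (as in your drbd-overview output) and that you have already decided which node keeps its data; picking the wrong victim discards its changes permanently.

On the victim node (stop mysqld and unmount the device first if this node is Primary for the resource):

# drbdadm secondary mysql
# drbdadm disconnect mysql
# drbdadm -- --discard-my-data connect mysql

On the survivor node, the one whose data you keep:

# drbdadm connect mysql

Then watch the resync until both sides report Connected and UpToDate/UpToDate again; the survivor should show SyncSource and the victim SyncTarget:

# watch cat /proc/drbd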

Hope this helped!
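
Whichever route you take: since ibdata1 (InnoDB's system tablespace) is missing on node 2, it's worth checking the database itself once DRBD is back to Connected/UpToDate. A minimal check, assuming MySQL is running on the primary and you can authenticate as root:

# ls -l /var/lib/mysql/ibdata1              # confirm the tablespace is where MySQL expects it
# mysqlcheck --all-databases -u root -p     # check all tables for corruption

The secondary is a block-level mirror, so once the resync finishes its copy will match the primary byte for byte; there is no need to compare files on node 2 directly.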

P.S. I would really recommend avoiding DRBD in production. What you see here is, unfortunately, quite a common occurrence.