I have an HA cluster with two nodes: node 1 is the primary and node 2 is its mirror. I have a problem with the mysql resource because my nodes are not synchronized.
drbd-overview
Primary node:
0:home Connected Primary/Secondary UpToDate/UpToDate C r-----
1:storage Connected Secondary/Primary UpToDate/UpToDate C r-----
2:mysql StandAlone Secondary/Unknown UpToDate/Outdated r-----
Secondary node:
0:home Connected Secondary/Primary UpToDate/UpToDate C r-----
1:storage Connected Primary/Secondary UpToDate/UpToDate C r-----
2:mysql StandAlone Primary/Unknown UpToDate/Outdated r-----
Reviewing the messages file, I found the following:
Apr-19 18:20:36 clsstd2 kernel: block drbd2:self C1480E287A8CAFAB:C7B94724E2658B94:5CAE57DEB3EDC4EE:F5887A918B55FB1A bits:114390101 flags:0
Apr-19 18:20:36 clsstd2 kernel: block drbd2:peer 719D326BDE8272E2:0000000000000000:C7BA4724E2658B94:C7B94724E2658B95 bits:0 flags:1
Apr-19 18:20:36 clsstd2 kernel: block drbd2:uuid_compare()=-1000 by rule 100
Apr-19 18:20:37 clsstd2 kernel: block drbd2:Unrelated data, aborting!
Apr-19 18:20:37 clsstd2 kernel: block drbd2:conn (WFReportParams -> Disconnecting)
Apr-19 18:20:37 clsstd2 kernel: block drbd2:error receiving ReportState, l: 4!
Apr-19 18:20:38 clsstd2 kernel: block drbd2:asender terminated
Apr-19 18:20:38 clsstd2 kernel: block drbd2:Terminating asender thread
Apr-19 18:20:38 clsstd2 kernel: block drbd2:Connection closed
Apr-19 18:20:38 clsstd2 kernel: block drbd2:conn (Disconnecting -> StandAlone)
Apr-19 18:20:39 clsstd2 kernel: block drbd2:receiver terminated
Apr-19 18:20:39 clsstd2 kernel: block drbd2:Terminating receiver thread
Apr-19 18:20:39 clsstd2 auditd[3960]: Audit daemon rotating log files
I don't understand what the problem is or how to solve it. Checking both nodes, I noticed that the ibdata1 file is missing from the /var/lib/mysql directory on node 2, although it exists on node 1.
Best Answer
The problem is that you have hit the "famous" DRBD split-brain condition, and both DRBD nodes went into the "StandAlone" state. It's difficult to say whether the DB on your primary node is valid or corrupted, but for now you have two routes to choose from:
Run this on the second node:
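The exact commands are not shown in this post, so what follows is only a minimal sketch of the usual manual split-brain recovery on the node whose changes get thrown away. It assumes DRBD 8.4 syntax and the resource name mysql from the drbd-overview output; the function name discard_local_data and the DRBDADM override are made up for illustration.

```shell
# Sketch only: manual split-brain recovery on the node that gives up its data.
# Assumption: DRBD 8.4 syntax. On 8.3 the last step is written as
#   drbdadm -- --discard-my-data connect mysql
DRBDADM="${DRBDADM:-drbdadm}"   # set DRBDADM=echo to print instead of execute

discard_local_data() {
    res="$1"
    # Demote first. This fails while the device is still in use,
    # so stop MySQL and unmount the filesystem beforehand.
    "$DRBDADM" secondary "$res" &&
    # Reconnect and accept the peer's data, dropping local changes.
    "$DRBDADM" connect --discard-my-data "$res"
}
```

You would call it as `discard_local_data mysql`. Everything written on this node since the split is lost, so copy anything you still need before running it.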
Run this on the surviving node, the one you believe has the most recent version of the data:
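Again the original command is missing here, so take this as a hedged sketch. On the node that keeps its data, a plain connect is normally enough, and only if that node is itself StandAlone, which is what your drbd-overview output shows. The resource name mysql comes from your output; the function name reconnect_survivor and the DRBDADM override are hypothetical.

```shell
# Sketch only: reconnect the node that keeps its data.
DRBDADM="${DRBDADM:-drbdadm}"   # set DRBDADM=echo to print instead of execute

reconnect_survivor() {
    res="$1"
    # "drbdadm cstate" prints the connection state of the resource.
    # A node that is already waiting for its peer needs no action.
    if [ "$("$DRBDADM" cstate "$res")" = "StandAlone" ]; then
        "$DRBDADM" connect "$res"
    fi
}
```

Call it as `reconnect_survivor mysql`, then watch /proc/drbd to see whether the resynchronization starts.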
If that doesn't help, you can trash the second node and simulate an automatic rebuild by executing this command:
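The command that belonged here is not in the post, and I can't say for sure which one was meant. One common way to get a full rebuild, and the usual remedy when the log says "Unrelated data, aborting!", is to recreate the DRBD metadata on the node being trashed so that it resyncs everything from its peer. The sketch below assumes that approach and the resource name mysql; rebuild_from_peer and the DRBDADM override are illustrative names, not part of DRBD.

```shell
# Sketch only: wipe DRBD metadata on the node being rebuilt so it does a
# full resync from the peer. Run it ONLY on the node whose data you discard.
DRBDADM="${DRBDADM:-drbdadm}"   # set DRBDADM=echo to print instead of execute

rebuild_from_peer() {
    res="$1"
    # Take the resource down; stop MySQL and unmount the device first.
    "$DRBDADM" down "$res" &&
    # Write fresh metadata (drbdadm asks for confirmation if metadata exists).
    "$DRBDADM" create-md "$res" &&
    # Bring the resource back up so it can connect and sync as the target.
    "$DRBDADM" up "$res"
}
```

Call it as `rebuild_from_peer mysql`. The peer must be trying to connect as well, so if it is still StandAlone, run `drbdadm connect mysql` there.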
Hope this helped!
P.S. I would really recommend avoiding DRBD in production. What you see is quite a common thing, unfortunately.