Linux – Dual Primary OCFS2 DRBD encountered split-brain. Is recovery always going to be manual in this case?

drbd linux ocfs2

I've got two webservers which each have a disk attached. This disk is synced between them using drbd (2:8.3.13-1.1ubuntu1) in 'dual-primary' mode, and over the top of this I run ocfs2 (1.6.4-1ubuntu1) as a cluster filesystem. The nodes communicate on a private network 192.168.3.0/24. For the most part, this is stable, and works well.

Last night there appears to have been a network outage. This resulted in a split-brain scenario where node01 was left in StandAlone and Primary, while node02 was left in WFConnection and Primary. Recovery this morning was a manual process: diffing the two filesystems, deciding that node01 should be authoritative, putting node02 into secondary, and then issuing drbdadm connect commands on each node. After remounting the filesystem, we were back up and running.
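
For reference, the manual recovery described above can be written down as a short plan. This is only a minimal sketch, not a tested procedure: the function name sb_recovery_plan and the mount point /data are hypothetical, and the function deliberately prints the drbdadm commands instead of running them, so the decision about which node loses its changes stays with the operator.

```shell
#!/bin/sh
# Print (do not run) the manual split-brain recovery steps for one node.
# Usage: sb_recovery_plan victim|survivor RESOURCE [MOUNTPOINT]
# "victim" is the node whose divergent writes will be discarded.
sb_recovery_plan() {
    role=$1
    res=$2
    mnt=${3:-}
    case "$role" in
        victim)
            # The filesystem must be unmounted before DRBD can be demoted.
            if [ -n "$mnt" ]; then
                echo "umount $mnt"
            fi
            echo "drbdadm secondary $res"
            # Reconnect and accept the peer's data, dropping local changes.
            echo "drbdadm -- --discard-my-data connect $res"
            ;;
        survivor)
            # Only needed when the survivor is sitting in StandAlone.
            echo "drbdadm connect $res"
            ;;
        *)
            echo "usage: sb_recovery_plan victim|survivor RESOURCE [MOUNTPOINT]" >&2
            return 1
            ;;
    esac
}

# Example (node02 as victim, node01 as survivor, /data is an assumed mount point):
#   sb_recovery_plan victim r0 /data
#   sb_recovery_plan survivor r0
```

Printing the plan first makes it easy to review the steps on both nodes before anything destructive is executed.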

My question is: Is this type of outage always going to require a manual resolution? Or are there ways in which this process can be automated? My understanding was that drbd should try to be intelligent in the event of a split brain about working out which node should become primary and which secondary. It seems that in this case a simple network outage left both nodes primary, a case for which my config (after-sb-2pri) just says 'disconnect'. Looking at the logs, what I find interesting is that they both seemed to agree that node02 should be the SyncSource, and yet when looking at the rsync log, it's actually node01 that has the most recent changes. Also interesting is the line on node01 stating 'I shall become SyncTarget, but I am primary!'. To me, it looks like drbd tried to resolve this, but failed for some reason.

Is there a better way of doing this?

The config for r0 is this:

resource r0 {
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/xvda2;

    syncer { rate 1000M; }
    net {
        #We're running ocfs2, so two primaries desirable.
        allow-two-primaries;

        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;

    }
    handlers{
        before-resync-target "/sbin/drbdsetup $DRBD_MINOR secondary";

        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    startup { become-primary-on both; }

    on node02 { address 192.168.3.8:7789; }
    on node01 { address 192.168.3.1:7789; }
}

I've also put the kern.log files on pastebin:

Node01: http://pastebin.com/gi1HPtut

Node02: http://pastebin.com/4XSCDQdC

Best Answer

IMHO you have already chosen the best split-brain policy for DRBD. So in your case there must have been changes to the same part of the filesystem (i.e. the same DRBD block) on BOTH sides.

So in that case - yes - you have to resolve that manually.

The question that arises for me is: why did these concurrent accesses happen?

You should investigate in that direction. If the network is down there should be no access on one side, so "discard-zero-changes" should help, but it did not.

Apart from that, you should prevent split brains by having two or more INDEPENDENT network connections. I always use three of them on my clusters.
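
While investigating, it can help to notice the disconnected state early rather than the next morning. The following is a minimal sketch under stated assumptions: the function name drbd_health and its output labels are hypothetical, and it only classifies the connection-state and role strings that DRBD reports (the same StandAlone and WFConnection states described in the question). It does not fix anything by itself.

```shell
#!/bin/sh
# Classify a DRBD resource from its connection state and role strings.
# Usage: drbd_health CSTATE ROLE   e.g. drbd_health StandAlone Primary/Unknown
# The output labels are made up for this sketch.
drbd_health() {
    cstate=$1
    role=$2
    case "$cstate" in
        Connected|SyncSource|SyncTarget)
            echo "ok"
            ;;
        StandAlone)
            # StandAlone means DRBD is no longer trying to reconnect,
            # which is how a node looks after a detected split brain
            # (or after a manual disconnect).
            echo "split-brain-suspected"
            ;;
        WFConnection)
            # Waiting for the peer: risky if this node is still primary
            # and keeps accepting writes on its own.
            case "$role" in
                Primary/*) echo "disconnected-primary" ;;
                *)         echo "disconnected" ;;
            esac
            ;;
        *)
            echo "unknown"
            ;;
    esac
}
```

On a live node this could be fed from drbdadm, for example drbd_health "$(drbdadm cstate r0)" "$(drbdadm role r0)", and run from cron or a monitoring check that alerts on anything other than "ok".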

Related Solutions

Centos – Can not switch drbd to secondary

I'm not sure if this OCFS2 heartbeat region is preventing DRBD from switching to secondary:

Maybe. Have you tried killing that region by following this guide?

# /etc/init.d/o2cb offline serving
Stopping O2CB cluster serving: Failed
Unable to stop cluster as heartbeat region still active

OK, firstly you should list the OCFS2 volumes along with their labels and uuids:

# mounted.ocfs2 -d
Device                FS     Stack  UUID                              Label
/dev/sdb1             ocfs2  o2cb   C3E41CA2BDE8477CA7FF2C796098633C  data_ocfs2
/dev/drbd1            ocfs2  o2cb   C3E41CA2BDE8477CA7FF2C796098633C  data_ocfs2

Secondly, check to see if you have any reference to this device:

# ocfs2_hb_ctl -I -d /dev/sdb1
C3E41CA2BDE8477CA7FF2C796098633C: 1 refs

Try to kill it:

# ocfs2_hb_ctl -K -d /dev/sdb1 ocfs2

then stop the cluster stack:

# /etc/init.d/o2cb stop
Stopping O2CB cluster serving: OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK

and bring the device back into secondary role:

# drbdadm secondary r0
# drbd-overview 
  1:r0  StandAlone Secondary/Unknown UpToDate/DUnknown r-----

Now you can recover split brain as usual:

# drbdadm -- --discard-my-data connect r0
# drbd-overview 
  1:r0  WFConnection Secondary/Unknown UpToDate/DUnknown C r-----

On the other node (the split brain survivor):

# drbdadm connect r0
# drbd-overview
  1:r0  SyncSource Primary/Secondary UpToDate/Inconsistent C r---- /data ocfs2 100G 1.9G 99G 2% 
        [>....................] sync'ed:  3.2% (753892/775004)K delay_probe: 28

On the split brain victim:

# /etc/init.d/o2cb start
Loading filesystem "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading filesystem "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster serving: OK

# /etc/init.d/ocfs2 start
Starting Oracle Cluster File System (OCFS2)                [  OK  ]

Verify that this mount point is up and running:

# df -h /data/
Filesystem            Size  Used Avail Use% Mounted on
/dev/drbd1            100G  1.9G   99G   2% /data

Proxmox drbd configuration split brain

I have no experience with Proxmox, but I have configured a normal pacemaker/corosync cluster on CentOS, so I hope my observations are still useful and applicable here.

I am very suspicious of Primary/Primary DRBD setups. Even with a Primary/Secondary configuration, split brain is likely if something goes wrong. I was surprised at how easily DRBD can fall into a split-brain condition in a cluster that is not well tuned.

In the Primary/Primary case, special attention should be paid to fencing facilities in order to reduce the probability of data loss. An excellent introduction to two-node DRBD clusters is here.

A Primary/Primary setup is needed mainly for live migration. If you do not use live migration, Primary/Secondary is enough, and much preferable.

Concerning your question, a dedicated DRBD resource is also a working solution. You will probably move the storage stack from DRBD/LVM to LVM/DRBD. ~~So, clustered LVM becomes necessary even in a Primary/Secondary setup.~~ UPD: Clustered LVM is not needed here, and neither is dlm to provide it.

The main disadvantage I see: a lot of careful manual work to prepare the VM storage.

Another point to think about in advance is the backup strategy. With many DRBD resources it may be a bit more complicated.

I started my first cluster setup with an LVM/DRBD stack and a dedicated DRBD resource for the VM, but later switched to the more common DRBD/LVM, since provisioning a new VM is much simpler in this case.
