Linux – Dual Primary OCFS2 DRBD encountered split-brain. Is recovery always going to be manual in this case?

drbd linux ocfs2

I've got two webservers which each have a disk attached. This disk is synced between them using drbd (2:8.3.13-1.1ubuntu1) in 'dual-primary' mode, and over the top of this I run ocfs2 (1.6.4-1ubuntu1) as a cluster filesystem. The nodes communicate on a private network 192.168.3.0/24. For the most part, this is stable, and works well.

Last night there appears to have been a network outage. This resulted in a split-brain scenario where node01 was left in StandAlone and Primary, while node02 was left in WFConnection and Primary. Recovery this morning was a manual process: diffing the two filesystems, deciding that node01 should be authoritative, putting node02 into secondary, and then issuing drbdadm connect commands on each node. After remounting the filesystem, we were back up and running.

My question is: is this type of outage always going to require manual resolution, or are there ways in which this process can be automated? My understanding was that drbd should try to be intelligent in the event of a split brain about working out which node should become primary and which secondary. It seems that in this case a simple network outage left both nodes in primary, for which my config just says 'disconnect'. Looking at the logs, what I find interesting is that both nodes seemed to agree that node02 should be the SyncSource, and yet when looking at the rsync log, it's actually node01 that has the most recent changes. Also interesting is the line on node01 stating 'I shall become SyncTarget, but I am primary!'. To me, it looks like drbd tried to resolve this, but failed for some reason.

Is there a better way of doing this?

The config for r0 is this:

resource r0 {
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/xvda2;

    syncer { rate 1000M; }
    net {
        #We're running ocfs2, so two primaries desirable.
        allow-two-primaries;

        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;

    }
    handlers{
        before-resync-target "/sbin/drbdsetup $DRBD_MINOR secondary";

        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    startup { become-primary-on both; }

    on node02 { address 192.168.3.8:7789; }
    on node01 { address 192.168.3.1:7789; }
}

I've also put the kern.log files on pastebin:

Node01: http://pastebin.com/gi1HPtut

Node02: http://pastebin.com/4XSCDQdC

Best Answer

IMHO you have already chosen the best split-brain policy for DRBD. So in your case there must have been changes to the same part of the filesystem (i.e. the same DRBD block) on BOTH sides.

So in that case - yes - you have to resolve that manually.
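For reference, the manual recovery described in the question can be sketched roughly as below. This is a minimal sketch, assuming DRBD 8.3 command syntax, the resource name r0 from the posted config, and that node02 is the node whose changes are thrown away. The function names and the DRBDADM/RES variables are made up for illustration; they are not part of DRBD.

```shell
#!/bin/sh
# Sketch of manual split-brain recovery for a DRBD 8.3 resource.
# DRBDADM and RES are illustrative variables (assumptions), overridable
# so the commands can be dry-run with e.g. DRBDADM=echo.
DRBDADM="${DRBDADM:-drbdadm}"
RES="${RES:-r0}"

# Run on the node whose changes will be DISCARDED (node02 in the question).
# Unmount the OCFS2 filesystem on this node first.
recover_victim() {
    # Demote, then reconnect while declaring the local data as the loser.
    "$DRBDADM" secondary "$RES"
    "$DRBDADM" -- --discard-my-data connect "$RES"
}

# Run on the node whose data is KEPT (node01 in the question).
# Only needed if that node is in StandAlone; a node already in
# WFConnection reconnects on its own.
recover_survivor() {
    "$DRBDADM" connect "$RES"
}
```

After sourcing this, recover_victim would be called on node02 and recover_survivor on node01; once the resync has finished, node02 can be promoted again and the filesystem remounted. Everything written on node02 during the split is lost, so the diffing step from the question still has to happen first. On DRBD 8.4 the option is passed differently (drbdadm connect --discard-my-data r0).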

The question that arises for me is: why did these concurrent writes happen?

You should investigate in that direction. If the network is down, there should be no access on one side, so "discard-zero-changes" should help, but it did not.

Apart from that, you should prevent split brains by having two or more INDEPENDENT network connections. I always use three of them on my clusters.