DRBD stacked resources: recovering from failure

We're running a stacked four-node DRBD setup like this:

A  -->  B
|       |
v       v
C       D

Servers A and B are Xen hosts running VMs, while servers C and D are for backups. A is in the same datacentre as C, and B is in the same datacentre as D. This gives us three DRBD resources across the four servers:

  1. From server A to server C, in the first datacentre, using protocol B
  2. From server B to server D, in the second datacentre, using protocol B
  3. From server A to server B, different datacentres, stacked resource using protocol A

First question: booting a stacked resource

We haven't got any vital data running on this setup yet; we're still making sure it works first. This means simulating power cuts, network outages and so on, and seeing what steps we need to recover.

When we pull the power out of server A, both of its resources go down, and the server attempts to bring them back up at the next boot. However, it only succeeds in bringing up the lower-level resource, A->C. The stacked resource A->B doesn't even try to connect, presumably because it can't find its backing device until the lower-level resource is a connected primary.

So if anything goes wrong we need to manually log in and bring that resource up, then start the virtual machine on top of it.
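
For reference, the manual recovery on server A looks roughly like the sketch below. It assumes the resource names test-AC and test-AB used later in this question. The RUN variable and the recover_stacked function are hypothetical names added for illustration; by default the commands are only printed, so the script can be tried safely.

```shell
#!/bin/sh
# Manual recovery sketch for a stacked resource (assumed names: test-AC, test-AB).
# RUN is a hypothetical dry-run switch: unset means "print only";
# run with RUN= (empty) to execute the commands for real.
RUN="${RUN-echo}"

recover_stacked() {
    lower="$1"      # lower-level resource, e.g. test-AC
    stacked="$2"    # stacked resource, e.g. test-AB

    # The lower-level resource has to be primary before the stacked
    # resource can find its backing device.
    $RUN drbdadm primary "$lower" || return 1

    # Bring up the stacked resource, then promote it.
    $RUN drbdadm --stacked up "$stacked" || return 1
    $RUN drbdadm --stacked primary "$stacked" || return 1
}

recover_stacked test-AC test-AB
```

Only once the stacked resource is primary can the virtual machine on top of it be started.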

Second question: setting the primary of a stacked resource

Our lower-level resources are configured so that the right one is considered primary:

resource test-AC {
    on A { ... }

    on C { ... }

    startup {
        become-primary-on   A;
    }
}

But I don't see any way to do the same with a stacked resource, as the following isn't a valid config:

resource test-AB {
    stacked-on-top-of test-AC { ... }

    stacked-on-top-of test-BD { ... }

    startup {
        become-primary-on   test-AC;
    }
}

This too means that recovering from a failure requires manual intervention. Is there no way to set the automatic primary for a stacked resource?

Best Answer

Since there are no answers, here or elsewhere, I've worked around the first question by creating a copy of the DRBD runscript for the stacked resources. It's the same as the original, but with all the drbdadm commands turned into drbdadm -S for stacked. I called it drbd-stacked and set it to run after the original.

To work around the second problem, I extended the part of the runscript that promotes resources to primary: it now reads a list of resources from a file, /etc/drbd.d/primary, and calls drbdadm -S primary on each one.
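
The list-reading step can be tried on its own, without touching DRBD. This is a minimal sketch rather than the exact code from the runscript: list_primary_resources is a hypothetical helper that only prints the names it would promote, skipping comments and blank lines.

```shell
#!/bin/sh
# Print the resource names listed in a file, one per line.
# Blank lines and lines starting with '#' are skipped.
list_primary_resources() {
    file="$1"
    # A missing or unreadable list simply means nothing to promote.
    [ -r "$file" ] || return 0

    # read trims leading/trailing whitespace; the "|| [ -n ... ]" part
    # also handles a last line that has no trailing newline.
    while read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|\#*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$file"
}

# Possible use (not run here):
#   for res in $(list_primary_resources /etc/drbd.d/primary); do
#       drbdadm -S primary "$res"
#   done
```

Keeping the parsing separate from the drbdadm call makes it easy to check what a given list would do before a reboot.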

While successful, I consider both of these to be workarounds rather than proper solutions. I'd love to see a better answer. Here's the runscript, /etc/init.d/drbd-stacked:

#!/sbin/runscript
# Copyright 1999-2007 Gentoo Foundation
# Distributed under the terms of the GNU General Public License, v2 or later
# $Header: /var/cvsroot/gentoo-x86/sys-cluster/drbd/files/drbd-8.0.rc,v 1.6 2010/08/02 04:42:36 xarthisius Exp $

opts="${opts} reload"

depend() {
        use logger
        need net drbd
        before heartbeat xendomains
        after sshd drbd
}

DEFAULTFILE="/etc/conf.d/drbd"
PRIMARYFILE="/etc/drbd.d/primary"
DRBDADM="/sbin/drbdadm"
PROC_DRBD="/proc/drbd"
MODPROBE="/sbin/modprobe"
RMMOD="/sbin/rmmod"
UDEV_TIMEOUT=10
ADD_MOD_PARAM=""

if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi

# Just in case drbdadm wants to display any errors in the configuration
# file, or we need to ask the user about registering this installation
# at http://usage.drbd.org, we call drbdadm here without any IO
# redirection.
$DRBDADM sh-nop

function assure_module_is_loaded() {
        [ -e "$PROC_DRBD" ] && return
        ebegin "Loading drbd module"
        ret=0

        $MODPROBE -s drbd `$DRBDADM sh-mod-parms` $ADD_MOD_PARAM || ret=20
        eend $ret
        return $ret
}

function adjust_with_progress() {
        IFS_O=$IFS
        NEWLINE='
'
        IFS=$NEWLINE
        local D=0
        local S=0
        local N=0

        einfon "Setting drbd parameters "
        COMMANDS=`$DRBDADM -d -S adjust all` || { 
                eend 20 "Error executing drbdadm"
                return 20 
        }
        echo -n "[ "

        for CMD in $COMMANDS; do
                if echo $CMD | grep -q disk; then echo -n "d$D "; D=$(( D+1 ));
                elif echo $CMD | grep -q syncer; then echo -n "s$S "; S=$(( S+1 ));
                elif echo $CMD | grep -q net; then echo -n "n$N "; N=$(( N+1 ));
                else echo -n ".. ";
                fi
                IFS=$IFS_O
                $CMD || {
                        echo 
                        eend 20 "cmd $CMD failed!"
                        return 20
                }
                IFS=$NEWLINE
        done
        echo "]"
        eend 0

        IFS=$IFS_O
}

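function primary_from_config_file() {
        # Nothing to do if the list of resources to promote doesn't exist
        [ -f "$PRIMARYFILE" ] || return 0
        while read -r line; do
                # Skip blank lines and comments
                if [[ -n $line && $line != \#* ]]; then
                        $DRBDADM -S primary "$line"
                fi
        done < "$PRIMARYFILE"
}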

start() {
        einfo "Starting DRBD stacked resources:"
        eindent
        assure_module_is_loaded || return $?
        adjust_with_progress || return $?

        # make sure udev has time to create the device files
        ebegin "Waiting for udev device creation ..."
        for RESOURCE in `$DRBDADM sh-resources`; do
                for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
                        UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
                        while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ] ; do
                                sleep 1
                                UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL-1 ))
                        done
                done
        done
        eend 0

        einfon "Waiting for connection "
        $DRBDADM -S wait-con-int
        ret=$?
        echo

        sleep 5

        einfon "Become primary if configured "
        $DRBDADM -S sh-b-pri all
        primary_from_config_file
        echo

        eend $ret
        return $ret
}

stop() {
        ebegin "Stopping all DRBD stacked resources"

        # Check for mounted drbd devices
        if ! grep -q '^/dev/drbd' /proc/mounts &>/dev/null; then
                ret=0
                if [ -e ${PROC_DRBD} ]; then
                        ${DRBDADM} -S down all
                        ret=$?  # report the result of "down", not of the sleep below
                        sleep 3
                #       if grep -q '^drbd' /proc/modules ; then
                #               ${RMMOD} drbd
                #       fi
                fi
                eend $ret
                return $ret
        else
                einfo "drbd devices mounted, please umount them before trying to stop drbd!"
                eend 1
                return 1
        fi
}

status() {
        # NEEDS to be heartbeat friendly...
        # so: put some "OK" in the output.

        if [ -e $PROC_DRBD ]; then
                ret=0
                ebegin "drbd driver loaded OK; device status:"
                eend $ret
                cat $PROC_DRBD
        else
                ebegin "drbd not loaded"
                ret=3
                eend $ret
        fi
        return $ret
}

reload() {
        ebegin "Reloading DRBD stacked resources"
        ${DRBDADM} -S adjust all
        ret=$?
        eend $ret
        return $ret
}

And here's the config file /etc/drbd.d/primary:

# A list of DRBD resources that should be made primary on boot.
# Each line is the name of one resource. Be careful of the difference
# between low-level and stacked resources; this file should typically
# contain the stacked resource.
# You should include a resource if this server is running its virtual machine.

my-resource-name
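
The new service also has to be enabled before it runs at boot. A minimal sketch, assuming a standard Gentoo setup where rc-update manages runlevels and the default runlevel is the one you want. RUN and enable_stacked_service are hypothetical names; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: make the stacked runscript executable and enable it at boot.
# RUN is a hypothetical dry-run switch: unset means "print only";
# run with RUN= (empty) to execute the commands for real.
RUN="${RUN-echo}"

enable_stacked_service() {
    $RUN chmod 755 /etc/init.d/drbd-stacked || return 1
    # Assumption: the "default" runlevel is where the drbd service lives too.
    $RUN rc-update add drbd-stacked default || return 1
}

enable_stacked_service
```

The ordering relative to the original drbd service comes from the depend() function in the runscript, not from the order in which services are added.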