Red Hat – Automatic recovery with discarded data from a NetworkFailure-interrupted DRBD sync

drbd, high-availability, redhat

Suppose I have two DRBD devices provisioned. When the second node connects, it syncs the data from the first (primary/master) node.

During this sync, the primary node loses power.

After the Primary node is lost, the original Secondary is the only available node, and it is left in the Inconsistent/DUnknown state.

Is there any way to recover from this automatically?

version: 8.4.7 (api:1/proto:86-101)
srcversion: 0904DF2CCF7283ACE07D07A

 1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:390452

I can recover from this situation manually by running drbdadm primary --force <resource-name> and then (since this is a Pacemaker cluster) pcs resource cleanup, but I am looking for a way to trigger this recovery automatically.
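To illustrate what "automatic" would have to mean here, a watchdog would need to recognize this exact stuck state before acting. The following is only a sketch of that decision logic (the script, the polling of /proc/drbd, and the placeholder resource name are all assumptions, not part of any DRBD tooling); unconditionally force-promoting an Inconsistent node risks data loss and split-brain, which is why this normally belongs in a cluster manager's fencing/quorum policy rather than a cron job:

```shell
#!/bin/sh
# Hypothetical watchdog sketch: decide whether a lone node stuck in
# WFConnection with Inconsistent/DUnknown data should be force-promoted.
# Input: one /proc/drbd status line; output: "promote" or "wait".

should_promote() {
    line="$1"
    case "$line" in
        *cs:WFConnection*ds:Inconsistent/DUnknown*) echo promote ;;
        *)                                          echo wait ;;
    esac
}

# Example status line taken from the question above:
status=' 1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r-----'

if [ "$(should_promote "$status")" = promote ]; then
    # On a real node this is where the recovery commands would run.
    echo "would run: drbdadm primary --force <resource> && pcs resource cleanup"
fi
```

A healthy Connected/UpToDate status line would fall through to "wait", so the script only ever acts on the specific failure state shown in the question.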

Full logs of an example run

[   20.233788] drbd: initialized. Version: 8.4.7 (api:1/proto:86-101)
[   20.234905] drbd: srcversion: 0904DF2CCF7283ACE07D07A
[   20.235791] drbd: registered as block device major 147
[   22.402786] drbd shareddata: Starting worker thread (from drbdsetup-84 [1406])
[   22.406433] block drbd1: disk( Diskless -> Attaching )
[   22.407422] drbd shareddata: Method to ensure write ordering: flush
[   22.408478] block drbd1: max BIO size = 4096
[   22.409211] block drbd1: drbd_bm_resize called with capacity == 2097016
[   22.410317] block drbd1: resync bitmap: bits=262127 words=4096 pages=8
[   22.411492] block drbd1: size = 1024 MB (1048508 KB)
[   22.413787] block drbd1: recounting of set bits took additional 0 jiffies
[   22.414922] block drbd1: 1024 MB (262127 bits) marked out-of-sync by on disk bit-map.
[   22.416189] block drbd1: Suspended AL updates
[   22.416942] block drbd1: disk( Attaching -> UpToDate )
[   22.418403] block drbd1: attached to UUIDs 9FB19F9A9D6573A9:0000000000000004:0000000000000000:0000000000000000
[   22.460721] drbd shareddata: conn( StandAlone -> Unconnected )
[   22.462303] drbd shareddata: Starting receiver thread (from drbd_w_sharedda [1407])
[   22.467153] drbd shareddata: receiver (re)started
[   22.468715] drbd shareddata: conn( Unconnected -> WFConnection )
[   23.000120] drbd shareddata: Handshake successful: Agreed network protocol version 101
[   23.003987] drbd shareddata: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
[   23.008195] drbd shareddata: conn( WFConnection -> WFReportParams )
[   23.010706] drbd shareddata: Starting ack_recv thread (from drbd_r_sharedda [1467])
[   23.067880] block drbd1: max BIO size = 1048576
[   23.069557] block drbd1: drbd_sync_handshake:
[   23.070869] block drbd1: self 9FB19F9A9D6573A8:0000000000000004:0000000000000000:0000000000000000 bits:262127 flags:0
[   23.073539] block drbd1: peer 3B5A831140811725:0000000000000004:0000000000000000:0000000000000000 bits:262127 flags:0
[   23.076210] block drbd1: uuid_compare()=100 by rule 90
[   23.077596] block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
[   23.081505] block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
[   23.084035] block drbd1: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
[   23.086539] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
[   23.089588] block drbd1: Resumed AL updates
[   23.103227] block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0%
[   23.105986] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0%
[   23.108662] block drbd1: conn( WFBitMapT -> WFSyncUUID )
[   23.127823] block drbd1: updated sync uuid 68A55F3E62EDE97C:0000000000000000:0000000000000000:0000000000000000
[   23.136222] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
[   23.140260] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
[   23.142823] block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[   23.145214] block drbd1: Began resync as SyncTarget (will sync 1048508 KB [262127 bits set]).
[   61.912243] drbd shareddata: PingAck did not arrive in time.
[   61.914470] drbd shareddata: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[   61.919882] drbd shareddata: ack_receiver terminated
[   61.921491] drbd shareddata: Terminating drbd_a_sharedda
[   61.968612] drbd shareddata: Connection closed
[   61.970170] drbd shareddata: conn( NetworkFailure -> Unconnected )
[   61.971855] drbd shareddata: receiver terminated
[   61.973304] drbd shareddata: Restarting receiver thread
[   61.974743] drbd shareddata: receiver (re)started
[   61.976187] drbd shareddata: conn( Unconnected -> WFConnection )
[   62.008237] block drbd1: State change failed: Need access to UpToDate data
[   62.010446] block drbd1:   state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r----- }
[   62.013170] block drbd1:  wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r----- }
[   76.334863] drbd shareddata: conn( WFConnection -> Disconnecting )
[   76.336529] drbd shareddata: Discarding network configuration.
[   76.338082] drbd shareddata: Connection closed
[   76.339375] drbd shareddata: conn( Disconnecting -> StandAlone )
[   76.340898] drbd shareddata: receiver terminated
[   76.342203] drbd shareddata: Terminating drbd_r_sharedda
[   76.343712] block drbd1: disk( Inconsistent -> Failed )
[   76.364417] block drbd1: 560 MB (143363 bits) marked out-of-sync by on disk bit-map.
[   76.366742] block drbd1: disk( Failed -> Diskless )
[   76.404579] drbd shareddata: Terminating drbd_w_sharedda

Best Answer

If you don't care about the data, why replicate it in the first place? ;)

Since this is the initial sync, your Secondary node will have Inconsistent data until the sync completes. Up until that point, you'll always have to force promote the Secondary into Primary, which isn't a great thing to be doing.

Why not skip the initial sync, and then use DRBD's LVM snapshot before-resync-target handler to protect against this scenario moving forward?

To skip the initial sync: once you have stood up a new device on both nodes and they are cs:Connected and ds:Inconsistent/Inconsistent, clear the bitmap, marking the current state "consistent" (from one node only, not both):

# drbdadm new-current-uuid --clear-bitmap all
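In context, the whole skip-initial-sync sequence might look like the following (a sketch; "shareddata" is the resource name taken from the logs above, and verifying the cs:Connected state before the last step is left to the operator):

```shell
# On BOTH nodes: create metadata and bring the resource up
drbdadm create-md shareddata
drbdadm up shareddata

# On ONE node only, once /proc/drbd shows
# cs:Connected ds:Inconsistent/Inconsistent:
drbdadm new-current-uuid --clear-bitmap shareddata
```

After the last command both sides report UpToDate without any data having crossed the wire, which is only safe for a brand-new device whose contents you genuinely do not care about.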

Then, use DRBD's before-resync-target/after-resync-target handlers to take/remove snapshots of your backing LVM device before/after the resyncs so you always have a consistent dataset in case a failure does occur during a resync:

resource <resource> {
...
  handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }
}

You'd then be able to recover the snapshot using lvconvert just like any other LVM snapshot.
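For example, that recovery might look like the following (a sketch with hypothetical names: "vg0" as the volume group, and the snapshot name is whatever snapshot-resync-target-lvm.sh actually created on your system, which lvs will show):

```shell
lvs vg0                       # find the snapshot the handler left behind
drbdadm down shareddata       # DRBD must release the backing LV first
lvconvert --merge vg0/snap    # "snap" is a placeholder for the real name
# Once the merge completes, bring the DRBD resource back up; the backing
# device now holds the consistent pre-resync data.
```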
