Linux – DRBD terrible sync performance on 10GigE

Tags: debian, drbd, linux, performance

I've set up a pair of identical servers (8 cores, 16 GB RAM, 12×2 TB RAID 6 arrays) with three 10GigE interfaces each, to host some highly available services.

The systems are currently running Debian 7.9 Wheezy oldstable (because corosync/pacemaker are not available in 8.x stable or testing).

  • Local disk performance is about 900 MB/s write, 1600 MB/s read.
  • Network throughput between the machines is over 700 MB/s.
  • Over iSCSI, each machine can write to the other's storage at more than 700 MB/s.

However, no matter how I configure DRBD, throughput is limited to 100 MB/s. It really looks like some hardcoded limit. I can reliably lower performance by tweaking the settings, but it never goes over 1 Gbit/s (122 MB/s is reached for a couple of seconds at a time). I'm really pulling my hair out on this one.

  • plain vanilla kernel 3.18.24 amd64
  • drbd 8.9.2~rc1-1~bpo70+1

The configuration is split in two files: global-common.conf:

global {
        usage-count no;
}

common {
        handlers {
        }

        startup {
        }

        disk {
                on-io-error             detach;
         #       no-disk-flushes ;
        }
        net {
                max-epoch-size          8192;
                max-buffers             8192;
                sndbuf-size             2097152;
        }
        syncer {
                rate                    4194304k;
                al-extents              6433;
        }
}

and cluster.res:

resource rd0 {
        protocol C;
        on cl1 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.1:7788;
                meta-disk internal;
        }

        on cl2 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.2:7788;
                meta-disk internal;
        }
}

Output from cat /proc/drbd on the slave:

version: 8.4.5 (api:1/proto:86-101)
srcversion: EDE19BAA3D4D4A0BEFD8CDE 
 0: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:4462592 dw:4462592 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:16489499884
        [>....................] sync'ed:  0.1% (16103024/16107384)M
        finish: 49:20:03 speed: 92,828 (92,968) want: 102,400 K/sec

Output from vmstat 2 on master (both machines are almost completely idle):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 14952768 108712 446108    0    0   213   254   16    9  0  0 100  0
 0  0      0 14952484 108712 446136    0    0     0     4 10063 1361  0  0 99  0
 0  0      0 14952608 108712 446136    0    0     0     4 10057 1356  0  0 99  0
 0  0      0 14952608 108720 446128    0    0     0    10 10063 1352  0  1 99  0
 0  0      0 14951616 108720 446136    0    0     0     6 10175 1417  0  1 99  0
 0  0      0 14951748 108720 446136    0    0     0     4 10172 1426  0  1 99  0

Output from iperf between the two servers:

------------------------------------------------------------
Client connecting to cl2, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 192.168.42.1 port 47900 connected with 192.168.42.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.87 GBytes  5.90 Gbits/sec

Apparently the initial synchronisation is supposed to be somewhat slow, but not this slow… Furthermore, it doesn't react at all to attempts to change the sync rate, such as drbdadm disk-options --resync-rate=800M all.
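For anyone trying to reproduce this, it helps to check which options the running resource actually has, rather than trusting the config file. A sketch, assuming the drbd 8.4 userland tools (verify the exact subcommands against your version with drbdsetup --help):

```shell
# Dump the effective (running) options for the resource; this shows
# whether the dynamic resync controller (c-plan-ahead, c-max-rate, ...)
# is active or still at its defaults.
drbdsetup show

# Watch resync progress and the effective/wanted sync speed update live.
watch -n1 cat /proc/drbd
```

If the "want:" figure in /proc/drbd stays pinned at a round default (as in the 102,400 K/sec shown above), the configured rate is not what the running device is using.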

Best Answer

In newer versions of DRBD (8.3.9 and later) there is a dynamic resync controller that needs tuning. In older versions of DRBD, setting syncer { rate; } was enough; now it is treated more as a gently suggested starting point for the dynamic resync speed.

The dynamic sync controller is tuned with the "c-settings" in the disk section of DRBD's configuration (see man drbd.conf for details on each of these settings).

With 10GbE between these nodes, and assuming low latency since protocol C is used, the following config should get things moving faster:

resource rd0 {
        protocol C;
        disk {
                c-fill-target 10M;
                c-max-rate   700M;
                c-plan-ahead    7;
                c-min-rate     4M;
        }
        on cl1 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.1:7788;
                meta-disk internal;
        }

        on cl2 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.2:7788;
                meta-disk internal;
        }
}
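After editing the resource file on both nodes, the new disk options can usually be applied to the live device without restarting DRBD. A sketch using drbdadm from drbd 8.4 (subcommand names assumed from that release; check drbdadm --help on your build):

```shell
# Sanity-check the parsed configuration for the resource first.
drbdadm dump rd0

# Apply the changed sections of the config file to the running resource.
drbdadm adjust rd0

# Alternatively, set the resync-controller knobs directly at runtime,
# without touching the config file:
drbdadm disk-options --c-plan-ahead=7 --c-fill-target=10M \
        --c-max-rate=700M --c-min-rate=4M rd0
```

Run the same adjustment on both nodes so the peers agree on the settings.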

If you're still not happy, try turning max-buffers up to 12k. If that still isn't enough, try increasing c-fill-target in 2M increments.
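For reference, max-buffers lives in the net section. A hedged fragment (12288 buffers ≈ the "12k" suggested above; the sndbuf-size comment reflects the drbd.conf documentation, where 0 means kernel auto-tuning):

```
net {
        max-buffers     12288;  # DRBD's receive/peer-request buffer count
        max-epoch-size  8192;
        sndbuf-size     0;      # 0 = let the kernel auto-tune the send buffer
}
```

As with the disk options, this can also be pushed to a running resource with drbdadm net-options --max-buffers=12288 rd0 (again, assuming drbd 8.4's drbdadm).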