CentOS – Can’t start the drbd service or set the primary node with DRBD 8.4 on CentOS 7.3


I am setting up DRBD on two nodes:

  • node1
  • node2

I followed this guide:

http://www.learnitguide.net/2016/07/how-to-install-and-configure-drbd-on-linux.html

The resource configuration is the same on both nodes:

resource testdata1 {
    protocol C;
    on node1 {
            device /dev/drbd0;
            disk /dev/sdb;
            address 192.168.0.1:7788;
            meta-disk internal;
    }
    on node2 {
            device /dev/drbd0;
            disk /dev/sdb;
            address 192.168.0.2:7788;
            meta-disk internal;
    }
}

On node2

After running systemctl start drbd, the command hangs. I waited a long time and it never returned. The same step works fine on node1.

On node1

After running drbdadm primary testdata1, I got this error:

0: Failure: (127) Device minor not allocated
additional info from kernel:
unknown minor
Command 'drbdsetup-84 primary 0' terminated with exit code 10

Edit

Output of systemctl status drbd on node2:

● drbd.service - DRBD -- please disable. Unless you are NOT using a cluster manager.
   Loaded: loaded (/usr/lib/systemd/system/drbd.service; disabled; vendor preset: disabled)
   Active: activating (start) since 木 2017-07-20 01:18:32 EDT; 1min 32s ago
 Main PID: 9137 (drbd)
   CGroup: /system.slice/drbd.service
       ├─9137 /bin/bash /lib/drbd/drbd start
       ├─9145 drbdadm wait-con-int
       └─9146 drbdsetup-84 wait-connect 0

 7月 20 01:18:32 node2 drbd[9137]: WARN: stdin/stdout is not a TTY; using /dev/con...ied
 7月 20 01:18:42 node2 drbd[9137]: WARN: stdin/stdout is not a TTY; using /dev/con......
 7月 20 01:18:42 node2 drbd[9137]: ***********************************************...***
 7月 20 01:18:42 node2 drbd[9137]: DRBD's startup script waits for the peer node(s...ar.
 7月 20 01:18:42 node2 drbd[9137]: - If this node was already a degraded cluster b...the
 7月 20 01:18:42 node2 drbd[9137]: reboot, the timeout is 0 seconds. [degr-wfc-timeout]
 7月 20 01:18:42 node2 drbd[9137]: - If the peer was available before the reboot, ...out
 7月 20 01:18:42 node2 drbd[9137]: is 0 seconds. [wfc-timeout]
 7月 20 01:18:42 node2 drbd[9137]: (These values are for resource 'testdata1'; 0 s...er)
 7月 20 01:18:42 node2 drbd[9137]: To abort waiting for DRBD connections, kill thi...145
Hint: Some lines were ellipsized, use -l to show in full.

Output of journalctl -u drbd on node2:

-- Logs begin at 水 2017-07-12 22:45:19 EDT, end at 木 2017-07-20 01:18:42 EDT. --
 7月 18 05:49:18 node2 systemd[1]: Starting DRBD -- please disable. Unless you are NOT using a cluster manager....
 7月 18 05:49:18 node2 drbd[23883]: Starting DRBD resources: drbd.d/testdata1.res:12: in resource testdata1, on node2:
 7月 18 05:49:18 node2 drbd[23883]: IP 198.10.0.219 not found on this host.
 7月 18 05:49:18 node2 systemd[1]: drbd.service: main process exited, code=exited, status=20/n/a
 7月 18 05:49:18 node2 systemd[1]: Failed to start DRBD -- please disable. Unless you are NOT using a cluster manager..
 7月 18 05:49:18 node2 systemd[1]: Unit drbd.service entered failed state.
 7月 18 05:49:18 node2 systemd[1]: drbd.service failed.
 7月 18 05:50:57 node2 systemd[1]: Starting DRBD -- please disable. Unless you are NOT using a cluster manager....
 7月 18 05:50:57 node2 drbd[23898]: Starting DRBD resources: [
 7月 18 05:50:57 node2 drbd[23898]: create res: testdata1
 7月 18 05:50:57 node2 drbd[23898]: prepare disk: testdata1
 7月 18 05:50:57 node2 drbd[23898]: adjust disk: testdata1:failed(apply-al:20)
 7月 18 05:50:57 node2 drbd[23898]: adjust net: testdata1
 7月 18 05:50:57 node2 drbd[23898]: ]
 7月 18 05:50:57 node2 drbd[23898]: WARN: stdin/stdout is not a TTY; using /dev/consoleopen('/dev/console, O_RDONLY): Permission denied
 7月 18 05:51:07 node2 drbd[23898]: WARN: stdin/stdout is not a TTY; using /dev/console..........
 7月 18 05:51:07 node2 drbd[23898]: ***************************************************************
 7月 18 05:51:07 node2 drbd[23898]: DRBD's startup script waits for the peer node(s) to appear.
 7月 18 05:51:07 node2 drbd[23898]: - If this node was already a degraded cluster before the
 7月 18 05:51:07 node2 drbd[23898]: reboot, the timeout is 0 seconds. [degr-wfc-timeout]
 7月 18 05:51:07 node2 drbd[23898]: - If the peer was available before the reboot, the timeout
 7月 18 05:51:07 node2 drbd[23898]: is 0 seconds. [wfc-timeout]
 7月 18 05:51:07 node2 drbd[23898]: (These values are for resource 'testdata1'; 0 sec -> wait forever)
 7月 18 05:51:07 node2 drbd[23898]: To abort waiting for DRBD connections, kill this process: kill 23916
 7月 20 01:15:48 node2 systemd[1]: Stopped DRBD -- please disable. Unless you are NOT using a cluster manager..
 7月 20 01:16:06 node2 systemd[1]: Starting DRBD -- please disable. Unless you are NOT using a cluster manager....
 7月 20 01:16:06 node2 drbd[9092]: Starting DRBD resources: [
 7月 20 01:16:06 node2 drbd[9092]: adjust disk: testdata1:failed(apply-al:20)
 7月 20 01:16:06 node2 drbd[9092]: ]
 7月 20 01:16:06 node2 drbd[9092]: WARN: stdin/stdout is not a TTY; using /dev/consoleopen('/dev/console, O_RDONLY): Permission denied
 7月 20 01:16:13 node2 drbd[9092]: WARN: stdin/stdout is not a TTY; using /dev/console.......
 7月 20 01:16:13 node2 systemd[1]: Stopped DRBD -- please disable. Unless you are NOT using a cluster manager..
 7月 20 01:18:32 node2 systemd[1]: Starting DRBD -- please disable. Unless you are NOT using a cluster manager....
 7月 20 01:18:32 node2 drbd[9137]: Starting DRBD resources: [
 7月 20 01:18:32 node2 drbd[9137]: ]
 7月 20 01:18:32 node2 drbd[9137]: WARN: stdin/stdout is not a TTY; using /dev/consoleopen('/dev/console, O_RDONLY): Permission denied
 7月 20 01:18:42 node2 drbd[9137]: WARN: stdin/stdout is not a TTY; using /dev/console..........
 7月 20 01:18:42 node2 drbd[9137]: ***************************************************************
 7月 20 01:18:42 node2 drbd[9137]: DRBD's startup script waits for the peer node(s) to appear.
 7月 20 01:18:42 node2 drbd[9137]: - If this node was already a degraded cluster before the
 7月 20 01:18:42 node2 drbd[9137]: reboot, the timeout is 0 seconds. [degr-wfc-timeout]
 7月 20 01:18:42 node2 drbd[9137]: - If the peer was available before the reboot, the timeout
 7月 20 01:18:42 node2 drbd[9137]: is 0 seconds. [wfc-timeout]
 7月 20 01:18:42 node2 drbd[9137]: (These values are for resource 'testdata1'; 0 sec -> wait forever)
 7月 20 01:18:42 node2 drbd[9137]: To abort waiting for DRBD connections, kill this process: kill 9145

Output of cat /proc/drbd on node2:

GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by akemi@Build64R7, 2016-12-04 01:08:48
 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----s
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:2097052

Best Answer

DRBD is currently not connected to its peer. The connection state field in /proc/drbd shows this clearly: cs:WFConnection (the node is waiting for its peer to become visible on the network).

Additionally, the disk state field reports the local disk as Inconsistent and the peer's disk as unknown: ds:Inconsistent/DUnknown
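
If you want to read those two fields without scanning the whole file by eye, a small shell helper along these lines can pull them out. This is only a sketch: drbd_field is a hypothetical name, not part of drbd-utils, and the sample line is copied from the /proc/drbd output in the question.

```shell
#!/bin/sh
# drbd_field NAME LINE
# Hypothetical helper: print the value of one "name:value" field
# (for example cs, ro or ds) from a /proc/drbd status line.
drbd_field() {
    # Split the line into one token per row, then keep only the
    # token that starts with "NAME:" and strip that prefix.
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Status line taken from the question's /proc/drbd output on node2.
line=' 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----s'

drbd_field cs "$line"   # connection state
drbd_field ds "$line"   # disk state as local/peer
```

On a live node you could pass "$(grep 'cs:' /proc/drbd)" as the second argument. The same information should also be available from drbdadm cstate testdata1 and drbdadm dstate testdata1.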

You will need to determine why the nodes are not connecting, get them connected, and then complete the initial synchronization as outlined in the DRBD User's Guide: https://docs.linbit.com/doc/users-guide-84/s-initial-full-sync/
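
As a rough outline of that sequence (a sketch under assumptions, not a verified fix for this exact setup): the journal on node2 contains "IP 198.10.0.219 not found on this host", so one thing worth checking is that the address in each "on" section is really configured on that host. The "Device minor not allocated" error on node1 also suggests the resource may simply not be up there yet. The function names below (has_local_ip, create_metadata, bring_up, start_initial_sync) are hypothetical; the drbdadm subcommands are the ones the 8.4 User's Guide uses for first-time setup.

```shell
#!/bin/sh
# has_local_ip ADDRESS IP_OUTPUT
# Hypothetical helper: succeed if ADDRESS appears as an "inet" address
# in text formatted like the output of `ip -o -4 addr show`.
has_local_ip() {
    printf '%s\n' "$2" | awk -v ip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")   # drop the /prefix length
                    if (a[1] == ip) found = 1
                }
        }
        END { exit !found }'
}

# On node2, for the config shown in the question, you might run:
#   has_local_ip 192.168.0.2 "$(ip -o -4 addr show)" || echo "address missing"

# Only needed if DRBD metadata was never created on the backing disk.
create_metadata() {
    drbdadm create-md testdata1
}

# Run on BOTH nodes so each side attaches its disk and tries to connect.
bring_up() {
    drbdadm up testdata1
}

# Run on ONE node only, once /proc/drbd shows cs:Connected.
# WARNING: this node becomes the sync source and the peer's data
# on the backing device is overwritten.
start_initial_sync() {
    drbdadm primary --force testdata1
}
```

The functions are only defined here, not called, so nothing touches DRBD until you invoke them yourself as root on the appropriate node. If the addresses check out and the nodes still stay in WFConnection, it is worth confirming that port 7788 from the resource file is reachable between the two hosts.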