Linux-HA + dm-multipath: path removal causes segfault, kernel null pointer dereference, and STONITH

heartbeat linux multipath pacemaker

I am setting up a Linux-HA cluster running:

* pacemaker-1.1.5
* openais-1.1.4
* multipath-tools-0.4.9
* OpenSuSE 11.4, kernel 2.6.37

The cluster configuration passed a health check by LINBIT, so I'm pretty confident in it.

Multipath is being used because we have an LSI SAS array connected to each host via 2 HBAs (total 4 paths per host). What I would like to do now is to test the failover capabilities by removing paths from the multipath setup.

The multipath paths are as follows:

pgsql-data (360080e50001b658a000006874e398abe) dm-0 LSI,INF-01-00
size=6.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 4:0:0:1 sda 8:0   active undef running
| `- 5:0:0:1 sde 8:64  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 4:0:1:1 sdc 8:32  active undef running
  `- 5:0:1:1 sdg 8:96  active undef running

To simulate losing a path, I echo offline into /sys/block/{path}/device/state. This causes the path to appear failed/faulty to multipath, as follows:

pgsql-data (360080e50001b658a000006874e398abe) dm-0 LSI,INF-01-00
size=6.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 4:0:1:1 sdc 8:32  failed faulty offline
| `- 5:0:1:1 sdg 8:96  active undef  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 4:0:0:1 sda 8:0   active undef  running
  `- 5:0:0:1 sde 8:64  active undef  running
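
For reference, this is roughly the sequence I use to offline a single path and re-check the topology (device names are from my setup):

# take one path (here sdc) offline via sysfs
echo offline > /sys/block/sdc/device/state
# confirm the kernel now reports the device as offline
cat /sys/block/sdc/device/state
# re-check the multipath topology for the map
multipath -l pgsql-data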

However, watching /var/log/messages, I notice that the rdac checker still says the path is up:

multipathd: pgsql-data: sdc - rdac checker reports path is up
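
For reference, a quick way to cross-check what multipathd itself believes about each path is its interactive console:

# query the running multipathd directly
multipathd -k"show paths"
multipathd -k"show topology"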

Also, stepping back to the multipath -l output above: notice how the failed path is still in the active group? It should have been moved to the enabled group, and an active/running path from the enabled group should have taken its place in the active group.

Now, if I also down the other path in the active group, sdg, not only does the rdac checker still report the path as up, but the multipathd resource goes into a FAILED state in the cluster, neither of the two paths in the enabled group takes its place, and the result is a segfault, a kernel BUG about a NULL pointer dereference, and the cluster STONITHing the node.

db01-primary:/home/kendall/scripts # crm resource show
db01-secondary-stonith     (stonith:external/ipmi) Started 
db01-primary-stonith       (stonith:external/ipmi) Started 
Master/Slave Set: master_drbd [drbd_pg_xlog]
 Masters: [ db01-primary ]
 Slaves: [ db01-secondary ]
Resource Group: ha-pgsql
 multipathd (lsb:/etc/init.d/multipathd) Started  FAILED
 pgsql_mp_fs        (ocf::heartbeat:Filesystem) Started 
 pg_xlog_fs (ocf::heartbeat:Filesystem) Started 
 ha-DBIP-mgmt       (ocf::heartbeat:IPaddr2) Started 
 ha-DBIP    (ocf::heartbeat:IPaddr2) Started 
 postgresql (ocf::heartbeat:pgsql) Started 
 incron     (lsb:/etc/init.d/incron) Started 
 pgbouncer  (lsb:/etc/init.d/pgbouncer) Stopped 
pager-email    (ocf::heartbeat:MailTo) Stopped 

db01-primary:/home/kendall/scripts # multipath -l
pgsql-data (360080e50001b658a000006874e398abe) dm-0 LSI,INF-01-00
size=6.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 4:0:1:1 sdc 8:32  failed faulty offline
| `- 5:0:1:1 sdg 8:96  failed faulty offline
`-+- policy='round-robin 0' prio=0 status=active
  |- 4:0:0:1 sda 8:0   active undef  running
  `- 5:0:0:1 sde 8:64  active undef  running

Here is an excerpt from /var/log/messages showing the kernel bug:

Aug 17 15:30:40 db01-primary multipathd: 8:96: mark as failed
Aug 17 15:30:40 db01-primary multipathd: pgsql-data: remaining active paths: 2
Aug 17 15:30:40 db01-primary kernel: [ 1833.424180] sd 5:0:1:1: rejecting I/O to    offline device
Aug 17 15:30:40 db01-primary kernel: [ 1833.424281] device-mapper: multipath: Failing path 8:96.
Aug 17 15:30:40 db01-primary kernel: [ 1833.428389] sd 4:0:0:1: rdac: array , ctlr 1, queueing MODE_SELECT command
Aug 17 15:30:40 db01-primary multipathd: dm-0: add map (uevent)
Aug 17 15:30:41 db01-primary kernel: [ 1833.804418] sd 4:0:0:1: rdac: array , ctlr 1, MODE_SELECT completed
Aug 17 15:30:41 db01-primary kernel: [ 1833.804437] sd 5:0:0:1: rdac: array , ctlr 1, queueing MODE_SELECT command
Aug 17 15:30:41 db01-primary kernel: [ 1833.808127] sd 5:0:0:1: rdac: array , ctlr 1, MODE_SELECT completed
Aug 17 15:30:42 db01-primary multipathd: pgsql-data: sda - rdac checker reports path is up
Aug 17 15:30:42 db01-primary multipathd: 8:0: reinstated
Aug 17 15:30:42 db01-primary kernel: [ 1835.639635] device-mapper: multipath: adding disabled device 8:32
Aug 17 15:30:42 db01-primary kernel: [ 1835.639652] device-mapper: multipath: adding disabled device 8:96
Aug 17 15:30:42 db01-primary kernel: [ 1835.640666] BUG: unable to handle kernel NULL pointer dereference at           (null)
Aug 17 15:30:42 db01-primary kernel: [ 1835.640688] IP: [<ffffffffa01408a3>] dm_set_device_limits+0x23/0x140 [dm_mod]

The full stack trace is available at http://pastebin.com/gifMj7gu

My multipath.conf is available at http://pastebin.com/dw9pqF3Z

Does anyone have any insight into this, and/or suggestions on how to proceed?

I can re-create this each time.

Best Answer

OK, so it turns out that just setting "offline" in /sys/block/{dev}/device/state was not sufficient to make rdac report the path as being down. Last night I spent some time with the unit, pulling the SAS cables and watching the behavior of the system. This works properly. Not quite "as expected", because when an active path goes down it does not get replaced by a path from the enabled group, but that's a different issue. Failover also worked as expected: once the last path was lost, the cluster shut the database and related resources down and transferred them to the secondary node.

If you find yourself in a similar situation, you can try setting the multipath hwhandler to "0" in multipath.conf; you'll have to set this in the device{} section. This basically disables the path checks, so once the device is offline'd, it's really offline.
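
For reference, a minimal sketch of what that device{} stanza might look like for this array (vendor/product strings taken from the multipath -l output above; adjust to match your hardware):

devices {
    device {
        # matches the "LSI,INF-01-00" identity shown by multipath -l
        vendor            "LSI"
        product           "INF-01-00"
        # disable the rdac hardware handler, per the workaround above
        hardware_handler  "0"
    }
}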