Linux – lpfc + multipath + Ubuntu – path keeps switching

Tags: emulex, linux, multipath, storage-area-network, ubuntu

I am having issues configuring multipath with Emulex HBAs (the lpfc driver). Although I do not detect any data corruption, the SAN administrator has a tool that shows the paths being switched every 20 seconds or so. Here are the details:

# multipath -l
san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:0 sdb 8:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:0 sdc 8:32  [active][undef]

The multiple paths are connected to the same LUN.

# /lib/udev/scsi_id -g -u -d /dev/sdb
3600a0b80002a042200002cb44a9a29ca
# /lib/udev/scsi_id -g -u -d /dev/sdc
3600a0b80002a042200002cb44a9a29ca

Here's the /etc/multipath.conf

defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        path_checker            readsector
        failback                immediate
        user_friendly_names     yes
}
multipaths {
        multipath {
                wwid    3600a0b80002a042200002cb44a9a29ca
                alias   san01
        }
}
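In case it matters, this is roughly how I apply and re-check the config after editing it (a sketch; the init script name may differ between Ubuntu releases):

# multipath -v2 -d                       # dry run: print the maps that would be created
# /etc/init.d/multipath-tools reload
# multipath -ll                          # verify the resulting topology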

# fdisk -l

Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x61b4bf95

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       13054   104856223+  83  Linux

Disk /dev/sdc: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x61b4bf95

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       13054   104856223+  83  Linux
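(A side note on the partition tables above: with multipath in place, the partition should be accessed through the device-mapper map rather than /dev/sdb1 or /dev/sdc1 directly; kpartx creates that mapping. A sketch follows, and the generated name may be san01-part1 or san011 depending on the kpartx/multipath-tools version:)

# kpartx -a /dev/mapper/san01        # create partition mappings on top of the multipath device
# ls /dev/mapper/                    # should now list san01 plus its partition map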

I increased the verbosity for lpfc and now I get the following on dmesg:

[ 2519.241119] lpfc 0000:07:00.0: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018 x37a120c0 x0 x0 xeb x0 x1b108db xa29b16
[ 2519.241124] lpfc 0000:07:00.0: 1:(0):0729 FCP cmd x12 failed <0/0> status: x1 result: xeb Data: x1b1 x8db
[ 2519.241127] lpfc 0000:07:00.0: 1:(0):0730 FCP command x12 failed: x0 SNS x0 x0 Data: x8 xeb x0 x0 x0
[ 2519.241130] lpfc 0000:07:00.0: 1:(0):0716 FCP Read Underrun, expected 254, residual 235 Data: xeb x12 x0
[ 2519.241275] lpfc 0000:07:00.0: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018 x37a14c48 x0 x0 xd2 x0 x1b208e6 xa29b16
[ 2519.241279] lpfc 0000:07:00.0: 1:(0):0729 FCP cmd x12 failed <0/0> status: x1 result: xd2 Data: x1b2 x8e6
[ 2519.241283] lpfc 0000:07:00.0: 1:(0):0730 FCP command x12 failed: x0 SNS x0 x0 Data: x8 xd2 x0 x0 x0
[ 2519.241286] lpfc 0000:07:00.0: 1:(0):0716 FCP Read Underrun, expected 254, residual 210 Data: xd2 x12 x0
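(For reference, I bumped the driver verbosity along these lines; a sketch assuming the standard lpfc_log_verbose sysfs attribute, with an arbitrary example mask:)

# cat /sys/class/scsi_host/host3/lpfc_log_verbose
# echo 0xffff > /sys/class/scsi_host/host3/lpfc_log_verbose    # example mask only, pick the bits per the lpfc docs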

Can someone see anything wrong with this config?
Thank you.


Based on janneb's comments, I changed the configuration in multipath.conf to:

defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        user_friendly_names     yes
}
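To make the new path_grouping_policy take effect I flushed and rebuilt the maps, roughly like this (a sketch; multipath -F only flushes maps that are not currently in use):

# /etc/init.d/multipath-tools stop
# multipath -F        # flush the existing multipath maps
# /etc/init.d/multipath-tools start
# multipath -v2       # recreate the maps under the new policy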

Which now gives:

san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
 \_ 4:0:0:0 sdc 8:32  [active][ready]

But it still goes [active][undef] after a while, then back to [ready].

Oh, I just noticed something: when I run 'multipath -l' I get [undef], but if I run 'multipath -ll' I get [ready]. From the man page:

-l     show the current multipath topology from information fetched in sysfs and the device mapper
-ll    show the current multipath topology from all available information (sysfs, the device mapper, path checkers ...)
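So -l just reports the cached state, while -ll actually runs the path checkers. A way to watch the checker state live would be the multipathd interactive shell (a sketch; see multipathd(8) for the exact command set):

# multipathd -k
multipathd> show config
multipathd> show paths
multipathd> show maps status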

Is the setup wrong? How can I debug? Thanks.


Thank you janneb and zerolagtime for helping out.

Here's where it gets complicated. I thought I would not need to explain all of this, but I am currently leaning towards a mix-up in the hardware setup.

There are actually two servers connected to the same LUN over FC. At the OS level only one server accesses the filesystem at a time (although the same LUN is exposed to both), since it is ext3 (not a clustered filesystem). If server 1 goes down, server 2 kicks in (linux-ha) and mounts the filesystem.
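For context, the failover piece is plain linux-ha/heartbeat; the filesystem hand-off boils down to a single resource line, roughly like this (a sketch assuming the classic haresources style; the node name, device and mount point below are placeholders, not our real ones):

server1 Filesystem::/dev/mapper/san01-part1::/data::ext3

Only the node that currently holds the resource mounts the ext3 filesystem; the LUN itself stays visible to both servers.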

Server 1 (multipath -ll):

san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
 \_ 4:0:0:0 sdc 8:32  [active][ready]

Server 2 (multipath -ll):

san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
 \_ 4:0:0:0 sdc 8:32  [active][ready]

Server 1 port names:

# cat /sys/class/fc_host/host3/port_name 
0x10000000c96c5fdb
# cat /sys/class/fc_host/host4/port_name 
0x10000000c96c5df5

Server 2 port names:

# cat /sys/class/fc_host/host3/port_name 
0x10000000c97b0917
# cat /sys/class/fc_host/host4/port_name 
0x10000000c980a2d8

Is this setup wrong? Is the way the LUN is exposed to both servers wrong? I am thinking the hardware hookup is incorrect; what could be wrong? Could server 1's path_checker be interfering with server 2's operation?
Thanks.

Best Answer

Your configuration looks odd; normally you'd have 4 paths to the same device (that is, 4 /dev/sdX devices per multipath device). The array controller is typically able to tell the host the priority of each path, so you end up with 2 paths at a higher priority and 2 at a lower priority. dm-multipath then multiplexes IO over the 2 high-priority paths (the "selector" option, with the default rr_min_io=100). As it stands, you have 2 path groups with the same priority, so dm-multipath may be spreading IO over both of them, which might not be what your SAN admin wants. Another odd thing is that the devices are marked [undef] rather than [ready]. Yet another oddity is the path numbering, which suggests everything goes along the same path. Are you really sure everything is properly cabled, properly zoned, and so on?
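If the array really is a FAStT/DS4000-class box (the "IBM 1815" product string suggests so), you'd normally let an array-specific devices entry do the grouping for you; something along these lines (a sketch from memory, so double-check against the hardware table that ships with your multipath-tools version; newer releases use "prio rdac" instead of the prio_callout binary):

devices {
        device {
                vendor                  "IBM"
                product                 "1815"
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                hardware_handler        "1 rdac"
                path_checker            rdac
                failback                immediate
        }
}

With group_by_prio the array's reported priorities, rather than a hand-written failover/multibus policy, decide which paths carry the IO.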

A typical output from "multipath -ll" should look like

sanarch3 (3600508b4000683de0000c00000a20000) dm-6 HP,HSV200
[size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=100][active]
 \_ 0:0:0:5 sdc 8:32  [active][ready]
 \_ 1:0:0:5 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 0:0:1:5 sdg 8:96  [active][ready]
 \_ 1:0:1:5 sdo 8:224 [active][ready]

There you see 4 paths grouped into 2 priority groups, and IO is done over devices sdc and sdk while sdg and sdo are idle and used only during a failure.

EDIT: The reason you should see 4 paths is that you have 2 HBA ports and the array has 2 redundant controllers, with 2 redundant fabrics and a final switch layer providing cross-fabric connections. Thus both HBAs see both controllers, hence 4 paths for each LUN. You can see this in the SCSI ID numbering in my example above, which goes [host controller ID]:[channel ID]:[target controller ID]:[LUN ID]. What you can also see above is that the active paths are both on controller #0, since in this case controller #0 happens to "own" the LUN; IO is possible via the other controller, but at a performance penalty, since that controller would (depending on the implementation) have to forward the IO to the owning controller. Hence the array reports that the paths going to controller #0 have higher priority.
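A quick sanity check is to look at the FC remote ports each HBA actually sees in sysfs (a sketch; the rport names and host numbers will differ on your system):

# ls /sys/class/fc_remote_ports/
# grep . /sys/class/fc_remote_ports/rport-*/port_name

With both array controllers zoned to both HBAs you would expect two remote ports per host; if each host shows only one, the cabling/zoning gives each HBA a path to just one controller.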

So from your question it looks like there is no path to the second controller at all. And if you don't actually have redundant controllers and fabrics, why bother with multipath in the first place?