Why higher await time for DM multipath device than underlying device

device-mapperiowaitperformance

We have a CentOS 6.4 based server attached to Hitachi HNAS 3080 storage and observed the kernel remount the filesystem in read-only mode:

May 16 07:31:03 GNS3-SRV-CMP-001 kernel: [1259725.675814] EXT3-fs (dm-1): error: remounting filesystem read-only

This happened after several I/O errors and all paths to the device reportedly going down:

May 16 07:31:03 GNS3-SRV-CMP-001 multipathd: mpatha: remaining active paths: 0

I have been looking at sar logs and can see few very large (2 seconds) await times:

07:40:00       dev8-0     17.91    112.04     98.03     11.73      0.00      0.20      0.07      0.12
07:40:00      dev8-16      0.23      1.85      0.00      8.00      0.00      3.71      3.71      0.09
07:40:00      dev8-32     91.50   8338.76   5292.93    148.98      8.38     91.60      9.76     89.35
07:40:00     dev252-0     91.27   8336.91   5292.93    149.34     17.79    194.88      9.79     89.38
07:40:00     dev252-1    674.80   8168.16   5292.93     19.95   1473.53   2183.60      1.32     88.98

The duration between 07:30:00-07:40:00 does happen the time when the filesystem got mounted read-only. However, even under normal conditions, one repeated observation is that the await time for underlying devices is much lower than that of the multipath device. For instance:

00:00:00          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:10:00       dev8-0     19.27    129.41     78.61     10.80      0.01      0.27      0.16      0.32
00:10:00      dev8-16      0.23      1.80      0.00      8.00      0.00      0.86      0.84      0.02
00:10:00      dev8-32     94.88  10285.16   3363.48    143.86      3.39     35.76      6.83     64.82
00:10:00     dev252-0     94.65  10283.34   3363.48    144.18      3.64     38.47      6.86     64.89
00:10:00     dev252-1    435.06  10087.12   3363.48     30.92    118.42    272.21      1.47     64.12

dev8-0 happens to be the local disk, while dev8-16 (/dev/sdb) and dev8-32 (/dev/sdc) are the underlying ones for dev252-0 (/dev/mapper/mpatha). dev252-1 (/dev/mapper/mpathap1) is a single partition spanning the whole of the multipath device. Here is output from multipath -ll:

mpatha (2521501cbffffffffe96773b50ec30020) dm-0 BlueArc,NAS Platform
size=10T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=enabled
| `- 9:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=active
  `- 8:0:0:0 sdb 8:16 active ready running

Why should the await time for /dev/mapper/mpathap1 be so much higher than that of /dev/mapper/mpatha or even /dev/sdb or /dev/sdc?

Best Answer

As user the-wabbit suggests, there is request-merging going on. You can see that in the column avgrq-sz, the average request size - which shows a significant increase.

Now 'await' is the time spent in the queue plus the time spent servicing those requests. If a small request, let's call it 'x', is merged with a couple of other requests (y and z, issued after x), then x will

wait in the queue to be merged with y
wait in the queue tu be merged with z
wait for (x,y,z) to be completed

This will obviously have a negative impact on the await statistic, mostly because of the way await is calculated, without actually signifying a problem in itself.

Now let's take a look at /dev/sdb (dev8-16). Did you know that you are not using that path? You have two priority groups in your multipath config, one is

status=enabled

and on is

status=active

You probably have

path_grouping_policy    failover

in your configuration (which is the default).

If you want to prevent the IO errors in case both paths are down, you could try:

        features        "1 queue_if_no_path"

in your multipath.conf

Now the real question remains, why do both paths go down?

Best Answer

Related Solutions

Does iowait Include Time Waiting for Network Calls?

Is virtual machine slower than the underlying physical machine

Related Topic