Ubuntu – debugging high load average due to slow I/O on EC2

amazon-ec2, ubuntu

I am on an Amazon EC2 Ubuntu 11.04 large instance with a 150 GB volume mounted for the database (ext4).

The CPU usage is VERY low, but the load average has been consistently at 2.0 for about a day now. I used to have the database partition on a 40 GB volume and did not have this problem.
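For context, a quick way to see whether the load is coming from runnable tasks or from tasks blocked on I/O is vmstat: the r column counts runnable processes and the b column counts processes in uninterruptible sleep. This is just a generic sketch, not output from my instance:

:~$ vmstat 1 5
# a non-zero 'b' column alongside low CPU usage points at tasks stuck waiting on I/O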

iostat tells me we are spending a lot of time waiting for I/O:

:~$ iostat 1 2
Linux 2.6.38-11-virtual (flashgroup)    04/05/2012      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.16    0.09    2.62    1.11    2.09   86.92

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            3.45         0.88        18.59    9137065  192742888
xvdb              4.47         2.84        24.17   29479675  250638760
xvdh             10.62        19.95        88.05  206811124  912892410
xvdf              0.18         0.00         1.93       1378   19971464
xvdg              0.00         0.00         0.00        656          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.22    0.00    1.92   42.58    3.02   47.25

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.00         0.00         0.00          0          0
xvdb             43.00         0.00       172.00          0        172
xvdh              0.00         0.00         0.00          0          0
xvdf             49.00         0.00       288.00          0        288
xvdg              0.00         0.00         0.00          0          0

The product is performing just fine and the database is not logging any slow queries…

How should I go about debugging this?
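One way to dig deeper into the I/O waits is iostat's extended statistics, which report per-device latency; a rough sketch (this is what the edit below is referring to when it talks about checking latency):

# await = average time (ms) each request spends waiting and being serviced, per device
:~$ iostat -x 1 3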

EDIT:

It turns out that none of the volumes are exhibiting high latency, and all other aspects of the system seem to be healthy. Wikipedia tells me that Linux includes processes in the uninterruptible state in the load average, and ps shows that two hung mount commands are in that state:

ps auxww | grep " D"
root     21557  0.0  0.0   9904   760 ?        D    Apr03   0:00 umount db /dev/xvdh
root     26428  0.0  0.0  16456   912 ?        D    Apr03   0:00 mount /dev/xvdh /mnt/db
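To see where a process stuck in D state is actually blocked in the kernel, ps can print the wait channel, and (with root, on kernels that expose it) /proc/&lt;pid&gt;/stack shows the kernel stack. The PIDs below are just the ones from the ps output above:

# wchan = kernel function the task is currently sleeping in
ps -o pid,stat,wchan:30,cmd -p 21557,26428
# full kernel stack of the stuck umount, if available
sudo cat /proc/21557/stack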

I am afraid to kill these (it probably would not even work if I tried), so I think this instance is sick and needs a restart. Thanks for your help!

Best Answer

It turns out that none of the volumes are exhibiting high latency, and all other aspects of the system seem to be healthy. Wikipedia tells me that Linux includes processes in the uninterruptible state in the load average, and ps shows that two hung mount commands are in that state:

ps auxww | grep " D"
root     21557  0.0  0.0   9904   760 ?        D    Apr03   0:00 umount db /dev/xvdh
root     26428  0.0  0.0  16456   912 ?        D    Apr03   0:00 mount /dev/xvdh /mnt/db

Restarting the instance got rid of these hung processes, and the load average is back to normal.