Linux – iostat -x reports an I/O utilization of 4920.45%: what's wrong?

io iostat linux

I have sometimes seen a wrong disk I/O utilization percentage on servers that have gone a long time without a reboot.

This server by no means has significant I/O. It will be rebooted tonight, and I'm sure that tomorrow we will see a sensible %util.
Uptime is 497 days.

root@xxxxxx:~# iostat -x 1
Linux 2.6.24-27-server (xxxxxx)         10/13/2011

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.55    0.00    0.30    7.54    0.00   91.60

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  **%util**

sda           17649.65   765.65 5478.46 5262.33 36185.00 48224.35     7.86    19.06    1.78   4.58 **4920.45**  

The only real problem is that Nagios reports this as critical.

Any explanation would be welcome.

Thank you in advance.


Added later:

As you can see, the interval statistics are 0, and the % in the first report is going down quite slowly.

Linux 2.6.24-27-server (xxxxxxx)         10/13/2011

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.55    0.00    0.30    7.54    0.00   91.61

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda           13578.30   590.03 4214.71 4048.69 27838.04 37110.10     7.86    14.67    1.78   4.58 3785.44

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00   10.00     0.00    80.00     8.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Best Answer

The iostat source code caps the %util calculation at 100%. Either your version of iostat modifies this computation, so the figure doesn't mean what it normally means, or something very strange has happened.

Take a look at lines 381 and 382 in the iostat.c source:

            if (busy > 100.0)
                    busy = 100.0;

If you look down at lines 386 and 394, you can see that busy is what is printed as %util.
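
To make the cap concrete, here is a minimal sketch of how a utilization percentage can be derived and clamped. It is not the iostat source: the function names `busy_ms_delta` and `util_percent` are hypothetical, and it assumes %util is the device's busy time (the "time spent doing I/Os" counter, field 10 of a `/proc/diskstats` line, in milliseconds) divided by the elapsed interval. The 32-bit counter width is also an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Busy milliseconds between two samples of the I/O time counter.
 * Assumption: the counter is 32 bits wide. Unsigned subtraction is
 * modulo 2^32, so the delta stays correct across a single wraparound. */
uint32_t busy_ms_delta(uint32_t prev_ticks, uint32_t curr_ticks)
{
    return (uint32_t)(curr_ticks - prev_ticks);
}

/* Utilization in percent over an interval, capped at 100% in the same
 * way as the two lines quoted from iostat.c above. */
double util_percent(uint32_t busy_ms, double interval_ms)
{
    assert(interval_ms > 0.0);

    double busy = 100.0 * (double)busy_ms / interval_ms;
    if (busy > 100.0)
        busy = 100.0;
    return busy;
}
```

With a one-second interval, 500 ms of busy time gives 50%, and any busy time above 1000 ms is reported as exactly 100%. A build that prints 4920.45 is therefore either not applying this kind of clamp or computing the figure differently.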