What are some ways to debug I/O issues on a Linux server?
I've been using:
# nohup top -b -d 10 > /var/log/top.log &
# nohup iotop -b -d 5 -o -t > /var/log/iotop.log &
PS: hardware is clean, new and fine.
Swap is not being used at all, and I see a lot of these kernel threads:
[jbd2/sda6-8]
[jbd2/sda2-8]
[loop0]
[loop1]
[events/0]
[flush-8:0]
[kondemand/3]
[ksoftirqd/3]
[kblockd/2]
The server is fine most of the time, then the load average randomly spikes to anywhere between 6.00 and 38.00.
All I have on the box is PHP/Apache/nginx.
Example:
top - 03:25:11 up 1 day, 5:00, 3 users, load average: 6.87, 2.98, 1.90
Tasks: 224 total, 1 running, 222 sleeping, 0 stopped, 1 zombie
Cpu0 : 4.7%us, 1.0%sy, 0.0%ni, 21.3%id, 73.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 15.0%us, 2.3%sy, 0.0%ni, 60.0%id, 22.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 6.7%us, 1.7%sy, 0.0%ni, 0.0%id, 91.3%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 91.1%id, 8.6%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 8031932k total, 7971176k used, 60756k free, 231236k buffers
Swap: 8191992k total, 0k used, 8191992k free, 6334420k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2231 mysql 20 0 2576m 537m 6348 S 3.0 6.9 66:35.85 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us
678511 user 20 0 245m 43m 20m D 1.0 0.6 0:01.08 /usr/bin/php
678539 user 20 0 255m 49m 21m D 0.7 0.6 0:00.33 /usr/bin/php
678551 user 20 0 230m 14m 8392 D 0.7 0.2 0:00.08 /usr/bin/php
678565 user 20 0 231m 17m 10m D 0.7 0.2 0:00.08 /usr/bin/php
36 root 20 0 0 0 0 S 0.3 0.0 1:04.45 [kblockd/2]
60 root 20 0 0 0 0 S 0.3 0.0 0:51.02 [kswapd0]
1653 root 20 0 0 0 0 S 0.3 0.0 0:54.87 [kondemand/2]
3394 root 20 0 353m 3480 1496 S 0.3 0.0 7:26.66 /usr/sbin/db_governor
494915 nobody 18 -2 61104 19m 988 S 0.3 0.2 0:38.74 nginx: worker process
678473 nobody 20 0 96912 13m 2304 S 0.3 0.2 0:00.04 /usr/local/apache/bin/httpd -k start -DSSL
678474 nobody 20 0 96904 13m 2304 S 0.3 0.2 0:00.04 /usr/local/apache/bin/httpd -k start -DSSL
678480 user 20 0 229m 17m 10m S 0.3 0.2 0:00.22 /usr/bin/php
678491 root 20 0 15148 1360 944 R 0.3 0.0 0:00.15 top -c
678519 user 20 0 233m 30m 20m D 0.3 0.4 0:00.22 /usr/bin/php
678538 user 20 0 234m 31m 20m D 0.3 0.4 0:00.18 /usr/bin/php
678567 user 20 0 230m 14m 8392 D 0.3 0.2 0:00.06 /usr/bin/php
678612 user 20 0 128m 6156 4392 D 0.3 0.1 0:00.01 /usr/bin/php
1 root 20 0 19356 1388 1064 S 0.0 0.0 0:00.89 /sbin/init
and iotop:
66913 be/4 user 1733.28 K/s 0.00 B/s 0.00 % 99.99 % php
66888 be/4 user 734.51 K/s 0.00 B/s 0.00 % 99.99 % php
66275 be/4 user 167.11 K/s 0.00 B/s 0.00 % 99.99 % php
66409 be/4 user 956.03 K/s 0.00 B/s 0.00 % 99.99 % php
66840 be/4 user 15.55 K/s 0.00 B/s 0.00 % 99.99 % php
66825 be/4 user 85.50 K/s 0.00 B/s 0.00 % 99.99 % php
66902 be/4 user 2028.64 K/s 0.00 B/s 0.00 % 99.99 % php
66268 be/4 user 932.71 K/s 0.00 B/s 0.00 % 99.95 % php
66805 be/4 user 489.67 K/s 0.00 B/s 0.00 % 93.08 % php
These are the processes that randomly spike.
Ideas?
Best Answer
Thanks for the question.
It would be helpful to have detailed information about the hardware you're using.
That includes the server make/model, the disk array setup (RAID controller, RAID level, caching solution, number of disks), and the details of your Linux distribution and kernel.
Looking at the data dump above, I suspect the I/O wait comes from write activity that is starved or waiting for resources. That can happen when there's no write cache available on the disk array, and it can also explain the wild swings in load.
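The php processes in your top output are in state D (uninterruptible sleep, usually waiting on I/O), and on Linux those count toward the load average even when the CPUs are mostly idle. As a rough sketch, you could log which processes are stuck in that state during a spike. The helper name filter_dstate is made up for illustration, and it assumes the process state is the first column of the input:

```shell
# filter_dstate: read ps-style rows on stdin and keep only those whose
# first field (the process state) starts with D (uninterruptible sleep).
# Hypothetical helper name; assumes state is printed as the first column.
filter_dstate() {
    awk '$1 ~ /^D/'
}

# Example use (wchan shows the kernel function the process is sleeping in):
#   ps -eo state,pid,user,wchan:32,args | filter_dstate
```

Running that every few seconds next to your top/iotop logs should show whether the blocked processes are always the same PHP scripts.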
The output of a tool like iostat or collectl will be more helpful in understanding what's happening.
Try
iostat -x 1
or
collectl -sD
and post the result.
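Since the spikes are random, it may be easier to filter the iostat output than to watch it scroll by. A minimal sketch, assuming %util is the last column of the extended device report (check the header line of your sysstat version); the function name busy_devices and the threshold of 90 are only illustrative:

```shell
# busy_devices [THRESHOLD]: read `iostat -x` output on stdin and print the
# device name and %util for every device at or above THRESHOLD (default 90).
# Assumption: %util is the last column of the device table.
busy_devices() {
    awk -v thr="${1:-90}" '
        /^Device/ { in_dev = 1; next }   # header row starts a device table
        NF == 0   { in_dev = 0; next }   # blank line ends the table
        in_dev && $NF + 0 >= thr { print $1, $NF }
    '
}

# Example use (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C iostat -x 5 | busy_devices 90
```

A device that sits near 100% during a spike while the others stay low would point at that disk or array rather than at the whole box.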