Linux – Determine what’s causing high disk I/O

cache, io, linux, virtualization, vps

I have a disk I/O problem on my VPS. The server runs nginx + PHP-FPM + APC; the database lives on a separate dedicated VPS. Several WordPress MU sites are hosted on the web server. The average I/O rate is 6k blocks/second.
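
For reference, you can watch the block I/O counters with 'vmstat' (the 2-second sampling interval below is arbitrary):

# The 'bi' and 'bo' columns report blocks read from and
# written to block devices per second.
vmstat 2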

I'm trying to understand what's causing the high I/O.

Output of 'free -m':

            total   used   free   shared   buffers   cached
Mem:         1005    973     31        0        96      568
-/+ buffers/cache:   307    697
Swap:         255      8    247

Output of 'iotop':

Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.90 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 2150 be/4 root        0.00 B/s    0.00 B/s  0.00 % 65.25 % [flush-202:0]
 6694 be/4 www-data    0.00 B/s   19.64 K/s  0.00 %  0.00 % php-fpm: pool www
 6700 be/4 www-data    0.00 B/s   23.56 K/s  0.00 %  0.00 % php-fpm: pool www
 8646 be/4 www-data    0.00 B/s  424.12 K/s  0.00 %  0.00 % php-fpm: pool www
10974 be/4 www-data    0.00 B/s   19.64 K/s  0.00 %  0.00 % php-fpm: pool www

The 'flush-202:0' process sometimes hits 99% I/O. I've read that this is the kernel process that flushes the disk cache, but what causes it to run, and how do I fix it?

Best Answer

I'm not sure that iotop sample shows anything unusual. The flush process accounting for a high percentage of your I/O at a given moment isn't a problem if there isn't much I/O going on at that time.
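
If you want to see what makes the flush thread kick in: writeback is governed by the vm.dirty_* sysctls, whose current thresholds you can inspect (this is context, not a fix; the values shown are whatever your kernel or distro set):

# Percentage of RAM holding dirty pages at which background
# writeback (the flush thread) starts
sysctl vm.dirty_background_ratio
# Percentage at which writing processes are forced to flush synchronously
sysctl vm.dirty_ratio
# Maximum age of dirty data before it is written out (centiseconds)
sysctl vm.dirty_expire_centisecs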

I would install atop, which presents real-time data like iotop but has the advantage of also logging samples throughout the day. A day after installing it, I would open the logged data with 'atop -r log_filename', then step through the samples with 't' until I found a time when the system-level output showed high I/O. Then I would switch the per-process output to disk statistics with 'd' to see which processes were generating that I/O.
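
A minimal session sketch (assuming a Debian-style system; the log directory and the YYYYMMDD filename below are an assumption, as both vary by distro and atop version):

# Install atop; its daemon logs a sample every 10 minutes by default
apt-get install atop
# Replay a day's log
atop -r /var/log/atop/atop_YYYYMMDD
# Inside atop: 't' steps forward through the samples, 'T' steps back,
# and 'd' switches the process list to disk statistics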