Linux – High server load – [jbd2/md1-8] using 99.99% IO

high-loadlinuxsoftware-raid

I've been having spike in load over the last week. This usually occurs once or twice a day. I've managed to identify from iotop that [jbd2/md1-8] is using 99.99 % IO. During the high load times there is no high traffic to the server.

Server specs are:

  • AMD Opteron 8 core
  • 16 GB RAM
  • 2×2.000 GB 7.200 RPM HDD Software Raid 1
  • Cloudlinux + Cpanel
  • Mysql is properly tuned

Apart from the spikes, the load usually is around 0.80 at most.

I've searched around but can't find what [jbd2/md1-8] does exactly. Has anyone had this problem or does anyone know a possible solution?

Thank you.

UPDATE:

TIME        TID     PRIO     USER    DISK READ    DISK WRITE    SWAPIN  IO       COMMAND
16:05:36     399     be/3    root    0.00 B/s      38.76 K/s    0.00 %  99.99 %  [jbd2/md1-8]

Best Answer

This is not really an answer as there is not enough context to give the exact cause, but it is a description of how I managed to track this when it happened to me.

I noticed my jbd2/md0-8 kept showing up at the top of iotop. I looked in /sys/kernel/debug/tracing/events/jbd2 to see what options there are to determine what jbd2 was doing.

NOTE-1: To see output for debug tracing events cat /sys/kernel/debug/tracing/trace_pipe - I had this running in terminal while enabling/disabling traces.

NOTE-2: To enable events for tracing use e.g. echo 1 > /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable. To disable echo 0 > /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable.

I started by enabling /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable - but there was nothing that seemed particularly interesting in the output for it. I tried a few other events to trace and when I enabled /sys/kernel/debug/tracing/events/jbd2/jbd2_commit_flushing/enable I saw it was occurring every second:

# cat /sys/kernel/debug/tracing/trace_pipe
...
jbd2/md0-8-2520  [004] .... 658660.216492: jbd2_commit_flushing: dev 9,0 transaction 32856413 sync 0
jbd2/md0-8-2520  [001] .... 658661.334900: jbd2_commit_flushing: dev 9,0 transaction 32856414 sync 0
jbd2/md0-8-2520  [001] .... 658661.394113: jbd2_commit_flushing: dev 9,0 transaction 32856415 sync 0

This looked like it was related to sync(2)/fsync(2)/msync(2), so I looked for some way to link this to a process and found this:

# find /sys/kernel/debug/tracing/events/ | grep sync.*enable
...
/sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable
...

When I enabled it I saw the following output:

# cat /sys/kernel/debug/tracing/trace_pipe
...
      nzbget-17367 [002] .... 658693.222288: ext4_sync_file_enter: dev 9,0 ino 301924373 parent 301924357 datasync 1 
  jbd2/md0-8-2520  [001] .... 658693.284080: jbd2_commit_flushing: dev 9,0 transaction 32856465 sync 0
      nzbget-17367 [000] .... 658693.334267: ext4_sync_file_enter: dev 9,0 ino 301924357 parent 301924353 datasync 1 
  jbd2/md0-8-2520  [002] .... 658693.334275: jbd2_commit_flushing: dev 9,0 transaction 32856466 sync 0
      nzbget-17367 [001] .... 658694.369514: ext4_sync_file_enter: dev 9,0 ino 301924367 parent 301924357 datasync 1 
  jbd2/md0-8-2520  [002] .... 658694.414861: jbd2_commit_flushing: dev 9,0 transaction 32856467 sync 0
      nzbget-17367 [001] .... 658694.470872: ext4_sync_file_enter: dev 9,0 ino 301924357 parent 301924353 datasync 1 
  jbd2/md0-8-2520  [002] .... 658694.470880: jbd2_commit_flushing: dev 9,0 transaction 32856468 sync 0

This gave me the process name/id - and after doing some more debugging of this process (nzbget) I discovered it was doing fsync(2) every second. After I changed its config (FlushQueue=no, undocumented I think, found it in source) to stop it from doing this per second fsync(2) the problem went away.

My kernel version is 4.4.6-gentoo.I think there were some options I enabled (either manually or with make oldconfig) at some point in kernel config to get /sys/kernel/debug with these events - so if you don't have it maybe just look around the internet for more information on enabling it.

Related Topic