I've been seeing load spikes over the last week, usually once or twice a day. Using iotop, I've identified that [jbd2/md1-8] is using 99.99 % IO. There is no high traffic to the server during the high-load periods.
Server specs are:
- AMD Opteron 8 core
- 16 GB RAM
- 2 × 2,000 GB 7,200 RPM HDD, software RAID 1
- CloudLinux + cPanel
- MySQL is properly tuned
Apart from the spikes, the load is usually around 0.80 at most.
I've searched around but can't find what [jbd2/md1-8] does exactly. Has anyone had this problem or does anyone know a possible solution?
Thank you.
UPDATE:
TIME      TID  PRIO  USER  DISK READ  DISK WRITE  SWAPIN  IO       COMMAND
16:05:36  399  be/3  root  0.00 B/s   38.76 K/s   0.00 %  99.99 %  [jbd2/md1-8]
Best Answer
This is not really an answer, as there is not enough context to give the exact cause, but it describes how I managed to track this down when it happened to me.
I noticed my jbd2/md0-8 kept showing up at the top of iotop. I looked in /sys/kernel/debug/tracing/events/jbd2 to see what options there are to determine what jbd2 was doing.
NOTE-1: To see output for debug tracing events:
cat /sys/kernel/debug/tracing/trace_pipe
I had this running in a terminal while enabling/disabling traces.
NOTE-2: To enable events for tracing use e.g.
echo 1 > /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable
To disable:
echo 0 > /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable
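The enable/disable steps from the two notes can be wrapped in a small helper so they are less error-prone to type. This is only a sketch: the function name jbd2_trace and the TRACING_DIR variable are made up for illustration, and it assumes debugfs is mounted at /sys/kernel/debug and that you are running as root.

```shell
#!/bin/sh
# Sketch: toggle a jbd2 trace event by name.
# TRACING_DIR and jbd2_trace are illustrative names, not part of any tool.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

jbd2_trace() {
    # $1 = event name (e.g. jbd2_run_stats), $2 = on|off
    event_file="$TRACING_DIR/events/jbd2/$1/enable"
    case "$2" in
        on)  value=1 ;;
        off) value=0 ;;
        *)   echo "usage: jbd2_trace EVENT on|off" >&2; return 2 ;;
    esac
    # Fail early if the enable file is missing or not writable.
    if [ ! -w "$event_file" ]; then
        echo "jbd2_trace: cannot write $event_file (not root, or no such event?)" >&2
        return 1
    fi
    echo "$value" > "$event_file"
}

# Typical session, as root:
#   cat /sys/kernel/debug/tracing/trace_pipe    # in one terminal
#   jbd2_trace jbd2_commit_flushing on          # in another
#   jbd2_trace jbd2_commit_flushing off         # when done
```

Remember to switch events off again when you are finished, otherwise they keep filling the trace buffer.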
I started by enabling /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable, but there was nothing that seemed particularly interesting in the output for it. I tried a few other events to trace, and when I enabled /sys/kernel/debug/tracing/events/jbd2/jbd2_commit_flushing/enable I saw it was occurring every second.
This looked like it was related to sync(2)/fsync(2)/msync(2), so I looked for some way to link this to a process and found another tracing event for that. I enabled it and watched the output.
This gave me the process name/id, and after doing some more debugging of this process (nzbget) I discovered it was doing fsync(2) every second. After I changed its config (FlushQueue=no, undocumented I think, found it in source) to stop it from doing this per-second fsync(2), the problem went away.
My kernel version is 4.4.6-gentoo. I think there were some options I enabled (either manually or with make oldconfig) at some point in kernel config to get /sys/kernel/debug with these events, so if you don't have it, maybe just look around the internet for more information on enabling it.
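Before digging into the kernel config, it may be worth checking whether the tracing files are simply not mounted. A rough sketch of that check follows; the function name have_jbd2_events is made up for illustration, and the default path is the debugfs mount point used throughout this answer.

```shell
#!/bin/sh
# Sketch: report whether the jbd2 trace events are reachable.
# have_jbd2_events is an illustrative name; $1 optionally overrides the
# debugfs mount point (default: /sys/kernel/debug).
have_jbd2_events() {
    debugfs_dir="${1:-/sys/kernel/debug}"
    if [ -d "$debugfs_dir/tracing/events/jbd2" ]; then
        echo "jbd2 events found under $debugfs_dir/tracing/events/jbd2"
        return 0
    fi
    echo "no jbd2 events under $debugfs_dir (debugfs not mounted, or not root?)" >&2
    return 1
}

# If the check fails, mounting debugfs by hand (as root) is worth a try:
#   mount -t debugfs none /sys/kernel/debug
```

If the directory is still missing after mounting, that may point to a kernel built without tracing support; CONFIG_DEBUG_FS and the ftrace options would be my guess at where to look, but I have not verified which exact options are required.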