Linux – Why does kswapd use 100% CPU with no swap space and plenty of cache available for reaping

cpu-usage linux swap

Firefox used a lot of memory and the machine ground to a near-halt with kswapd/kworker using most of the CPUs. There is no swap space, and vm.swappiness=0 on Linux 4.5.7 (Fedora 24).

What I don't understand is why, with nearly 1.5 GB of buff/cache, Linux didn't reclaim that cache for Firefox/plugin-container. What is kswapd doing?

top - 13:17:15 up  2:47,  4 users,  load average: 9.78, 5.38, 2.35
Tasks: 197 total,   4 running, 193 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.8 us, 47.0 sy,  0.0 ni, 10.0 id, 36.9 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  3922860 total,   105508 free,  2353620 used,  1463732 buff/cache
KiB Swap:        0 total,        0 free,        0 used.     6828 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   49 root      20   0       0      0      0 R 100.0  0.0   2:35.25 kswapd0
 6395 kevin     20   0 1152968 371132   4292 R  31.7  9.5   3:16.59 plugin-containe
 3449 root      20   0       0      0      0 S  26.3  0.0   0:24.49 kworker/u16:3
 5885 root      20   0       0      0      0 S  23.8  0.0   0:34.12 kworker/u16:2
 4246 root      20   0       0      0      0 S  22.9  0.0   0:42.11 kworker/u16:4
 6236 root      20   0       0      0      0 R  19.0  0.0   0:38.84 kworker/u16:1
 4700 root      20   0       0      0      0 S  17.8  0.0   0:40.57 kworker/u16:5
 3473 kevin     20   0 1662688 402008    460 D   8.3 10.2   7:36.45 thunderbird
 1846 elastic+  20   0 4238960 401324    124 S   5.7 10.2   3:05.58 java
 6107 kevin     20   0 2133616 602096  20920 S   5.1 15.3   4:03.21 firefox...

I don't think I had done anything write-heavy recently, so I wouldn't expect dirty pages to be flushed to disk (an SSD), yet I/O wait (wa) is 37%, which is a bit surprising. I captured about 30 seconds of top output and buff/cache barely changed, so I don't think it is actually writing pages to disk (although then I don't understand why wa% is so high):

$ grep -e "top -" -e "buff/cache" top.txt 
top - 13:17:11 up  2:47,  4 users,  load average: 9.41, 5.23, 2.29
KiB Mem :  3922860 total,   103468 free,  2353456 used,  1465936 buff/cache
top - 13:17:15 up  2:47,  4 users,  load average: 9.78, 5.38, 2.35
KiB Mem :  3922860 total,   105508 free,  2353620 used,  1463732 buff/cache
top - 13:17:21 up  2:47,  4 users,  load average: 10.44, 5.59, 2.43
KiB Mem :  3922860 total,   108700 free,  2354532 used,  1459628 buff/cache
top - 13:17:24 up  2:47,  4 users,  load average: 10.72, 5.73, 2.50
KiB Mem :  3922860 total,   107004 free,  2355112 used,  1460744 buff/cache
top - 13:17:43 up  2:47,  4 users,  load average: 12.64, 6.39, 2.77
KiB Mem :  3922860 total,   108264 free,  2352820 used,  1461776 buff/cache
top - 13:17:46 up  2:47,  4 users,  load average: 12.27, 6.42, 2.79
KiB Mem :  3922860 total,   108580 free,  2352584 used,  1461696 buff/cache
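The buff/cache comparison above can be done mechanically. This is a minimal Python sketch, assuming the saved top output uses the same "KiB Mem" summary layout as the lines shown (top can be switched to other units, which this pattern does not handle); buff_cache_values and buff_cache_range are hypothetical helper names, not part of any tool.

```python
import re

# Captures the buff/cache figure (KiB) at the end of top's "KiB Mem" summary line, e.g.
# "KiB Mem :  3922860 total,   103468 free,  2353456 used,  1465936 buff/cache"
# Assumption: default KiB units; "MiB Mem" lines are ignored.
MEM_LINE = re.compile(r"KiB Mem\s*:.*?(\d+)\s+buff/cache")


def buff_cache_values(lines):
    """Return the buff/cache value in KiB from every 'KiB Mem' line, in order."""
    values = []
    for line in lines:
        match = MEM_LINE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def buff_cache_range(lines):
    """Return (min, max, max - min) of buff/cache in KiB across all snapshots."""
    values = buff_cache_values(lines)
    if not values:
        raise ValueError("no 'KiB Mem' lines found")
    low, high = min(values), max(values)
    return low, high, high - low
```

Usage: `with open("top.txt") as f: print(buff_cache_range(f))`. Against the six snapshots above this gives (1459628, 1465936, 6308), i.e. buff/cache moved by only about 6 MiB during the capture, which supports the observation that the cache was not being reclaimed.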

Killing firefox and plugin-container brought the system back to normal. I'd prefer that either the cache be fully flushed to give more headroom, or at least that the OOM killer run in this situation, rather than my having to switch to a text console with Ctrl+Alt+F2 because KDE isn't responding, wait an eternity for the login prompt, and finally run pkill.

Best Answer

This is more of a Super User question than a Server Fault one, but I had the same problem myself on Fedora 24.

It was caused by the combination of ffmpeg-libs, VDPAU, and my GPU/kernel. I "fixed" it by disabling VDPAU in VLC.

It shows up as an ever-increasing Shmem value in /proc/meminfo, and if you run pmap on the affected process it shows hundreds of mappings of 'renderD128' (the GPU render node, /dev/dri/renderD128), with the count still growing.
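Both symptoms can be checked with a small script. This is a minimal Python sketch, assuming a Linux /proc filesystem and the procps pmap utility; parse_meminfo, shmem_report, count_mappings and pmap_output are hypothetical helper names. Shmem pages are counted in the Cached figure (and therefore in top's buff/cache), but with no swap they cannot be evicted, so a large buff/cache number does not by itself mean that much memory is reclaimable.

```python
import subprocess


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> numeric value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Most fields are "<number> kB"; a few (e.g. HugePages_Total) are bare counts.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def shmem_report(meminfo_text):
    """Return (Shmem, Cached - Shmem) in kB.

    Shmem is included in Cached; without swap it cannot be evicted, so
    Cached - Shmem is a rough upper bound on the part of Cached that could be dropped.
    """
    info = parse_meminfo(meminfo_text)
    shmem = info["Shmem"]
    return shmem, info["Cached"] - shmem


def count_mappings(pmap_text, name="renderD128"):
    """Count pmap output lines whose text contains `name`."""
    return sum(1 for line in pmap_text.splitlines() if name in line)


def pmap_output(pid):
    """Run `pmap <pid>` and return its stdout (assumes procps pmap is installed)."""
    result = subprocess.run(["pmap", str(pid)], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `shmem_report(pathlib.Path("/proc/meminfo").read_text())` and `count_mappings(pmap_output(6395))`, where 6395 is the plugin-container PID from the top output in the question. If this is the same bug, running both a few times while the slowdown develops should show Shmem and the renderD128 count climbing together.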

It is most likely an implementation bug, so disable VDPAU output in your video-processing applications.