I have a few machines that I use for running large numbers of jobs, and I try to limit the number of jobs so as not to exceed the machine's available RAM. Occasionally I misestimate how much memory some of the jobs will take, and the machine starts thrashing the swap file. I resolve this by sending kill -s STOP to one of the jobs so that it can get swapped out.
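The kill -s STOP / kill -s CONT pair maps directly onto signals that a script can send itself. Here is a minimal Python sketch, assuming a POSIX system. pause and resume are hypothetical helper names, and sleep 30 merely stands in for a real job:

```python
import os
import signal
import subprocess


def pause(pid: int) -> None:
    """Send SIGSTOP, the same signal that `kill -s STOP <pid>` sends."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Send SIGCONT, the same signal that `kill -s CONT <pid>` sends."""
    os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    # Demo on a throwaway child process standing in for a real job.
    job = subprocess.Popen(["sleep", "30"])
    pause(job.pid)
    # WUNTRACED makes waitpid report the child once it has stopped.
    _, status = os.waitpid(job.pid, os.WUNTRACED)
    print("stopped:", os.WIFSTOPPED(status))
    resume(job.pid)
    # WCONTINUED makes waitpid report the child once it has been resumed.
    _, status = os.waitpid(job.pid, os.WCONTINUED)
    print("continued:", os.WIFCONTINUED(status))
    job.terminate()
    job.wait()
```

The os.waitpid calls only serve to confirm the state change in the demo, and they only work for children of the calling process. For jobs started elsewhere, calling pause and resume on their pids is enough.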
Does anyone know of a utility that will monitor a server for processes with a specific name and, if total memory consumption reaches a desired threshold, pause the one with the smallest memory footprint so that the larger ones can run and complete with a minimum of swap-file thrashing? Paused processes then need to be resumed once some existing processes have completed.
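Monitoring processes by name requires each matching process's resident memory. One way to get it, assuming Linux (other systems lack this /proc layout), is to match /proc/<pid>/comm and read the VmRSS line of /proc/<pid>/status. The function names below are hypothetical:

```python
import os
from typing import Dict


def parse_vmrss_kib(status_text: str) -> int:
    """Return VmRSS in KiB from the text of /proc/<pid>/status (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return 0  # kernel threads and zombies have no VmRSS line


def rss_by_name(name: str, proc: str = "/proc") -> Dict[int, int]:
    """Map pid -> resident memory (KiB) for every process whose comm equals name."""
    result: Dict[int, int] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip() != name:
                    continue
            with open(os.path.join(proc, entry, "status")) as f:
                result[int(entry)] = parse_vmrss_kib(f.read())
        except OSError:
            continue  # the process exited (or is inaccessible) mid-scan
    return result
```

The kernel truncates comm to 15 characters, so long program names would have to be matched on that prefix.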
Best Answer
Have a look at thrash-protect (a daemon written in Python).
If you don't want to use this as-is, it could be the basis for a custom script that pauses the process with smallest memory footprint.
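The decision logic for such a custom script could be sketched as two pure functions. This is an illustration only, not taken from thrash-protect. The function names, the KiB units, and the "oldest pause first" resume order are all assumptions:

```python
from typing import Dict, Optional


def pick_to_pause(running: Dict[int, int], used_kib: int,
                  limit_kib: int) -> Optional[int]:
    """Pick the running job with the smallest RSS once memory use exceeds the limit.

    running maps pid -> resident memory in KiB for jobs that are not paused.
    """
    if used_kib <= limit_kib or len(running) <= 1:
        # Below the limit, or only one job left: stopping it would stall everything.
        return None
    return min(running, key=lambda pid: running[pid])


def pick_to_resume(paused: Dict[int, int], running_count: int,
                   used_kib: int, limit_kib: int) -> Optional[int]:
    """Pick a paused job to continue once there is room for it again.

    paused maps pid -> RSS recorded when the job was paused, in pause order.
    A stopped job's current RSS shrinks as it is swapped out, so the value
    recorded at pause time is a better estimate of what it will need.
    """
    for pid, rss in paused.items():  # dicts keep insertion order: oldest first
        if running_count == 0 or used_kib + rss <= limit_kib:
            return pid
    return None
```

A monitoring loop would call these every few seconds and act on the result with SIGSTOP/SIGCONT. Because a job is only resumed when its recorded footprint fits under the limit, resuming it should not immediately trigger another pause.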
Doing it by hand
For those who don't like an automated tool, you can just use htop to find the processes with the highest percentage of memory use and send them the STOP signal with kill -s STOP <pid>. Later you can send kill -s CONT <pid> to resume them. However, when the server is thrashing you may have to wait a long time for htop and kill to execute.
The problem
When a machine is thrashing due to memory pressure, you will usually see very poor throughput despite low CPU utilization.
To diagnose this:
- In top or htop, is there high swap use, a high load average and low CPU usage?
- Run vmstat 1 and look at the si and so (swap-in and swap-out) columns, particularly 2- to 4-digit numbers every second with no zero-swap seconds.
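The vmstat check can also be automated. The sketch below is a hypothetical helper that parses vmstat output and applies the rule of thumb above: every one-second sample shows swapping of at least a 2-digit number. It assumes the default procps vmstat layout, where columns are located by the si and so header names:

```python
from typing import List, Tuple


def parse_vmstat_swap(output: str) -> List[Tuple[int, int]]:
    """Return (si, so) pairs from `vmstat 1` output, skipping the since-boot line."""
    si_col = so_col = None
    samples: List[Tuple[int, int]] = []
    for line in output.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column header line (vmstat may repeat it); remember the positions.
            si_col, so_col = fields.index("si"), fields.index("so")
            continue
        if si_col is None or not fields or not fields[0].isdigit():
            continue  # the "procs ---memory---" banner or blank lines
        samples.append((int(fields[si_col]), int(fields[so_col])))
    # vmstat's first data line reports averages since boot, not a live sample.
    return samples[1:]


def looks_like_thrashing(samples: List[Tuple[int, int]], min_rate: int = 10) -> bool:
    """True if every sample swaps at least min_rate (i.e. no zero-swap seconds)."""
    return bool(samples) and all(max(si, so) >= min_rate for si, so in samples)
```

Usage: feed it subprocess.run(["vmstat", "1", "5"], capture_output=True, text=True).stdout. The min_rate of 10 is just the "2-digit" threshold from the answer and may need tuning per machine.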