Linux – Limiting memory usage and minimizing swap thrashing on Unix / Linux


I have a few machines that I use for running large numbers of jobs, and I try to limit the number of concurrent jobs so as not to exceed each machine's available RAM. Occasionally I misestimate how much memory some of the jobs will take, and the machine starts thrashing the swap file. I resolve this by sending kill -s STOP to one of the jobs so that it can be swapped out.

Does anyone know of a utility that will monitor a server for processes with a specific name and, if total memory consumption reaches a chosen threshold, pause the one with the smallest memory footprint so that the larger ones can run to completion with a minimum of swap thrashing? Paused processes then need to be resumed once some of the running processes have completed.

Best Answer

Have a look at thrash-protect (a daemon written in Python):

  • It doesn't do exactly what you want, but it detects swap thrashing across the whole server, attempts to identify the processes causing it, and sends them a STOP signal to freeze them.
  • Later on, it sends them a CONT signal to unfreeze them.

If you don't want to use it as-is, it could serve as the basis for a custom script that pauses the process with the smallest memory footprint.
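As a rough idea of what such a custom script might look like, here is a minimal sketch. It is not how thrash-protect works; it only follows the policy described in the question. It assumes Linux with /proc mounted, matches jobs by the name in /proc/<pid>/comm, and uses resident set size (VmRSS) as the "memory footprint". The function names and the two thresholds (stop_kb, resume_kb) are hypothetical.

```python
import os
import signal
import time


def read_rss_kb(pid):
    """Return the resident set size of `pid` in kB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass  # the process exited or is not readable
    return None


def list_jobs(name):
    """Map pid -> RSS in kB for processes whose command name equals `name`.

    Note: the kernel truncates the name in /proc/<pid>/comm to 15 characters.
    """
    jobs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as comm:
                if comm.read().strip() != name:
                    continue
        except OSError:
            continue
        rss = read_rss_kb(int(entry))
        if rss is not None:
            jobs[int(entry)] = rss
    return jobs


def decide(jobs, paused, stop_kb, resume_kb):
    """Return ("stop", pid), ("cont", pid) or None for one monitoring pass."""
    running = {pid: rss for pid, rss in jobs.items() if pid not in paused}
    total = sum(running.values())
    # Over the limit: freeze the smallest running job, but never the last one.
    if total > stop_kb and len(running) > 1:
        return ("stop", min(running, key=running.get))
    # Well under the limit: wake the job that has been paused the longest.
    waiting = [pid for pid in paused if pid in jobs]
    if waiting and total < resume_kb:
        return ("cont", waiting[0])
    return None


def run(name, stop_kb, resume_kb, interval=5.0):
    """Monitor jobs called `name` forever, pausing and resuming as needed."""
    paused = []
    while True:
        jobs = list_jobs(name)
        paused = [pid for pid in paused if pid in jobs]  # forget exited jobs
        action = decide(jobs, paused, stop_kb, resume_kb)
        if action is not None:
            verb, pid = action
            sig = signal.SIGSTOP if verb == "stop" else signal.SIGCONT
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                pass  # the job exited between the scan and the signal
            else:
                if verb == "stop":
                    paused.append(pid)
                else:
                    paused.remove(pid)
        time.sleep(interval)
```

Calling something like run("myjob", stop_kb=48 * 1024 * 1024, resume_kb=32 * 1024 * 1024) would start the loop; the job name and the sizes here are placeholders. Keeping resume_kb below stop_kb gives some hysteresis, so a job is not resumed only to be paused again on the next pass.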

Doing it by hand

For those who don't want an automated tool, you can use htop to find the processes with the highest percentage memory use and send them the STOP signal with kill -s STOP <pid>. Later, send kill -s CONT <pid> to resume them. However, htop and kill may themselves take a long time to run while the server is thrashing.
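The same STOP and CONT signals can be sent from Python with os.kill. The sketch below is only an illustration: it starts a throwaway child process, pauses it, and resumes it, using os.waitpid to confirm each state change. It assumes a Unix-like system where os.WCONTINUED is available (Linux is one); pause, resume and demo are hypothetical helper names.

```python
import os
import signal
import subprocess
import sys


def pause(pid):
    """Equivalent of: kill -s STOP <pid>"""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Equivalent of: kill -s CONT <pid>"""
    os.kill(pid, signal.SIGCONT)


def demo():
    """Pause and resume a throwaway child; return (stopped, continued)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        pause(child.pid)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        resume(child.pid)
        # WCONTINUED makes waitpid report that the child was resumed.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        child.kill()
        child.wait()
```

Note that waitpid only works for your own child processes; for an arbitrary job, only the pause and resume helpers apply, and the state can be checked in htop instead.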

The problem

When a machine is thrashing due to memory pressure, you will usually see very poor throughput despite low CPU utilization.

To diagnose this:

  • does it take tens of seconds, or even minutes, to run a simple shell command?
  • in top or htop, is there high use of swap, high load average and low CPU usage?
  • are there high swap-in / swap-out rates? Run vmstat 1 and watch the si and so columns; sustained 2- to 4-digit values every second, with no seconds at zero, point to thrashing.
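The last check can also be scripted. The sketch below is one possible approach, assuming Linux: it reads the cumulative pswpin and pswpout counters from /proc/vmstat (counted in pages, so the numbers are not directly comparable to vmstat's si and so columns) and flags a window in which every interval showed swap traffic. The function names and the min_rate threshold are hypothetical and would need tuning for a real machine.

```python
import time


def parse_swap_counters(text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Pages swapped in and out per second between two counter readings."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in ("pswpin", "pswpout")
    }


def looks_like_thrashing(samples, min_rate=10.0):
    """True if every sample shows at least `min_rate` pages/s of swap traffic.

    This mirrors the "no zero-swap seconds" rule of thumb; min_rate is an
    arbitrary illustrative threshold, not a recommended value.
    """
    return bool(samples) and all(
        s["pswpin"] + s["pswpout"] >= min_rate for s in samples
    )


def sample(interval=1.0, count=5):
    """Collect `count` swap-rate samples, `interval` seconds apart."""
    def read():
        with open("/proc/vmstat") as vmstat:
            return parse_swap_counters(vmstat.read())

    samples = []
    previous = read()
    for _ in range(count):
        time.sleep(interval)
        current = read()
        samples.append(swap_rates(previous, current, interval))
        previous = current
    return samples
```

Running looks_like_thrashing(sample()) takes about five seconds with the defaults and gives a yes/no answer that a monitoring loop could act on.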