Linux – OOM killer goes insane – Valuable Tech Notes

On our cluster, nodes sometimes go down when a new process requests too much memory. I was puzzled why the OOM killer does not simply kill the offending process.

The reason turned out to be that some processes get an oom_adj of -17. That puts them off-limits for the OOM killer (unkillable!).
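For reference, on kernels of this era oom_adj ranges from -16 to 15, and the special value -17 disables OOM killing for that process entirely. A minimal sketch for checking a single process is below. The helper name oom_status and the PROC_ROOT variable are hypothetical, added only for illustration (PROC_ROOT lets the function be pointed at a fake directory for testing).

```shell
#!/bin/bash
# Hypothetical helper: report how the OOM killer treats one PID.
# PROC_ROOT is an assumption added for testing; it defaults to /proc.
PROC_ROOT=${PROC_ROOT:-/proc}

oom_status() {
  local pid=$1 adj
  # The file vanishes if the process exits, so tolerate a failed read.
  adj=$(cat "$PROC_ROOT/$pid/oom_adj" 2>/dev/null) || { echo "$pid: gone"; return 1; }
  if [ "$adj" -eq -17 ]; then
    echo "$pid: oom_adj=$adj (exempt from the OOM killer)"
  elif [ "$adj" -lt 0 ]; then
    echo "$pid: oom_adj=$adj (less likely to be killed)"
  elif [ "$adj" -eq 0 ]; then
    echo "$pid: oom_adj=$adj (default)"
  else
    echo "$pid: oom_adj=$adj (more likely to be killed)"
  fi
}
```

Calling oom_status with the PID of a suspect job should show at a glance whether it is one of the protected processes.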

I can clearly see that with the following script:

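#!/bin/bash
# Print every process whose oom_adj is non-zero.
for f in /proc/[0-9]*/oom_adj; do
  adj=$(cat "$f" 2>/dev/null) || continue   # process may have exited
  [ "$adj" != "0" ] || continue
  pid=${f#/proc/}; pid=${pid%/oom_adj}
  ps --no-headers -p "$pid"
done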

OK, it makes sense for sshd, udevd, and dhclient, but then I see regular user processes get -17 as well. Once such a user process causes an OOM event, it will never get killed. This makes the OOM killer go insane: NFS rpc.statd, cron, everything that happens not to be at -17 gets wiped out. As a result, the node goes down.
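As a stopgap, one could reset the stray -17 processes back to the default of 0 so the OOM killer can consider them again. This is only a sketch under stated assumptions: the function name unprotect_strays, the KEEP list (seeded with the three daemons named above), and PROC_ROOT are all hypothetical, and it treats the symptom rather than explaining where the -17 comes from. It reads the command name from the Name: field of /proc/PID/status.

```shell
#!/bin/bash
# Hypothetical workaround: give -17 processes that are not on a keep-list
# the default oom_adj of 0 so the OOM killer can consider them again.
# PROC_ROOT and KEEP are assumptions; adjust KEEP to your own daemons.
PROC_ROOT=${PROC_ROOT:-/proc}
KEEP=${KEEP:-"sshd udevd dhclient"}

unprotect_strays() {
  local f dir pid name
  for f in "$PROC_ROOT"/[0-9]*/oom_adj; do
    # Skip anything that is not exactly -17 (or has already exited).
    [ "$(cat "$f" 2>/dev/null)" = "-17" ] || continue
    dir=${f%/oom_adj}
    pid=${dir##*/}
    # The Name: field of /proc/PID/status holds the command name.
    name=$(awk '/^Name:/ {print $2; exit}' "$dir/status" 2>/dev/null)
    case " $KEEP " in
      *" $name "*) continue ;;   # protected on purpose, leave it alone
    esac
    # Raising oom_adj can still fail (e.g. the process just exited).
    { echo 0 > "$f"; } 2>/dev/null && echo "reset $pid ($name)"
  done
}
```

Run as root, unprotect_strays prints one line per process it changed. It would have to be re-run (for example from cron) because newly started jobs can pick up -17 again.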

I have Debian 6.0 (Linux 2.6.32-3-amd64).

Does anyone know where to control the -17 oom_adj assignment behaviour?

Could launching sshd and Torque mom from /etc/rc.local be causing the overprotective behaviour?

[i-180ae177] root@migrantgeek ~ # pgrep mysqld_safe
11395
[i-180ae177] root@migrantgeek ~ # cat /proc/11395/oom_adj
0
[i-180ae177] root@migrantgeek ~ # for pid in `pgrep bash`; do echo -17 > /proc/$pid/oom_adj; done
[i-180ae177] root@migrantgeek ~ # /etc/init.d/mysqld restart
Stopping MySQL: [ OK ]
Starting MySQL: [ OK ]
[i-180ae177] root@migrantgeek ~ # pgrep mysqld_safe
11523
[i-180ae177] root@migrantgeek ~ # cat /proc/11523/oom_adj
-17
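The session above suggests that oom_adj is inherited from the parent: once the bash shells were set to -17, the restarted mysqld_safe came up with -17 too. If that is what is happening on the cluster, one possible workaround is to reset the value immediately before a daemon or job is exec'd. A minimal sketch follows; run_unprotected and OOM_FILE are hypothetical names, and OOM_FILE exists only so the function can be tested against an ordinary file.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command with oom_adj reset to 0, regardless
# of what the launching shell inherited.
# OOM_FILE is an assumption for testing; it defaults to this process's own file.
OOM_FILE=${OOM_FILE:-/proc/self/oom_adj}

run_unprotected() {
  (
    # Inside the subshell, /proc/self refers to the subshell itself, and the
    # exec'd command keeps the same PID, so it keeps the value written here.
    echo 0 > "$OOM_FILE" || exit 1
    exec "$@"
  )
}
```

Something like run_unprotected followed by the usual daemon command line could then be used wherever the daemon is started. Raising oom_adj from -17 to 0 does not need extra privileges, so the same wrapper should also work for jobs started as a regular user.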
