No, the algorithm is not that simplistic. You can find more information in:
http://linux-mm.org/OOM_Killer
If you want to track memory usage, I'd recommend running a command like:
ps -e -o pid,user,cpu,size,rss,cmd --sort -size,-rss | head
This will list the processes using the most memory (and probably causing the OOM situation). Remove the | head if you'd prefer to see all processes.
If you put this in your cron, run it every 5 minutes and append the output to a file. Keep at least a couple of days' worth, so you can check later what happened.
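As a rough sketch, a crontab entry like the following would do it (the log path is just an illustration; pick whatever suits you):

    # snapshot the top memory consumers every 5 minutes, with a timestamp
    */5 * * * * (date; ps -e -o pid,user,cpu,size,rss,cmd --sort -size,-rss | head) >> /var/log/memwatch.log 2>&1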
For critical services like ssh, I'd recommend using monit to restart them automatically in such a situation. It might save you from losing access to the machine if you don't have a remote console to it.
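A minimal monit stanza for sshd might look like this (the pidfile location and init script vary by distribution, so treat these paths as assumptions):

    check process sshd with pidfile /var/run/sshd.pid
        start program = "/etc/init.d/ssh start"
        stop program = "/etc/init.d/ssh stop"
        if failed port 22 protocol ssh then restart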
Best of luck,
João Miguel Neves
I'm new to ServerFault and just saw this post. It seems to have resurfaced near the front of the queue even though it is old. Let's put this scary one to bed maybe?
First of all, I have an interest in this topic as I am optimizing systems with limited RAM to run many user processes in a secure way.
It is my opinion that the error messages in this log refer to OpenVZ Linux containers.
A "ve" is a virtual environment, also known as a container in OpenVZ. Each container is given an ID, and the number you are seeing is that ID. More on this here:
https://openvz.org/Container
The term "free" refers to free memory in bytes at that moment in time. You can see the free memory gradually increasing as processes are killed.
The term "gen" I am a little unsure of. I believe this refers to generation. That is, it starts out at 1 and increases by one for every generation of a process in a container. So, for your system, it seems there were 24K+ processes executed since boot. Please correct me if I'm wrong. That should be easy to test.
As to why it killed processes, that's because of your OOM killer configuration. It's trying to bring free memory back to the expected amount (which looks to be 128 kB). Oracle has a good write-up of how to configure this to something you might like better:
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
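One of the knobs that article covers is the per-process score adjustment in /proc. For example, to make a critical process like sshd effectively exempt from the OOM killer (using sshd purely as an illustration):

    # modern kernels: -1000 exempts the process from OOM killing entirely
    echo -1000 > /proc/$(pidof -s sshd)/oom_score_adj
    # older kernels use oom_adj instead, where -17 has the same effect
    # echo -17 > /proc/$(pidof -s sshd)/oom_adj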
Additionally, if you'd like to see the memory configuration for each of your containers, check this out:
https://openvz.org/Setting_UBC_parameters
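Assuming a standard OpenVZ setup, you can also inspect the live UBC values for every container from the hardware node; the failcnt column shows how often each limit has been hit:

    # per-container resource limits and failure counters
    cat /proc/user_beancounters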
Best Answer
How do you define the "cause" of the OOM situation? Is it the process using the most memory? Perhaps you have a DB that always takes 3 GB of memory to run, and thus uses the most memory on the machine. Is it the "cause" of the problem? Probably not.
Ultimately the cause of the problem is "An unexpected situation which may or may not have been the fault of the sysadmin."
Sometimes you can know; for instance, if you had process accounting set up (+1 to @JamesHannah) and you saw 3000 httpd or sshd processes (and that was unusual), you could probably blame that daemon.
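For reference, process accounting usually comes from the acct package (psacct on Red Hat derivatives); a minimal sketch of turning it on and querying it later (the pacct path below is the Debian default, so adjust for your system):

    accton /var/log/account/pacct   # start recording every process that exits
    lastcomm sshd                   # later: list recorded invocations of sshd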
With that in mind, I present comments from The Source:
"So the ideal candidate for liquidation is a recently started, non privileged process which together with its children uses lots of memory, has been nice'd, and does no raw I/O. Something like a nohup'd parallel kernel build (which is not a bad choice since all results are saved to disk and very little work is lost when a 'make' is terminated)."
Comment block and quote shamelessly stolen from http://linux-mm.org/OOM_Killer