Linux – fork: Resource temporarily unavailable running JVM


I'm running a Tomcat 6 instance on a 34 GB EC2 instance. I've been struggling to keep the memory down but this thing services a lot of requests and the heap frequently gets up to 13 GB. But the heap is another story.

The real problem right now is that after a while the server stops responding and console commands are met with a "fork: Resource temporarily unavailable" message.

Since the server goes down hard at this point and nothing shows up on the EC2 or SSH console, I don't know how to diagnose this. After restarting and leaving it up for a while, top looks like this:

Mem:  35847580k total, 28719420k used,  7128160k free,   221432k buffers
Swap:        0k total,        0k used,        0k free, 11103780k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                   
 xxxx tomcat    25   0 19.9g  15g 9832 S   86 44.1  36:01.69 java        

I'm pretty sure I have my ulimits set high enough, and there is nothing in /etc/security/limits.conf that would limit the Java process. I've got about 30,000 threads and an equal number of FDs. There is nothing in syslog either, besides some SYN flooding messages (these happen when the JVM GCs and we're under heavy load).

Anything else I should look at?

Best Answer

Sounds an awful lot like you're running out of memory. fork() will basically only fail because of ulimits (number of processes or file descriptors) or lack of memory. So if you're not hitting your ulimits, that means you're out of memory.
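
To compare actual usage against those limits, a rough sketch like the following may help. It assumes a Linux /proc filesystem, and the helper names (usage_pct, count_threads, read_nproc_limit, report_limits) are made up for illustration. It counts threads rather than processes because on Linux every Java thread is a kernel task that counts toward the per-user process limit as well as the system-wide kernel.threads-max and kernel.pid_max ceilings; with about 30,000 threads, a pid_max of 32768 (a common default, but check your own box) would leave little headroom. The sketch sticks to shell builtins so that, at least under bash, it has a chance of running from an already-open shell while fork is failing.

```shell
# Sketch only: helper names are hypothetical; assumes Linux /proc.

# Integer percentage of a limit that is in use; result is left in PCT.
usage_pct() {
    PCT=$(( $1 * 100 / $2 ))
}

# Count every thread on the system by globbing /proc; result is left in THREADS.
count_threads() {
    THREADS=0
    for task in /proc/[0-9]*/task/[0-9]*; do
        if [ -e "$task" ]; then
            THREADS=$((THREADS + 1))
        fi
    done
}

# Soft "Max processes" limit of this shell, read from /proc/self/limits;
# result is left in NPROC (may be the word "unlimited").
read_nproc_limit() {
    NPROC=unknown
    while read -r w1 w2 soft rest; do
        if [ "$w1 $w2" = "Max processes" ]; then
            NPROC=$soft
        fi
    done < /proc/self/limits
}

# Print thread usage against the system-wide ceilings and the per-user limit.
report_limits() {
    count_threads
    read -r THREADS_MAX < /proc/sys/kernel/threads-max
    read -r PID_MAX < /proc/sys/kernel/pid_max
    read_nproc_limit
    usage_pct "$THREADS" "$THREADS_MAX"
    echo "threads: $THREADS of kernel.threads-max $THREADS_MAX (${PCT}%)"
    usage_pct "$THREADS" "$PID_MAX"
    echo "threads: $THREADS of kernel.pid_max $PID_MAX (${PCT}%)"
    echo "per-user process limit (soft) for this shell: $NPROC"
}

# Only run the report where the kernel files are readable.
if [ -r /proc/sys/kernel/threads-max ] && [ -r /proc/sys/kernel/pid_max ] \
    && [ -r /proc/self/limits ]; then
    report_limits
fi
```

Note that the per-user limit printed here belongs to the shell running the script. To see what applies to Tomcat, run it as the tomcat user or read the limits file under the Java process's own /proc directory.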

root is usually exempt from limits such as the maximum number of processes, but check your limits.conf to be sure. Depending on your EC2 setup, though, you might not be able to log in directly as root, so in that case you'll probably have to keep a root shell open on the box...

A system in trouble may not be able to log to disk so the only way to know what's going on is probably through "dmesg" (which prints the kernel's ring buffer). Try keeping a root shell open on the box with the following running:

while true ; do dmesg -c ; sleep 0.1 ; done

Also, keeping a vmstat 1 running might reveal something interesting, e.g. heavy swapping...
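
Since the top output in the question shows no swap configured, vmstat may have little swapping to report, so it might also be worth sampling the kernel's own memory counters. The sketch below reads a few fields from /proc/meminfo using only shell builtins. The helper name meminfo_kb is hypothetical, and the choice of fields is just a suggestion; CommitLimit in particular is only enforced when strict overcommit is enabled.

```shell
# Sketch only: meminfo_kb is a hypothetical helper; assumes Linux /proc.

# Look up one field (in kB) from a meminfo-style file.
# Usage: meminfo_kb FIELD [FILE]; the value is left in MEM_KB (empty if absent).
meminfo_kb() {
    field=$1
    file=${2:-/proc/meminfo}
    MEM_KB=
    while read -r key value unit; do
        if [ "$key" = "$field:" ]; then
            MEM_KB=$value
        fi
    done < "$file"
}

# Print a handful of counters if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    for f in MemFree Cached Committed_AS CommitLimit; do
        meminfo_kb "$f"
        echo "$f: ${MEM_KB:-n/a} kB"
    done
fi
```

The printing loop could be dropped into the same kind of while/sleep loop as the dmesg command, so the last values before the box locks up stay visible in the open shell.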

Did you grep your syslog for "oom-killer"?
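
A minimal sketch of that search, assuming the kernel messages end up in /var/log/messages or /var/log/syslog (the location varies by distribution, so adjust the paths). The helper name oom_events is made up for illustration; it also matches "out of memory", which the kernel typically logs next to the oom-killer line.

```shell
# Sketch only: oom_events is a hypothetical helper; log paths vary by distro.

# Print lines that look like OOM-killer activity from the given log files.
oom_events() {
    grep -i -E 'oom-killer|out of memory' "$@" 2>/dev/null
}

# /var/log/messages and /var/log/syslog are assumed locations.
oom_events /var/log/messages /var/log/syslog \
    || echo "no OOM killer messages found (or logs not readable)"
```

If this prints nothing useful, the logs from before the last restart (rotated files, if any) may be the more interesting place to look.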