Linux freezes every few seconds

amazon-ec2, kernel, linux, postgresql, ubuntu

We're having an issue where one of our Linux boxes (Ubuntu 10.04 LTS, running on a quadruple-large EC2 instance with 68GB of RAM and 8 virtual cores with 3.25GHz each) freezes up every few seconds. Typing in an ssh session stalls, and running strace on one of the running PostgreSQL processes usually shows:

02:37:41.567990 semop(7831581, {{3, -1, 0}}, 1

for a few seconds before it proceeds (it always gets stuck at that semop).

OProfile shows that most of the time is spent in the kernel (60%), versus 37% in PostgreSQL.

As a result of these halts (which began suddenly a day ago), load on the box has gone from 0.7 to 10+, and our entire stack has slowed down.

Any ideas on how to track down what's going on? iostat doesn't show the disks being particularly slow or overloaded, and top shows user CPU % spiking from 8% to about 40% whenever these stalls happen.

Best Answer

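I suspect your system is running out of semaphores. Check ipcs -l for the current limits. Here's some info about tuning semaphores for PostgreSQL. In particular, I would try increasing the maximum number of semaphores system-wide (SEMMNS) and the maximum number of semaphores per set (SEMMSL). On Linux both are fields of the kernel.sem sysctl, whose four values are SEMMSL, SEMMNS, SEMOPM and SEMMNI, in that order. Set kernel.sem in /etc/sysctl.conf and run sysctl -p to load it, or change it on the fly with sysctl -w.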
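
To make the kernel.sem layout concrete, here is a minimal Python sketch that parses the four fields (the same string found in /proc/sys/kernel/sem) and formats a line for /etc/sysctl.conf. The names SemLimits, parse_kernel_sem and format_sysctl_line are hypothetical helpers, and the numbers are illustrative sample values, not recommendations for this box.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    """The four kernel.sem fields, in the order the kernel reports them."""
    semmsl: int  # max semaphores per set
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop() call
    semmni: int  # max number of semaphore sets


def parse_kernel_sem(text: str) -> SemLimits:
    """Parse the whitespace-separated contents of /proc/sys/kernel/sem."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return SemLimits(*(int(f) for f in fields))


def format_sysctl_line(limits: SemLimits) -> str:
    """Build the matching line for /etc/sysctl.conf."""
    return "kernel.sem = " + " ".join(str(v) for v in limits)


# Sample value only (assumption); on a real box, read /proc/sys/kernel/sem.
current = parse_kernel_sem("250\t32000\t32\t128\n")

# Illustrative change: raise only the system-wide semaphore count (SEMMNS).
proposed = current._replace(semmns=64000)
print(format_sysctl_line(proposed))
```

After putting the printed line in /etc/sysctl.conf, sysctl -p applies it, and ipcs -l should then report the new limits.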