Nginx – Ruby on Rails Process Stuck at 100% CPU

Tags: nginx, phusion-passenger, ubuntu

Environment: Ubuntu 10.04 LTS, Passenger, Nginx 1.0.6, MySQL, Ruby 1.9.2, Rails 3.1

After running for a while, the server accumulates a gradually increasing number of processes stuck at 100% CPU:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
 2393 avitus    20   0  496m 381m 1392 R  100  9.4  25:10.74 Rack: /home/web ...

Running strace on any of the stuck PIDs gives the following:

Process 2393 attached with 3 threads - interrupt to quit
[pid  2396] futex(0x8ca80e4, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  2394] restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out)
[pid  2394] gettimeofday({1322590778, 346573}, NULL) = 0
[pid  2394] futex(0x821db60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  2394] clock_gettime(CLOCK_REALTIME, {1322590778, 346885177}) = 0
[pid  2394] futex(0x821db84, FUTEX_WAIT_PRIVATE, 33872659, {0, 9687823}) = -1 ETIMEDOUT (Connection timed out)
[pid  2394] gettimeofday({1322590778, 356921}, NULL) = 0
[pid  2394] futex(0x821db60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  2394] clock_gettime(CLOCK_REALTIME, {1322590778, 357196244}) = 0
[pid  2394] futex(0x821db84, FUTEX_WAIT_PRIVATE, 33872661, {0, 9724756}) = -1 ETIMEDOUT (Connection timed out)
[pid  2394] gettimeofday({1322590778, 367240}, NULL) = 0
[pid  2394] futex(0x821db60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  2394] clock_gettime(CLOCK_REALTIME, {1322590778, 367459723}) = 0
[pid  2394] futex(0x821db84, FUTEX_WAIT_PRIVATE, 33872663, {0, 9780277}) = -1 ETIMEDOUT (Connection timed out)
[pid  2394] gettimeofday({1322590778, 377586}, NULL) = 0
[pid  2394] futex(0x821db60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  2394] clock_gettime(CLOCK_REALTIME, {1322590778, 377807840}) = 0
[pid  2394] futex(0x821db84, FUTEX_WAIT_PRIVATE, 33872665, {0, 9778160}) = -1 ETIMEDOUT (Connection timed out)
[pid  2394] gettimeofday({1322590778, 387932}, NULL) = 0
[pid  2394] futex(0x821db60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  2394] clock_gettime(CLOCK_REALTIME, {1322590778, 388162450}) = 0
[pid  2394] futex(0x821db84, FUTEX_WAIT_PRIVATE, 33872667, {0, 9769550}) = -1 ETIMEDOUT (Connection timed out)

Adding the -c flag to strace gives:

Process 2393 attached with 3 threads - interrupt to quit
Process 2393 detached
Process 2394 detached
Process 2396 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.97    0.003172           2      1489       744 futex
  3.74    0.000125           0       745           clock_gettime
  1.29    0.000043           0       745           gettimeofday
  0.00    0.000000           0         1         1 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.003340                  2980       745 total
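
As a rough sanity check on the first trace, here is a minimal Python sketch that computes the gap between successive gettimeofday calls in thread 2394. It was not part of the original debugging session, and the `TRACE` excerpt and the helper name `wakeup_intervals_ms` are only illustrative. The gaps come out at roughly 10 ms, which would be consistent with a periodic timer thread. Thread 2393, the one top reports at 100%, makes no system calls in the trace at all, which suggests (but does not prove) that it is spinning in user space.

```python
import re

# The gettimeofday lines for thread 2394, copied from the first trace.
TRACE = """\
[pid  2394] gettimeofday({1322590778, 346573}, NULL) = 0
[pid  2394] gettimeofday({1322590778, 356921}, NULL) = 0
[pid  2394] gettimeofday({1322590778, 367240}, NULL) = 0
[pid  2394] gettimeofday({1322590778, 377586}, NULL) = 0
[pid  2394] gettimeofday({1322590778, 387932}, NULL) = 0
"""

# Captures the seconds and microseconds fields of each gettimeofday result.
PATTERN = re.compile(r"gettimeofday\(\{(\d+), (\d+)\}")


def wakeup_intervals_ms(trace):
    """Return the gaps, in milliseconds, between successive gettimeofday calls."""
    stamps_us = [int(sec) * 1_000_000 + int(usec)
                 for sec, usec in PATTERN.findall(trace)]
    return [(later - earlier) / 1000
            for earlier, later in zip(stamps_us, stamps_us[1:])]


intervals = wakeup_intervals_ms(TRACE)
print(intervals)  # each gap is a little over 10 ms
```

Together with the -c summary, where all three threads spend only about 3 ms in system calls, this points away from the kernel and towards a busy loop in Ruby or native extension code.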

I can kill -9 the stuck processes, and the application and server appear to carry on happily. I've run out of ideas for debugging this, so any advice on the cause or on other avenues of investigation would be welcome.

Best Answer

Try setting passenger_spawn_method to conservative in your Passenger configuration. I'm having the same issue with Mongo and came across:

http://code.google.com/p/phusion-passenger/issues/detail?id=684

and:

https://github.com/rails/rails/issues/1339

I don't know why it's not working, but hopefully that will get you going if you haven't figured out the solution already.
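
A minimal sketch of where the directive could go in the Nginx configuration. The server name and application path are hypothetical placeholders, and the existing passenger_root and passenger_ruby lines are assumed to stay as they are:

```nginx
http {
    # Existing passenger_root / passenger_ruby settings are assumed to be here.

    # Start each application process from scratch instead of forking it
    # from a preloaded copy of the application.
    passenger_spawn_method conservative;

    server {
        listen 80;
        server_name example.com;       # hypothetical
        root /path/to/app/public;      # hypothetical
        passenger_enabled on;
    }
}
```

The trade-off is that conservative spawning does not preload the application, so new processes start more slowly and share less memory. Nginx needs to be restarted or reloaded for the change to take effect.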