PHP – Debugging a stuck Apache/PHP thread on a production server

Tags: Apache2, dump, PHP, threads, troubleshooting

I have a Linux system running Apache httpd with PHP, which is loaded using LoadModule php5_module /usr/lib/apache2/modules/libphp5.so.

I've enabled Apache's mod_status module and can see one particular thread that has been stuck doing something since yesterday. Running ps -axu | grep apache confirms this: among the many processes it lists is that stuck one:

www-data  5636  0.0  0.1 423556 23560 ?      S    XXXXX   0:04 /usr/sbin/apache2 -k start

Note that XXXXX is something like Jan02, which is yesterday. Also, the PID (5636) matches the PID of the stuck thread shown on Apache's mod_status page.

My question is: how can I get a thread dump or something similar to see exactly where in the PHP code this thing is stuck? Maybe it's waiting for something (I/O, network, DB), but I don't know what.

In the Java world I'd do a kill -3 pid and get a nice readable thread dump that would clearly show me exactly where that particular thread is stuck. Is there a similar technique in PHP land?

Best Answer

The following instructions are Linux-centric:

  • Identify the faulty / stuck process

In your case, the process is in state S, which man ps describes as:

S interruptible sleep (waiting for an event to complete)

So yes, it is probably waiting for some network or filesystem operation to complete.
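The state reported by ps can also be read directly from /proc, together with the kernel wait channel, which sometimes hints at what kind of event the process is sleeping on. The following is only a minimal sketch assuming a Linux /proc filesystem; proc_wait_info is a hypothetical helper name, not a standard tool.

```shell
# Print the scheduler state and kernel wait channel of one process.
# Assumes a Linux /proc filesystem; proc_wait_info is a hypothetical helper name.
proc_wait_info() {
    pid="$1"
    # The "State:" line of /proc/<pid>/status, e.g. "State: S (sleeping)".
    grep '^State:' "/proc/$pid/status"
    # wchan names the kernel function the process is sleeping in; it may be
    # "0" or empty when not sleeping or when privileges are insufficient.
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
}

# Usage with the PID from the question:
#   proc_wait_info 5636
```

Depending on kernel settings, the wait channel may only be visible to root.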

  • Trace system calls and signals with strace

Attach the strace program to the hanging thread by running:

# strace -p <PID>

This shows you, in real time, what the program is doing, or more precisely the syscalls it makes. For instance, you might see a loop in which open() keeps returning an error such as ENOENT, meaning that a particular file does not exist.

Your ps output indicates that the process is not consuming CPU (0.0 in the 3rd column), so the problem is probably not a busy loop but a blocked operation, such as waiting on a locked file, a socket or some external action.
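If strace shows the process blocked in a call on a numbered file descriptor (a read or poll on descriptor 7, say), that number can be mapped back to a file, pipe or socket through /proc. A small sketch, assuming Linux; fd_target is a hypothetical helper name and the descriptor number 7 is purely illustrative.

```shell
# Resolve a file descriptor number seen in strace output to its target.
# Assumes a Linux /proc filesystem; fd_target is a hypothetical helper name.
fd_target() {
    pid="$1"
    fd="$2"
    # Each entry under /proc/<pid>/fd is a symlink to the open object,
    # e.g. a file path, "socket:[12345]" or "pipe:[6789]".
    readlink "/proc/$pid/fd/$fd"
}

# Usage, if strace showed the process from the question blocked on descriptor 7:
#   fd_target 5636 7
```

Reading another user's /proc/<pid>/fd entries generally requires root or the process owner (www-data here).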

  • kill and coredumps

The kill program, which sends a signal to a running process, is not specific to Java. It can just as well be used to send signal 3 (SIGQUIT), whose default action is to terminate the program and generate a core file. A core file is only written if the core file size limit allows it; check this with the ulimit -c command. If it says 0, raise the limit, for instance to unlimited:

ulimit -c unlimited

Only then should you restart the application from that shell, so that it inherits the new limit, and provoke a core dump by sending it a kill -3.
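Because ulimit -c reports the limit of the current shell rather than that of an already running process, it can be worth checking what the Apache process itself was started with before sending the signal. A minimal sketch, assuming a Linux /proc filesystem; core_soft_limit is a hypothetical helper name.

```shell
# Print the soft core file size limit of a running process.
# Assumes a Linux /proc filesystem; core_soft_limit is a hypothetical helper name.
core_soft_limit() {
    pid="$1"
    # Row layout: "Max core file size" <soft> <hard> <units>;
    # field 5 is the soft limit (a byte count or "unlimited").
    awk '/^Max core file size/ { print $5 }' "/proc/$pid/limits"
}

# Usage with the PID from the question:
#   core_soft_limit 5636
```

A result of 0 means kill -3 would terminate the process without leaving a core file behind.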