Linux – How to debug unresponsive web server

amazon-web-services, apache-2.2, linux, PHP, Ubuntu

We have a Medium EC2 instance running Ubuntu 12.04, serving about a dozen small PHP web applications via Apache.

Approximately every other day, the server becomes unresponsive and rebooting the instance is required to restore functionality. During this time, the server cannot be accessed via HTTP or SSH.

Every time, the last logged Apache request is to a PHP application that serves a 4MB PDF document. The User Agent always identifies the client as an iPad (specifically Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25), and the request often comes from the same IP address, so it is likely the same user.

The PHP application is a legacy application that checks some permissions before echoing the contents of a file from disk to the client. We have not been able to reproduce this issue ourselves, either using an iPad or by accessing the file by any other means.

We've tried a few monitoring solutions to get a better picture of what's happening when the server goes down, but none of them appear to show any issue with system resources.

My question is: what strategies can we use to troubleshoot and, hopefully, resolve this issue?

Best Answer

Start by monitoring system resources (CPU load, memory, disk), for example with collectd or sysstat.
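If installing collectd or sysstat is not an option, a rough stand-in can be sketched in Python. The idea is to append a sample of /proc/loadavg and /proc/meminfo to a file every few seconds and force it to disk, so the last sample written before a hang survives the reboot. The function names, the log path and the interval are assumptions made for illustration, not part of any existing tool.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    return tuple(float(x) for x in text.split()[:3])


def format_sample(timestamp, load, mem):
    """Build one log line from a timestamp, load averages and meminfo dict."""
    return "%s load1=%.2f MemFree=%dkB SwapFree=%dkB" % (
        timestamp, load[0], mem.get("MemFree", 0), mem.get("SwapFree", 0))


def sample_once(log_path):
    """Read the current values and append one line to log_path."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    line = format_sample(time.strftime("%Y-%m-%d %H:%M:%S"), load, mem)
    with open(log_path, "a") as out:
        out.write(line + "\n")
        out.flush()
        os.fsync(out.fileno())  # push the line to disk before a possible hang


def run(log_path, interval_seconds, iterations=None):
    """Sample repeatedly; iterations=None means run until interrupted."""
    count = 0
    while iterations is None or count < iterations:
        sample_once(log_path)
        count += 1
        time.sleep(interval_seconds)
```

Calling `run("/var/tmp/resource-samples.log", 10)` (a hypothetical path and interval) from a background process would leave a trail of samples; after the next outage, the last few lines show whether free memory or swap was collapsing just before the server stopped responding.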

I'm going out on a limb here, but the problem you describe might result from the exhaustion of a resource, most likely memory. Run egrep -i 'killed process' /var/log/* to look for OOM killer invocations.
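The same search can be scripted if you want to run it on a schedule or collect the matches per file. The following is a minimal Python sketch of that case-insensitive 'killed process' search; the function names are hypothetical, and unreadable files or directories are simply skipped.

```python
import re

# Same pattern as the egrep command: case-insensitive "killed process".
OOM_PATTERN = re.compile(r"killed process", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines (without trailing newline) that match the pattern."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_files(paths):
    """Map each readable file path to its matching lines, omitting files with none."""
    hits = {}
    for path in paths:
        try:
            with open(path, errors="replace") as f:
                matches = find_oom_lines(f)
        except (IOError, OSError):
            # Directories and files we may not read are skipped.
            continue
        if matches:
            hits[path] = matches
    return hits
```

For example, `scan_files(glob.glob("/var/log/*"))` (after `import glob`) covers the same files as the shell command. Compressed rotated logs are not decompressed here, so older entries may need zgrep or similar.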

System logs might contain traces of the cause (/var/log/messages, which on Ubuntu is typically /var/log/syslog, and Apache's error logs).

Try enabling more detailed logs and pay close attention to your system while testing it.