How to debug a timeout error from Apache

apache-2.4 centos7 timeout

I'm not sure if this question belongs to ServerFault or StackOverflow, but since I'm guessing that I need to debug this problem serverside, I'm going with ServerFault.

The problem

We're running a shared webhosting server for some of our clients. Everything runs smoothly, except for one client's website. Around 2 to 3 days a week, our monitor detects a brief downtime because Apache does not serve the page within 30 seconds, but takes between 60 and 120 seconds instead. I checked once from my own desktop to confirm: the website kept loading for 80 seconds and then suddenly loaded. There is no increased load, there are no more requests than normal, and the other websites on the server load perfectly.

We had issues with a specific plugin earlier: this plugin contacted the author's server to confirm the license key. When that server was not reachable, WordPress couldn't continue loading and showed the same symptoms as now. We noticed this because one day their server was down for a couple of hours and we had time to disable and enable all the plugins, one by one. According to the plugin author, the problems are solved now.

I have a strong feeling that we're looking at the same problem again, maybe with the same plugin, maybe with another one. But since the downtime is so brief (usually no more than 2 minutes), I have no idea how to debug this timeout error.

Things I've thought of

Normally I would disable the plugins one by one, but by the time I'm connected to the database to disable them, the website is up again. Since there is no pattern in the downtime, I can't stay on standby for when it happens. The Apache logs don't show any errors: I can only see the requests from users and see that no files are served for some time.

My second thought was to run a stack trace on the Apache process. I'm pretty sure this would reveal what Apache is waiting on for so long. But since the server is getting more than 30 requests a minute, the log file would become very large within a couple of hours, which would make it impossible for us to find the right requests.

Relevant server specifications

CentOS Linux release 7.0.1406 (Core)
Kernel 3.10.0-123.el7.x86_64

Apache/2.4.12 with mod_ruid2
PHP 5.4.38 (cli)
mysql Ver 15.1 Distrib 5.5.41-MariaDB, for Linux (x86_64) using readline 5.1

All compiled by DirectAdmin 1.48.3

Ideas?

Can anyone think of a good way to debug this very specific problem? Any help is greatly appreciated!

EDIT:

  • The slow query log doesn't report any slow queries during the slow requests.

Best Answer

If Apache is still reachable, I would first grab the extended status page (mod_status with ExtendedStatus On) to see which requests are being served right now. If there is a long-running request you could even strace it; the PID should be visible in the status page. (Since you have mod_ruid2, I guess you run mod_php and the prefork MPM, so a process serves only a single request at a time.)
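Since nobody can sit and wait for the hang, grabbing the status page can be automated. Below is a rough Python 3 sketch, not a finished tool: it probes the site with the same 30 second limit the monitor uses and, as soon as a probe fails, saves the status page while the hang is still in progress. The function names, the file naming and the two URLs passed on the command line are my own assumptions; it also assumes the status page is reachable from the machine running the script.

```python
import http.client
import sys
import time
import urllib.request

# Errors that urlopen()/read() can raise for timeouts, refused connections, HTTP errors.
FETCH_ERRORS = (OSError, http.client.HTTPException)


def probe(url, timeout):
    """Return True if url answers completely, False on a timeout or any other error.

    Note: the timeout applies to each socket operation, not the whole request,
    and an HTTP error status (e.g. 500) also counts as a failure here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
        return True
    except FETCH_ERRORS:
        return False


def snapshot_name(now):
    """Build a file name from a Unix timestamp (UTC), one file per incident."""
    return time.strftime("server-status-%Y%m%d-%H%M%S.html", time.gmtime(now))


def watch(site_url, status_url, interval=60, timeout=30):
    """Probe the site forever; when a probe fails, save the status page to disk."""
    while True:
        if not probe(site_url, timeout):
            try:
                with urllib.request.urlopen(status_url, timeout=10) as response:
                    page = response.read()
            except FETCH_ERRORS:
                page = b"server-status could not be fetched\n"
            with open(snapshot_name(time.time()), "wb") as out:
                out.write(page)
        time.sleep(interval)


# Hypothetical usage: python3 watch.py <site URL> <status page URL>
if __name__ == "__main__" and len(sys.argv) == 3:
    watch(sys.argv[1], sys.argv[2])
```

Each saved file should then list the request that was stuck and the PID serving it, which is the PID you would hand to strace -p the next time it happens.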

Maybe reconfigure CustomLog to also log the time taken to serve each request, so you can identify the slow requests later.
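In mod_log_config the %D format string logs the time taken in microseconds (%T does the same in whole seconds). Assuming you append %D as the last field of your existing LogFormat, a small Python 3 sketch like the one below could pull the slow requests out of the access log afterwards. The function name and the 30 second default are mine, and the parser only looks at the last field, so it would need adjusting if %D ends up somewhere else in the line.

```python
import sys


def slow_requests(lines, threshold_seconds=30.0):
    """Yield (seconds, rest_of_line) for log lines whose trailing %D value
    (microseconds) is at or above the threshold."""
    for line in lines:
        fields = line.rstrip("\n").rsplit(None, 1)
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # line does not end in a numeric %D field, skip it
        seconds = int(fields[1]) / 1000000.0
        if seconds >= threshold_seconds:
            yield seconds, fields[0]


# Hypothetical usage: python3 slowlog.py /path/to/access_log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for seconds, entry in slow_requests(log):
            print("%8.1fs  %s" % (seconds, entry))
```

With 30+ requests a minute this keeps the noise down: only the handful of requests that actually took longer than the monitor's limit are printed, together with their URL and timestamp.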

Once you have the slow requests, see if they can be reproduced. If so, it's easier to debug; you can even add Xdebug for PHP profiling/debugging.

Also see which MySQL queries are running at the time of the hang; maybe it's a MySQL slow query or locking problem.
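One way to catch this without being logged in at the right moment is to snapshot the process list from a script. The Python 3 sketch below is only an illustration under a few assumptions: the mysql command-line client is on the PATH, credentials come from something like ~/.my.cnf, and the function names and the 10 second cut-off are made up by me. It runs SHOW FULL PROCESSLIST in batch mode (tab-separated output with a header row) and keeps the threads that have been busy for a while.

```python
import subprocess


def long_running(processlist_text, min_seconds=10):
    """Parse tab-separated SHOW FULL PROCESSLIST output (header row first) and
    return the non-sleeping threads whose Time column is >= min_seconds."""
    lines = [line for line in processlist_text.splitlines() if line]
    if not lines:
        return []
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        if row.get("Command") == "Sleep":
            continue  # idle connections are not interesting here
        if row.get("Time", "").isdigit() and int(row["Time"]) >= min_seconds:
            rows.append(row)
    return rows


def current_processlist():
    """Run the mysql client once and return its raw batch output as text.
    Assumes the client can log in without prompting for a password."""
    result = subprocess.run(
        ["mysql", "--batch", "-e", "SHOW FULL PROCESSLIST"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout
```

Calling long_running(current_processlist()) at the moment a hang is detected gives you the State and Info of every query that has been running for a while, which would show a lock wait even when nothing ends up in the slow query log.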

It could also be a network API issue, as you said.

And when you run out of options, maybe just talk with the boss and kick the user. Depending on how many other sites are on the server, the server's health may be more important than the site itself.