Debian – Apache gets stuck while “reading request” and PID takes 100% CPU

apache-2.2 debian debian-squeeze xen

Recently, roughly coinciding with some server upgrades (although a variety of things changed at once), Apache started ending up with some of its processes stuck in the "reading request" state. Each PID that gets into this state takes up 100% CPU, and the stuck processes have very little in common with one another (according to lsof) – some have open TCP/IP connections, some have connections in a waiting state, and some are only listening on the www port.

The pattern is as follows:

  1. restart apache
  2. wait for a bit (minutes)
  3. a "zombie" process appears stuck in "reading request" and its CPU usage starts climbing
  4. more stuck processes pile up, with no obvious correlation to anything
  5. load average climbs to 15–40, depending on when I last notice it
  6. GOTO 1

This whole cycle lasts about 30 minutes to 4 hours, depending on my ability to execute step 1 in a timely manner.
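Since the stuck children don't line up with anything obvious, it may help to catch the exact moment one gets pegged. A minimal one-shot sketch, assuming Debian's prefork children show up as `apache2` in `ps` (the 90% CPU threshold is an arbitrary cutoff; run this from cron or under `watch` yourself):

```shell
# List any apache2 child currently above 90% CPU, with a timestamp,
# so strace/lsof can be attached while the process is still stuck.
ps -C apache2 -o pid=,pcpu= 2>/dev/null |
awk '$2 > 90 {print $1}' |
while read -r pid; do
    echo "$(date '+%F %T') apache2 PID $pid is pegged - attach strace/lsof now"
done
```

Logging the timestamp makes it possible to line stuck PIDs up against access logs and cron entries after the fact.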

server-status gives me:

R_.__.K._K.._._...._........W...................................
................................................................
................................................................
................................................................

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv PID Acc M   CPU     SS  Req Conn    Child   Slot    Client  VHost   Request
0-0 24363   0/1/7   R   0.46    447 844 0.0 0.00    0.26    ?   ?   ..reading.. 
[followed by a bunch of entirely normal requests]

Of course, the key information that would help me debug this is missing from the server-status line there.
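For watching the scoreboard over time without eyeballing the HTML page, mod_status's machine-readable view can be polled. A sketch, assuming mod_status answers at `http://localhost/server-status` (adjust the URL to wherever yours is mapped):

```shell
# The ?auto view returns plain text including a "Scoreboard:" line;
# count workers currently in the "Reading Request" (R) state.
# gsub() returns the number of matches it replaced, i.e. the R count.
curl -s 'http://localhost/server-status?auto' |
awk -F': ' '/^Scoreboard/ {print gsub(/R/, "R", $2), "worker(s) in Reading Request"}'
```

Run in a loop, this gives a cheap time series of how fast the R-state workers accumulate after a restart.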

I haven't been able to trace it to anything in particular. I've tried lsof, netstat, and combing through the logs (there are a ton of logs to look through, and nothing obvious has come up). There are no spikes in network traffic, and since the server is actively serving a bunch of unrelated websites, monitoring incoming connections is hard.
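Beyond one-off lsof/netstat runs, attaching tools to a single pegged child while it is stuck can show what it is actually doing. A sketch – the `inspect_child` helper name and PID 24363 are just illustrative, and `gdb` needs the `apache2-dbg` package installed to give usable symbols:

```shell
# Illustrative helper: snapshot one suspect child by PID.
inspect_child() {
    pid=$1
    if kill -0 "$pid" 2>/dev/null; then
        timeout 10 strace -f -e trace=read,accept,poll -p "$pid"  # which syscall is it looping on?
        lsof -n -P -p "$pid"           # sockets/files held open right now
        gdb -batch -ex bt -p "$pid"    # userland backtrace (needs apache2-dbg)
    else
        echo "PID $pid is not running"
    fi
}
inspect_child 24363   # the stuck child from the server-status output above
```

If the strace output shows a tight loop on a single syscall (as in the answer below), that usually narrows the fault to one module or file descriptor.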

Originally this started happening on an aging Lenny install, so I began piecemeal-upgrading packages to Squeeze. So far no upgrade has made the problem vanish (though happily I'm getting some nice, fresh software!).

Other than starting to debug Apache itself, are there other things that one can do to try to find the source of the issue?


Details:

Debian Lenny/Squeeze (mostly Lenny, with some components upgraded to Squeeze) running on Linux 2.6.32-5-xen-amd64 on a Debian Squeeze Xen host.

Apache2 MPM prefork (2.2.16-6+squeeze7)

Modules: libapache2-mod-fastcgi, libapache2-mod-perl2, libapache2-mod-php5, libapache2-mod-python, libapache2-mod-scgi, libapache2-mod-wsgi, libapache2-modxslt, libapache2-svn

Best Answer

I get the same problem on my server running CentOS 6.2. I suspect it has something to do with the graceful restart performed as part of the weekly log rotation. When I strace the httpd process that is taking 100% CPU, it is looping on read() calls that return empty strings from a pipe file descriptor (stdin?). So I'd guess the root problem is that the read() should block instead of returning zero over and over, which causes the 100% CPU usage.
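That diagnosis – a read() that should block but instead returns 0 (EOF) forever – can be reproduced in miniature in shell. This only illustrates the failure mode, not Apache's actual code path: a reader that never checks for EOF spins instantly instead of blocking (the demo loop is capped at 3 iterations; the stuck httpd has no such cap):

```shell
# fd 3 is already at end-of-file, so every read fails immediately instead
# of blocking. A loop that ignores the exit status spins at 100% CPU.
exec 3</dev/null
i=0
while [ "$i" -lt 3 ]; do        # capped for the demo; the bug has no cap
    read -r line <&3            # returns nonzero at EOF, without blocking
    echo "read returned $? on iteration $i"
    i=$((i + 1))
done
exec 3<&-
```

The fix on the application side is always the same: treat a zero-byte read as EOF and close the descriptor rather than retrying.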