Apache hanging once or twice a day

apache-2.2

Unfortunately we had to move to using KeepAlive after a series of DDoS attacks on our server. I run a web game, so pages are generally small and largely text, and having KeepAlive off worked well for many years. After some quick configuration changes, things didn't seem much different (apart from being able to handle the DDoS attacks better). Server load stays fairly low, averaging about 0.50, with memory usage at about 10% (24GB total).

About once a day, however, Apache just hangs and won't respond for about 15 minutes. Even throughout the day, pages sometimes take 5 seconds to load, and I notice that the only thing on the Apache status scoreboard is keepalive (read) requests: no waiting for connection, reading reply, closing connection etc., just 100+ keepalive reads. Then it sorts itself out. The timing seems random. Although it normally happens during a busy time of the day, neither the server load nor the memory usage is high, and Apache is not running out of available request slots.

My configuration changes are as follows:

ServerLimit 1024
MaxClients 1024
Timeout 2
KeepAliveTimeout 5
KeepAlive On
MaxKeepAliveRequests 100
MinSpareServers 100
MaxSpareServers 200

I hope someone has some insight into this or can suggest what logs / processes I can check to see why this is happening.

Thanks in advance.

EDIT: In case this helps…

56 requests/sec - 116.2 kB/second - 2126 B/request
105 requests currently being processed, 154 idle workers

Later in the day I'd expect double that.

EDIT 2: KeepAliveTimeout changed from 5 to 2

EDIT 3: It just happened again. This time I was around to see it. Apache was unresponsive and wasn't taking up any memory. There were 250 connections from 127.0.0.1 and that never happens. After an Apache restart all was fine. Very odd!

Best Answer

A few things you can do or check:

  1. We had a similar issue where Apache 1.3 didn't properly close connections, which left them lingering forever until they eventually filled up all the client slots. To get around this we just restart Apache once a day from a cron script. Your issue sounds different, as you mention it clears up eventually, so a daily restart may not help.
  2. I assume you are never reaching the "MaxClients" in concurrent connections?
  3. What is your "KeepAliveTimeout" setting? The default was 15 seconds in older Apache versions (5 seconds in 2.2), which is large for most sites; 1 or 2 seconds typically works better (especially since your "Timeout" is set to 2 anyway).
  4. Check "netstat -an" for obvious signs of a DoS from one client. I've found that non-obvious site issues are often due to one client hammering the site continually with 20 requests/sec. Also check that you are not saturating the server's bandwidth (which is easy if you only have a 10 Mbps connection). I use my provider's online bandwidth monitoring graphs, but there should be command-line ways to do this as well (the interface byte counters shown by ifconfig might be enough). There are various Apache-level or server-level solutions for automatically blocking DoS traffic (whether innocent or not).
  5. Check "top" during the issue and look at the CPU, IO, and memory usage. Make sure you are not swapping memory (which you shouldn't if you are only at 10% memory usage).
  6. Speaking of memory: 10% memory usage seems low, even for a server with 24GB of RAM. I have several servers with 1-4GB of RAM and they are all at 75-90% memory usage (much of that is OS cache, however). I suppose this depends on your Apache setup and server usage.
  7. Make sure there are no deadlocks elsewhere in your application layer (in the database, for example) that are causing the issue. For example, check your Apache "server-status" page: if that loads quickly but your regular site's pages load slowly, then the issue is likely not the Apache server itself.
  8. Check the logs in "/var/log", particularly the Apache error logs and the "messages" log, for relevant entries. If you don't have these or other application logs enabled, enable them at least temporarily.
  9. Check the system limits of things like the number of open files allowed at a time (ulimit -n). The default settings of many servers/OSes are not necessarily configured for a high-volume server.
  10. If all else fails, challenge your assumptions about what the issue could be and double-check items that "can't" fail or that you've already checked.
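
For the "netstat -an" check in item 4, eyeballing hundreds of lines is tedious, so here is a rough sketch of how the per-client tally could be automated. It assumes Linux-style netstat output (six whitespace-separated columns for TCP rows, with the foreign address in the fifth column); the function name count_remote_addresses is just something I made up for illustration, and the column layout may differ on other systems.

```python
import collections
import subprocess


def count_remote_addresses(netstat_output):
    """Count TCP connections per remote IP in `netstat -an` style text.

    Assumes Linux-style rows:
    proto recv-q send-q local-address foreign-address state
    """
    counts = collections.Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP/UNIX rows, and anything too short to be a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Listening sockets have no real peer, so they are not counted.
        if fields[5] == "LISTEN":
            continue
        # The foreign address is "ip:port"; drop everything after the last colon.
        ip = fields[4].rsplit(":", 1)[0]
        counts[ip] += 1
    return counts


if __name__ == "__main__":
    # Run netstat once and print the ten busiest remote addresses.
    output = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    ).stdout
    for ip, n in count_remote_addresses(output).most_common(10):
        print(f"{n:6d}  {ip}")
```

If one address (or 127.0.0.1, as in your EDIT 3) dominates the list while the hang is happening, that is a good place to start digging.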