Apache server completely freezes until it gets restarted

apache-2.2

My server does this every few days. What sucks is that it always seems to do this right after I go to bed, so when I wake up, I'm greeted with the fact that my server has been down for the past 6 or 7 hours.

When I first noticed this, I added a cron job that tries to restart the server every 15 minutes, but that evidently didn't fix it. Once I noticed the server was down, I ran this command:

/etc/init.d/apache2 restart
* Restarting web server apache2
apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
... waiting ...........................................................apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
httpd (pid 17597) already running

…which is odd, because a restart should restart the server, even if it's already running, correct? I eventually had to "stop" then "start" to get it working again.

I then looked through the logs, and found something very weird. It seems that around the time the server crashed, the logs have entries that are wildly out of order. It looks a little like this:

xx.xxx.xxx.x - - [21/Apr/2010:06:32:05 -0400] "GET / blah"
xx.xxx.xxx.x - - [21/Apr/2010:06:51:25 -0400] "GET / blah"
x.xx.xxx.xxx - - [21/Apr/2010:06:38:23 -0400] "GET / blah"
xxx.xx.xx.xx - - [21/Apr/2010:06:31:56 -0400] "GET / blah"
xxx.xx.xx.xx - - [21/Apr/2010:06:51:49 -0400] "GET / blah"
xx.xx.xxx.xx - - [21/Apr/2010:06:33:20 -0400] "GET / blah"

I don't think the problem is memory, because this:

[image not preserved]

tells me that right before the crash, memory usage is fine.

I'm running Apache with the worker MPM. Here are the settings for that:

<IfModule mpm_worker_module>
  StartServers            1
  MaxClients            100
  MinSpareThreads         5
  MaxSpareThreads        10
  ThreadsPerChild        10
  MaxRequestsPerChild  3000
</IfModule>

This Apache server is running a bunch of stuff, but most of the traffic comes from a Django project I'm hosting that uses mod_wsgi. There is also a Simple Machines Forum running off of mod_fcgid. Those settings are below:

<IfModule mod_fcgid.c>
  MaxRequestsPerProcess 500
  MaxProcessCount 3

  AddHandler fcgid-script .php .fcgi
  AddHandler cgi-script .cgi .pl
  FCGIWrapper "/usr/bin/php-cgi" .php 
</IfModule>

Anyone know of anything else I can check? I've just about tweaked every single setting I can think of, yet these freezes still happen.

Edit: I have both a Postgres and a MySQL server running on this machine, but they both keep working during the freeze: my backup script ran during that 5-hour time frame, and it worked perfectly fine.

Edit2: I'm running Ubuntu Server 9.10. When the server is down, all requests just never return. The page hangs. No error messages or anything.

Best Answer

You don't say anything about how you are using mod_wsgi or how you have it configured. As a start, I would suggest reading 'http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API'. You are possibly using a C extension module for Python that doesn't fully and properly support threading. If you use daemon mode of mod_wsgi, though, such deadlocks should be detected and the processes at least forcibly restarted after a period. So, if you are using embedded mode, which is discouraged, switch to daemon mode as a first step.
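
For illustration only, a daemon-mode setup might look roughly like the sketch below. The process group name, the script path, and the numeric values are all hypothetical placeholders rather than anything taken from the question, so adjust them to your own layout. The directives themselves (WSGIDaemonProcess, WSGIProcessGroup, WSGIApplicationGroup, WSGIScriptAlias) are standard mod_wsgi ones, but check the documentation for the mod_wsgi version you actually have installed.

```apache
# Hypothetical names, paths, and values: "example_django",
# /srv/example/django.wsgi, and the numbers below are placeholders.

# Run the application in its own daemon process group instead of inside
# the Apache child processes (embedded mode).
# deadlock-timeout is the number of seconds allowed to pass after a
# potential deadlock on the Python GIL is detected before the daemon
# process is shut down and restarted.
WSGIDaemonProcess example_django processes=2 threads=15 deadlock-timeout=60

# Delegate requests for the application to that daemon process group.
WSGIProcessGroup example_django

# Force the application to run in the main Python interpreter, which is
# the workaround the linked wiki section describes for C extensions that
# use the simplified GIL state API.
WSGIApplicationGroup %{GLOBAL}

WSGIScriptAlias / /srv/example/django.wsgi
```

With something like this in place, a hung Python process should be restarted by mod_wsgi on its own rather than leaving Apache unresponsive until someone stops and starts it by hand.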

Overall, this sort of issue, if you believe it is related to mod_wsgi, should be discussed on the mod_wsgi mailing list. Debugging stuff like this on StackOverflow/ServerFault/SuperUser is really hard.