Ubuntu – Apache2 server stops responding

apache-2.2, ubuntu

I am looking for suggestions on where to dig deeper.

In short, the Apache2 server stops serving requests 2-15 hours after being started. As a result, I have to run service apache2 restart about twice a day.

Long version:

  1. I am running a few websites (Apache 2.2.22, built Jul 12 2013) on a dedicated server (Ubuntu 13.04).
  2. The Apache2 server behaved fine for more than half a year; now it suddenly stops processing requests on all websites (around 5 sites) until the apache process is restarted.
  3. I could not find any abnormal log entries in /var/log/apache regarding the issue.
  4. service apache2 status reports that the process is running.

I would be glad to hear your suggestions on what to do in this situation.

UPDATE:

Running netstat -an | grep 80:

tcp6       0      0 :::80                   :::*                    LISTEN
tcp6     325      0 SERV_IP:80       IP_A:35514     CLOSE_WAIT
tcp6     332      0 SERV_IP:80       IP_B:34198     CLOSE_WAIT
tcp6     379      0 SERV_IP:80       IP_C:57859     CLOSE_WAIT
tcp6       0      0 SERV_IP:80       IP_A:35060     CLOSE_WAIT
tcp6     360      0 SERV_IP:80       IP_A:38481     CLOSE_WAIT
tcp6     466      0 SERV_IP:80       IP_B:56324     CLOSE_WAIT
tcp6     361      0 SERV_IP:80       IP_A:53466     CLOSE_WAIT
tcp6       1      0 SERV_IP:80       IP_A:38102     CLOSE_WAIT
tcp6     196      0 SERV_IP:80       IP_E:58125     ESTABLISHED

and more entries like these, around 150 of them.

ps aux | grep apache:

root      2968  0.0  0.0 452240 21116 ?        Ss   16:08   0:01 /usr/sbin/apache2 -k start
www-data  5217  0.0  0.0 463584 23820 ?        S    17:04   0:03 /usr/sbin/apache2 -k start

There are around 120 of the latter lines (www-data), so I assume there are 120 apache processes?
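Rather than counting the lines by eye, the number of worker processes can be tallied from the ps output. This is only a minimal sketch: the helper name count_workers is made up for illustration, and it assumes the standard ps aux column layout, where the user is the first field and the command starts at the eleventh.

```shell
# Count apache2 worker processes owned by a given user (default: www-data)
# from `ps aux`-style input on stdin. Hypothetical helper, not an Apache tool.
count_workers() {
  awk -v user="${1:-www-data}" '
    $1 == user && $11 ~ /apache2$/ { n++ }   # field 11 is the start of COMMAND
    END { print n + 0 }                      # print 0 when nothing matched
  '
}

# Usage: ps aux | count_workers
```

The root-owned parent process is excluded because only lines whose first field matches the given user are counted.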

Using strace on the apache2 root process:

 sudo strace -f -p 2968
Process 2968 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 264394}) = 0 (Timeout)
wait4(-1, 0x7fff6d157a6c, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
wait4(-1, 0x7fff6d157a6c, WNOHANG|WSTOPPED, NULL) = 0

Using strace on one of the www-data processes:

sudo strace -f -p 8554
Process 8554 attached - interrupt to quit
flock(40, LOCK_EX

Whoa, it looks to me as if the apache processes somehow get stuck, and once the maximum number of connections is reached, Apache stops creating new instances. But why do they get stuck?

htop, iotop, and jnettop do not report any anomaly (no overloading).

UPDATE2:
The server has not crashed over the last two days, so I am unable to get more information. In the meantime, I am thankful for your help and am accepting the answer. Once more information is available, I will leave a link to a new question with a better-constructed body. Thanks.

Best Answer

No matter what "service apache2 status" reports, do you see apache processes when you do ps aux?

Can you run netstat -n when the problem occurs? You may be running out of a resource such as file descriptors, or you may have too many open connections.
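With many connections, a per-state tally is easier to read than the raw listing. The following is a rough sketch, assuming the usual Linux netstat column layout (local address in the fourth field, state in the sixth); the helper name count_states is invented for illustration. A large CLOSE_WAIT count means the remote side has closed the connection while the local process has not yet closed its socket, which would be consistent with workers that are blocked somewhere.

```shell
# Tally TCP connection states for local port 80 from `netstat -an`-style
# input on stdin. Hypothetical helper; assumes Linux net-tools column order.
count_states() {
  awk '
    $1 ~ /^tcp/ && $4 ~ /:80$/ { n[$6]++ }    # field 4: local addr, field 6: state
    END { for (s in n) print s, n[s] }
  ' | sort
}

# Usage: netstat -an | count_states
```

Matching on the local address field avoids the false positives that a plain grep 80 picks up from port numbers or addresses that merely contain "80".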

During the problem, do you see high CPU utilization? Maybe the system is running out of memory and thrashing?

Does the HTTP server respond with "connection refused", or does the connection just time out?

In the latter case, I would suggest running strace -f -p [apachepid]; it may show you which system call is blocking the request. In the former, Apache has probably crashed.

Do you proxy Tomcat or another application server, or do you serve plain static HTML?

Have you configured authentication? Maybe something is going wrong in the authentication layer.

UPDATE:

In the second strace I see flock(40, LOCK_EX. Maybe the processes are trying to get an exclusive lock on something? Can you run lsof -n -p 8554 (or whichever PID is stuck in flock) and see which file it is trying to lock (40 is the file descriptor)? You could also run "ls -l /proc/8554/fd"; the -l is needed to show where each descriptor points.
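To look up just the one descriptor instead of scanning the whole listing, the /proc symlink can be resolved directly. This is a small sketch for Linux only; the helper name fd_target is made up here, and 8554 and 40 are simply the PID and descriptor from the strace output above.

```shell
# Print what a single file descriptor of a process points to, using the
# /proc/<pid>/fd/<fd> symlink (Linux only). Hypothetical helper name.
fd_target() {
  pid=$1
  fd=$2
  readlink "/proc/$pid/fd/$fd"
}

# Usage (as root or as the owner of the process): fd_target 8554 40
```

Whatever path this prints is the file the worker is waiting to lock; the post does not say what that file is, so the next step depends on the result.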