I am looking for suggestions on where to dig deeper.
Short version: the Apache2 server stops serving requests 2-15 hours after it is started, so I have to run service apache2 restart about twice a day.
Long version:
- I am running a few websites (Apache 2.2.22, built Jul 12 2013) on a dedicated server (Ubuntu 13.04).
- The Apache2 server behaved fine for more than half a year; now it suddenly stops processing requests on all websites (around 5 sites) until the Apache process is restarted.
- I could not find any abnormal entries in /var/log/apache regarding the issue.
- service apache2 status reports that the process is running.
I would be glad to hear your suggestions on what to do in this situation.
UPDATE:
Running netstat -an | grep 80:
tcp6 0 0 :::80 :::* LISTEN
tcp6 325 0 SERV_IP:80 IP_A:35514 CLOSE_WAIT
tcp6 332 0 SERV_IP:80 IP_B:34198 CLOSE_WAIT
tcp6 379 0 SERV_IP:80 IP_C:57859 CLOSE_WAIT
tcp6 0 0 SERV_IP:80 IP_A:35060 CLOSE_WAIT
tcp6 360 0 SERV_IP:80 IP_A:38481 CLOSE_WAIT
tcp6 466 0 SERV_IP:80 IP_B:56324 CLOSE_WAIT
tcp6 361 0 SERV_IP:80 IP_A:53466 CLOSE_WAIT
tcp6 1 0 SERV_IP:80 IP_A:38102 CLOSE_WAIT
tcp6 196 0 SERV_IP:80 IP_E:58125 ESTABLISHED
...and around 150 more entries like these.
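A CLOSE_WAIT socket is one where the client has already closed its side of the connection but the local process has not yet called close(); a non-zero Recv-Q means request bytes are still sitting unread. Many such sockets on port 80 therefore suggest workers that accepted connections and then never got back to them. A minimal sketch for tallying this, assuming the rows are in netstat -ant layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name summarize_port is hypothetical. It also matches the local port exactly, since grep 80 additionally catches :8080 or a client port such as :35180.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_port(lines: Iterable[str], port: int = 80) -> Tuple[Counter, int]:
    """Count TCP states of sockets whose *local* port is `port`.

    Expects rows in `netstat -ant` layout:
        Proto Recv-Q Send-Q Local-Address Foreign-Address State
    Returns (state counts, total Recv-Q bytes still unread on CLOSE_WAIT sockets).
    """
    states: Counter = Counter()
    close_wait_unread = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header lines, udp/unix sockets, etc.
        recv_q, local, state = fields[1], fields[3], fields[5]
        # Compare the port exactly instead of grepping for "80" anywhere.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        states[state] += 1
        if state == "CLOSE_WAIT":
            close_wait_unread += int(recv_q)
    return states, close_wait_unread
```

Saved output can be fed in directly, e.g. summarize_port(open("netstat.txt")), where netstat.txt is a hypothetical file holding netstat -ant output captured while the server is hung.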
Running ps aux | grep apache:
root 2968 0.0 0.0 452240 21116 ? Ss 16:08 0:01 /usr/sbin/apache2 -k start
www-data 5217 0.0 0.0 463584 23820 ? S 17:04 0:03 /usr/sbin/apache2 -k start
There are around 120 lines of the latter kind (www-data), so I assume there are 120 Apache worker processes?
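Counting only the www-data rows gives the number of worker children (the root row is the parent process). If the user's guess about a connection limit is right, that count should sit at the configured prefork MaxClients value while the server is hung. A small sketch, assuming ps aux column layout; count_workers and at_worker_limit are hypothetical helpers, and max_clients is whatever value your own config sets.

```python
from typing import Iterable


def count_workers(ps_lines: Iterable[str], user: str = "www-data",
                  command: str = "apache2") -> int:
    """Count `ps aux` rows owned by `user` whose COMMAND contains `command`."""
    count = 0
    for line in ps_lines:
        # ps aux prints 11 columns; COMMAND (the last) may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[0] == user and command in fields[10]:
            count += 1
    return count


def at_worker_limit(worker_count: int, max_clients: int) -> bool:
    """True if the pool is full; max_clients is a hypothetical input from your config."""
    return worker_count >= max_clients
```

Note that the grep apache line itself is not counted, because its COMMAND does not contain "apache2" and it normally is not owned by www-data.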
Using strace on the apache2 root process:
sudo strace -f -p 2968
Process 2968 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 264394}) = 0 (Timeout)
wait4(-1, 0x7fff6d157a6c, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
wait4(-1, 0x7fff6d157a6c, WNOHANG|WSTOPPED, NULL) = 0
Using strace on one of the www-data processes:
sudo strace -f -p 8554
Process 8554 attached - interrupt to quit
flock(40, LOCK_EX
Whoa, it looks to me as if the Apache processes somehow get stuck, and once the maximum number of connections is reached, Apache stops creating new instances. But why do they get stuck?
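A process shown in strace as flock(40, LOCK_EX with no return value is blocked waiting for an exclusive lock. If whoever holds that lock never releases it, every worker reaching the same flock() waits indefinitely, which would fit the pile-up described. (Caveat: if Apache's AcceptMutex happens to be set to flock, idle prefork workers waiting for their turn to accept also look like this, so the file behind descriptor 40 matters.) The sketch below only demonstrates the blocking behaviour with Python's fcntl.flock on a temporary file; can_lock_now and demo are hypothetical names, and LOCK_NB is used so the check fails immediately instead of hanging.

```python
import fcntl
import os
import tempfile


def can_lock_now(path: str) -> bool:
    """Return True if an exclusive flock() on `path` is free right now.

    Opens a fresh descriptor, so it competes with every other open file
    description holding a flock() on the file, even in this same process.
    LOCK_NB makes the call fail at once instead of waiting the way the
    blocked `flock(40, LOCK_EX` in the strace output waits.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


def demo() -> tuple:
    """Simulate one holder keeping the lock while another caller asks for it."""
    with tempfile.NamedTemporaryFile() as tmp:
        holder = os.open(tmp.name, os.O_RDONLY)
        try:
            fcntl.flock(holder, fcntl.LOCK_EX)   # the "stuck" lock holder
            while_held = can_lock_now(tmp.name)  # a blocking LOCK_EX would hang here
            fcntl.flock(holder, fcntl.LOCK_UN)   # holder finally releases
            after_release = can_lock_now(tmp.name)
        finally:
            os.close(holder)
    return while_held, after_release
```

demo() returns (False, True): the lock is unavailable while held and free again once released.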
htop, iotop and jnettop do not report any anomaly (no overloading).
UPDATE2:
The server has not crashed in the last two days, so I am unable to get more information. In the meantime, thank you for your help; I am accepting the answer. Once more information is available, I will post a link to a new question with a better-constructed body. Thanks.
Best Answer
No matter what "service apache2 status" reports, do you see apache processes when you do ps aux?
Can you run netstat -n when the problem occurs? Maybe you are running out of a resource, e.g. file descriptors; you may have too many open connections.
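To check the file-descriptor theory for a specific worker, compare how many descriptors it has open with its "Max open files" soft limit. A minimal Linux-only sketch reading /proc/<pid>/fd and /proc/<pid>/limits; the helper names are hypothetical, and listing another user's descriptors (e.g. a www-data worker) generally needs root.

```python
import os
from typing import Optional, Tuple


def open_fd_count(pid: int) -> int:
    """Number of file descriptors `pid` currently has open (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_soft_limit(pid: int) -> Optional[int]:
    """Soft 'Max open files' limit of `pid`, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: limit name (3 words), soft limit, hard limit, units.
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """(open descriptors, soft limit) for `pid`."""
    return open_fd_count(pid), nofile_soft_limit(pid)
```

For example, fd_usage(8554) on the stuck worker; a count close to the limit would support the resource-exhaustion idea.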
During the problem, do you see high CPU utilization? Maybe the system runs out of memory and is thrashing?
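One quick way to check the thrashing idea is to snapshot /proc/meminfo while the server is hung: little free memory together with substantial swap in use points in that direction. A minimal parsing sketch; parse_meminfo and swap_used_kb are hypothetical helpers, and most values in that file are in kB (a few, such as the HugePages counts, are plain counts).

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        info[key.strip()] = int(rest.split()[0])
    return info


def swap_used_kb(info: Dict[str, int]) -> int:
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]
```

Typical use: parse_meminfo(open("/proc/meminfo").read()), taken once in normal operation and once during an outage for comparison.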
Does the HTTP server respond with connection refused, or does the connection just time out?
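The two cases can be told apart from a client with a short probe. Note a third possibility: the kernel may still complete the TCP handshake (the connection waits in the listen backlog) while no Apache worker ever reads the request, so the connect succeeds but no reply arrives. A minimal sketch using only the standard socket module; probe_http and its result labels are hypothetical, and the HEAD request is just a cheap way to ask for a reply.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Classify how the server reacts to a minimal HTTP request.

    Returns "refused", "connect-timeout", "no-response", "closed" or "responded".
    """
    try:
        # create_connection also applies `timeout` to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"          # nothing listening: Apache likely not running
    except socket.timeout:
        return "connect-timeout"  # handshake never completed
    with sock:
        request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        sock.sendall(request)
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return "no-response"  # accepted by the kernel, never answered
    return "responded" if data else "closed"
```

In the hang described in the question, "no-response" would be the expected result rather than "refused".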
In the latter case, I would suggest running strace -f -p [apachepid]; you may find out which system call is blocking the request. In the former case, Apache has probably crashed.
Do you proxy to Tomcat or another application server, or do you serve plain static HTML?
Have you configured authentication? Maybe something goes wrong in the authentication layer.
UPDATE:
In the second strace I see flock(40, LOCK_EX. Maybe the processes are trying to get an exclusive lock on something? Can you run lsof -n -p 8554 (or whichever PID is blocked in flock) and see which file it is trying to lock (40 is the file descriptor)? You could also run "ls -l /proc/8554/fd", which shows what each descriptor points to.
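The same lookup can be scripted: each entry in /proc/<pid>/fd is a symlink to the file, socket or pipe the descriptor refers to. A minimal Linux-only sketch; fd_target and all_fd_targets are hypothetical helpers, and inspecting a www-data worker from another account needs root.

```python
import os
from typing import Dict


def fd_target(pid: int, fd: int) -> str:
    """What descriptor `fd` of process `pid` refers to, via /proc (Linux only).

    Same information as one line of `ls -l /proc/<pid>/fd`.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def all_fd_targets(pid: int) -> Dict[int, str]:
    """Map every open descriptor of `pid` to its target."""
    targets = {}
    for name in os.listdir(f"/proc/{pid}/fd"):
        try:
            targets[int(name)] = os.readlink(f"/proc/{pid}/fd/{name}")
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
    return targets
```

For the blocked worker this would be fd_target(8554, 40), run as root while the process is still stuck.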