Apache stops responding and logs nothing after a short, strong “traffic wave”

apache-2.2 not-responding

My Apache server constantly handles about 300 requests/s (2 MB/s) with a server load of 0.05.

The problem is that my service architecture produces huge traffic spikes at specific moments (for example, 300-500 people are redirected to some page with JavaScript within a few seconds).

After such a short traffic spike, Apache becomes unresponsive (Firefox reports a connection reset after about 30 seconds) without logging anything. It stays frozen until I restart apache2.

While frozen, it cannot serve even a simple HTML file that involves no PHP or SQL connection, even though the apache2 processes still exist.

I tried different prefork settings, from 50 to almost 1000 idle workers and a MaxClients limit of 10000, but nothing helps.

Another symptom, besides the empty logs: moments before the freeze (including the last snapshot before it became unresponsive), the Apache status module (mod_status) shows almost every process waiting for a connection:

__R_R_______R__RR______R___R________________RR_______R______R___
_________R__________R_________________________R________CR___R___
___________R__________________________C__WR__R________________R_

Under normal, lighter load, however, it shows:

C___R___K_C___C___C_____KK______R___C_C_R______C__K___C________K
____C__KR_RR__C___K___KK_C__R__K__C_CK__RC___CR___R__K__C__R____
___KR____C_____R______R______K__R_______KC__C_K__R____C_______R_
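To compare the two snapshots more concretely, a minimal Python sketch can tally the worker states. The state names follow the scoreboard legend that mod_status prints; the helper names (`count_states`, `report`, `FROZEN`, `NORMAL`) are hypothetical, and the two scoreboards are pasted verbatim from above.

```python
from collections import Counter

# Scoreboard legend as printed by Apache mod_status.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot, no process",
}

# Scoreboards copied from the question; the line breaks are display wrapping only.
FROZEN = """
__R_R_______R__RR______R___R________________RR_______R______R___
_________R__________R_________________________R________CR___R___
___________R__________________________C__WR__R________________R_
"""

NORMAL = """
C___R___K_C___C___C_____KK______R___C_C_R______C__K___C________K
____C__KR_RR__C___K___KK_C__R__K__C_CK__RC___CR___R__K__C__R____
___KR____C_____R______R______K__R_______KC__C_K__R____C_______R_
"""


def count_states(scoreboard):
    """Tally worker slots per scoreboard character, ignoring all whitespace."""
    return Counter("".join(scoreboard.split()))


def report(label, scoreboard):
    """Print each state with its slot count and share of all slots."""
    counts = count_states(scoreboard)
    total = sum(counts.values())
    print(f"{label} ({total} slots)")
    for ch, n in counts.most_common():
        name = STATE_NAMES.get(ch, "unknown")
        print(f"  {ch} {name:<24} {n:3d}  {n / total:5.1%}")


report("before freeze", FROZEN)
report("normal load", NORMAL)
```

On these two snapshots the board taken before the freeze has no K (keepalive) slots at all and a clearly higher share of _ (waiting) slots than the normal one.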

syslog shows nothing either. The machine has 64 GB of RAM, and its load never exceeds 0.1.

Best Answer

I think that when your connections spike to more than 450 per second, you may be running out of ephemeral ports on Linux.
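A minimal way to test this theory while a spike is happening is to count sockets per TCP state on the server. The sketch below assumes Linux and reads `/proc/net/tcp` and `/proc/net/tcp6`, whose fourth column (`st`) holds each socket's state as a hex code; the function names are hypothetical. A TIME_WAIT count that climbs toward the size of the port range during the spike would fit the port-exhaustion explanation, while a small count would point elsewhere.

```python
from collections import Counter

# Hex state codes used in the "st" column of Linux /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per state in the text of a /proc/net/tcp-style table."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3:
            # Columns: sl, local_address, rem_address, st, ...
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_proc_tcp(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum state counts over the given tables, skipping unreadable ones."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as f:
                total.update(count_tcp_states(f.read()))
        except OSError:
            pass  # e.g. IPv6 disabled, or not running on Linux
    return total


if __name__ == "__main__":
    for state, n in read_proc_tcp().most_common():
        print(f"{state:<12} {n}")
```

Running `ss -s` gives a similar summary, including a timewait count, without any code.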

Check out this previously answered question.

A short excerpt from that answer:


sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_fin_timeout

The ephemeral port range defines the maximum number of outbound sockets a host can create from a particular IP address. The fin_timeout defines the minimum time these sockets will stay in the TIME_WAIT state (unusable after being used once). Usual system defaults are:

net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_fin_timeout = 60 

This basically means your system cannot sustain more than (61000 - 32768) / 60 ≈ 470 new outbound sockets per second. If you are not happy with that, you could begin by increasing the port range. Setting the range to 15000 61000 is pretty common these days. You could further increase availability by decreasing fin_timeout. If you do both, you should more readily see over 1500 outbound connections per second.
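The arithmetic above can be reproduced with a short Python sketch. It keeps the excerpt's simplified model, in which every local port is unusable for fin_timeout seconds after use; dividing ports by seconds gives a rate, so the result reads as an upper bound on new outbound sockets per second. The function names are hypothetical, and the third scenario assumes fin_timeout is lowered to 30 seconds, a value the excerpt does not specify. On Linux it can optionally read the live values from `/proc/sys/net/ipv4`, the files behind the two `sysctl` keys.

```python
def read_current_settings(root="/proc/sys/net/ipv4"):
    """Return (low, high, fin_timeout) from the running kernel, or None."""
    try:
        with open(f"{root}/ip_local_port_range") as f:
            low, high = (int(x) for x in f.read().split())
        with open(f"{root}/tcp_fin_timeout") as f:
            fin_timeout = int(f.read())
    except (OSError, ValueError):
        return None  # not Linux, or unexpected file contents
    if high <= low or fin_timeout <= 0:
        return None
    return low, high, fin_timeout


def max_new_sockets_per_second(port_low, port_high, hold_seconds):
    """Ports available divided by how long each one stays unusable."""
    # (high - low) follows the excerpt; the inclusive range holds one more port.
    if port_high <= port_low or hold_seconds <= 0:
        raise ValueError("need port_high > port_low and hold_seconds > 0")
    return (port_high - port_low) / hold_seconds


scenarios = [
    ("excerpt defaults", 32768, 61000, 60),
    ("range 15000-61000", 15000, 61000, 60),
    ("range 15000-61000, fin_timeout 30", 15000, 61000, 30),  # assumed value
]
current = read_current_settings()
if current is not None:
    scenarios.append(("this machine", *current))

for name, low, high, hold in scenarios:
    rate = max_new_sockets_per_second(low, high, hold)
    print(f"{name:<36} {rate:8.1f} new sockets/s")
```

With the excerpt's numbers this gives about 470, 767 and 1533 sockets per second, so the "over 1500" figure needs fin_timeout at 30 seconds or less once the range is widened.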
