Lighttpd: backend is overloaded

fastcgi lighttpd

I maintain a high-traffic site, but from time to time during traffic spikes it gets stuck with:

 (mod_fastcgi.c.2900) backend is overloaded; we'll disable it for 2 seconds and send the request to another backend instead: reconnects: 0 load: 2541

Current stats are:

absolute (since start)
Requests    15 kreq
Traffic 20.02 Mbyte
average (since start)
Requests    81 req/s
Traffic 106.24 kbyte/s
average (5s sliding average)
Requests    94 req/s
Traffic 99.23 kbyte/s

3952 connections

The site itself is a very simple PHP site, with no MySQL involved. I do have APC installed and configured.

I have added suggested changes to /etc/sysctl.conf:

# These ensure that TIME_WAIT ports either get reused or closed fast.
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_tw_recycle = 1

# TCP memory
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144

net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_max_orphans = 262144

My lighttpd.conf looks like:

server.max-fds = 12000
server.max-keep-alive-requests = 0
server.event-handler = "linux-sysepoll"
server.max-connections = 10000

The fastcgi config:

## Start an FastCGI server for php (needs the php5-cgi package)
fastcgi.server    = ( ".php" =>
    ((
            "bin-path" => "/usr/bin/php-cgi",
            "socket" => "/tmp/php.socket",
            "max-procs" => 14,
            "bin-environment" => (
                    "PHP_FCGI_CHILDREN" => "30",
                    "PHP_FCGI_MAX_REQUESTS" => "100000"
            ),
            "bin-copy-environment" => (
                    "PATH", "SHELL", "USER"
            ),
            "broken-scriptfilename" => "enable"
    ))
)

server utilization:

top - 08:04:26 up 97 days, 15:14,  1 user,  load average: 0.10, 0.08, 0.04
Tasks: 570 total,   3 running, 567 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.2%sy,  0.0%ni, 98.5%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4056176k total,  3716120k used,   340056k free,   631600k buffers
Swap:   995988k total,    15544k used,   980444k free,  1631236k cached

I have tried a variety of things. I know that a high max-procs value is generally not good, but if I set it any lower the server immediately starts throwing 500 errors.

Does anyone have any suggestions on what else to try to adjust to keep the site stable? Is it even plausible to support this level of traffic on a single server?

Best Answer

"backend is overloaded" means that one of the 14 max-procs backends is overloaded (lighttpd creates a separate socket for each max-procs backend by appending "-[number]" to the socket filename).

I'd go for a lower max-procs number, and instead increase PHP_FCGI_CHILDREN, for example "max-procs" => 2 and "PHP_FCGI_CHILDREN" => "210", or 4 and 100 (or 1 and 400).

This should decrease the chance that one of the backends is "full" while another can still accept requests. I'm not sure, though, how well APC scales with the number of PHP_FCGI_CHILDREN.
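
As a rough sketch, the first variant suggested above could look like the fragment below. It reuses the paths and options from the question's config and changes only the two numbers; 2 × 210 keeps the same total of 420 PHP workers as the original 14 × 30. Whether 210 children per backend fits in this machine's memory is an assumption that would need testing.

```
## Sketch only: fewer lighttpd-level backends, more PHP children per backend.
## Paths and other options are copied unchanged from the question.
fastcgi.server    = ( ".php" =>
    ((
            "bin-path" => "/usr/bin/php-cgi",
            "socket" => "/tmp/php.socket",
            ## 2 backends instead of 14, so lighttpd balances across 2 sockets
            "max-procs" => 2,
            "bin-environment" => (
                    ## 2 x 210 = 420 workers, same total as 14 x 30
                    "PHP_FCGI_CHILDREN" => "210",
                    "PHP_FCGI_MAX_REQUESTS" => "100000"
            ),
            "bin-copy-environment" => (
                    "PATH", "SHELL", "USER"
            ),
            "broken-scriptfilename" => "enable"
    ))
)
```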

Another way would be using spawn-fcgi combined with multiwatch (multiwatch does the 'max-procs' part) - in this solution, all php backends are on the same socket, so you don't have balancing problems.
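
On the lighttpd side, that setup might reduce to something like the sketch below. With no "bin-path", lighttpd does not spawn PHP itself and only connects to a socket that an external process has already created. The command in the comment, the multiwatch path, and the fork count are assumptions for illustration; check the spawn-fcgi and multiwatch man pages for the options available on your system.

```
## Sketch only: lighttpd connects to an externally managed PHP socket.
## The backends would be started separately, for example (assumed invocation):
##   spawn-fcgi -s /tmp/php.socket -n -- /usr/bin/multiwatch -f 14 -- /usr/bin/php-cgi
## PHP_FCGI_CHILDREN and PHP_FCGI_MAX_REQUESTS would then be set in the
## environment of that command rather than in lighttpd.conf.
fastcgi.server    = ( ".php" =>
    ((
            ## no "bin-path" and no "max-procs": one shared socket for all backends
            "socket" => "/tmp/php.socket",
            "broken-scriptfilename" => "enable"
    ))
)
```

Because every PHP process accepts from the same socket, the kernel hands each request to whichever process is free, so no single backend can be "full" while others sit idle.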