Apache Requests – Fix Stuck in ‘..reading..’ State

apache-2.2

We've recently been hitting our maxclients (300) on our server and are noticing a lot of connections are stuck in the R reading state. Here's an excerpt from our server-status dump:

Current Time: Monday, 29-Apr-2013 11:46:00 PDT
Restart Time: Monday, 29-Apr-2013 11:03:48 PDT
Parent Server Generation: 0
Server uptime: 42 minutes 12 seconds
Total accesses: 144170 - Total Traffic: 29.8 MB
CPU Usage: u188.25 s345.65 cu2601.11 cs0 - 124% CPU load
56.9 requests/sec - 12.0 kB/second - 216 B/request
244 requests currently being processed, 56 idle workers
KRR_RRRRRRRRRRRR_RRRKRRRR.......................................
K_RRK_RK_RKR_RRR_RRRR_R_R.......................................
RRRRRRRRRRRRRC_CR___RCRR_.......................................
R_RRRCRRRRRRRRCRRRCRRKRCR.......................................
R_K_RKR_CRRRRRRKC_RR_KR_R.......................................
RRRRKRR__RRRRKRRCR_RRRRRR.......................................
_RCRRRRRRRRRKR__RR_RRCRRR.......................................
RRRRRRRRCRRRRRRKRRRR_RCRR.......................................
R___RRCR__RR__RRRKC___RK_.......................................
_R_KCR__RR__R_R_RRRRRR_RK.......................................
RRR_RR___RCRRK_R_RWRR_RRC.......................................
RRKRCRRRRRRRRKRRRRRRRRCRR.......................................
................................................................
................................................................
................................................................
................................................................


Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv PID Acc M   CPU SS  Req Conn    Child   Slot    Client  VHost   Request
0-0 10320   1/718/718   K   399.14  0   42  0.0 0.21    0.21    69.171.178.24   mysite.com  GET /home.php
0-0 10320   0/434/434   R   398.21  8   40  0.0 0.07    0.07    ?   ?   ..reading..
0-0 10320   0/548/548   R   331.52  558 45  0.0 0.12    0.12    ?   ?   ..reading..
0-0 10320   0/460/460   _   399.05  0   42  0.0 0.08    0.08    98.129.101.123  mysite.com  GET /home.php
0-0 10320   0/473/473   R   364.89  301 48  0.0 0.09    0.09    ?   ?   ..reading..
0-0 10320   0/616/616   R   396.18  26  46  0.0 0.11    0.11    ?   ?   ..reading..
0-0 10320   0/494/494   R   378.84  127 48  0.0 0.10    0.10    ?   ?   ..reading..
0-0 10320   0/231/231   R   311.62  687 43  0.0 0.04    0.04    ?   ?   ..reading..
0-0 10320   0/376/376   R   391.17  76  53  0.0 0.08    0.08    ?   ?   ..reading..
0-0 10320   0/377/377   R   380.72  176 53  0.0 0.07    0.07    ?   ?   ..reading..
0-0 10320   0/589/589   R   355.07  382 47  0.0 0.10    0.10    ?   ?   ..reading..
0-0 10320   0/587/587   R   393.16  56  50  0.0 0.13    0.13    ?   ?   ..reading..
0-0 10320   0/561/561   R   393.89  49  40  0.0 0.10    0.10    ?   ?   ..reading..
0-0 10320   0/616/616   R   389.88  81  41  0.0 0.22    0.22    ?   ?   ..reading..
0-0 10320   0/354/354   R   392.12  59  42  0.0 0.06    0.06    ?   ?   ..reading..
0-0 10320   0/346/346   R   377.51  192 44  0.0 0.06    0.06    ?   ?   ..reading..

Why are there so many in that R state? And does the SS column really mean that they have been reading for 558, 301, etc seconds?

Here's some of my server info and settings:

Server version: Apache/2.2.24 (Unix)
Server built:   Mar 22 2013 10:57:57
Cpanel::Easy::Apache v3.18.9 rev9999
Server's Module Magic Number: 20051115:31 Server loaded:  APR 1.4.6, APR-Util 1.4.1 Compiled using: APR 1.4.6, APR-Util 1.4.1
Architecture:   64-bit
Server MPM:     Worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)  -D APR_USE_SYSVSEM_SERIALIZE  -D APR_USE_PTHREAD_SERIALIZE  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT  -D APR_HAS_OTHER_CHILD  -D AP_HAVE_RELIABLE_PIPED_LOGS  -D DYNAMIC_MODULE_LIMIT=128  -D HTTPD_ROOT="/usr/local/apache"
 -D SUEXEC_BIN="/usr/local/apache/bin/suexec"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

StartServers 5
<IfModule prefork.c>
MinSpareServers 5
MaxSpareServers 10
</IfModule>
ServerLimit 300
MaxClients 300
MaxRequestsPerChild 10000
KeepAlive On
KeepAliveTimeout 1
MaxKeepAliveRequests 100

Here are compiled in modules:
  core.c
  mod_authn_file.c
  mod_authn_dbm.c
  mod_authn_dbd.c
  mod_authn_default.c
  mod_authn_alias.c
  mod_authz_host.c
  mod_authz_groupfile.c
  mod_authz_user.c
  mod_authz_dbm.c
  mod_authz_default.c
  mod_auth_basic.c
  mod_auth_digest.c
  mod_file_cache.c
  mod_cache.c
  mod_disk_cache.c
  mod_include.c
  mod_filter.c
  mod_deflate.c
  mod_log_config.c
  mod_log_forensic.c
  mod_logio.c
  mod_env.c
  mod_mime_magic.c
  mod_expires.c
  mod_headers.c
  mod_unique_id.c
  mod_setenvif.c
  mod_version.c
  mod_proxy.c
  mod_proxy_connect.c
  mod_proxy_ftp.c
  mod_proxy_http.c
  mod_proxy_scgi.c
  mod_proxy_ajp.c
  mod_proxy_balancer.c
  mod_ssl.c
  worker.c
  http_core.c
  mod_mime.c
  mod_dav.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_info.c
  mod_suexec.c
  mod_cgid.c
  mod_dav_fs.c
  mod_dav_lock.c
  mod_negotiation.c
  mod_dir.c
  mod_actions.c
  mod_speling.c
  mod_userdir.c
  mod_alias.c
  mod_rewrite.c
  mod_so.c

When we restart the reading requests slowly start building back up every time. Thanks for your help in advance!

UPDATE: here's netstat -pn:

tcp        0      0 198.57.162.52:80            54.235.31.205:41209         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            166.147.121.170:63246       TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            50.128.171.238:50751        FIN_WAIT2   -                   
tcp        0      0 198.57.162.52:80            89.241.135.84:24825         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            205.142.167.225:33256       TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            76.127.87.244:56493         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            205.153.156.222:21571       TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            108.12.147.14:61135         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            166.217.157.32:1630         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:443           71.22.102.71:4252           TIME_WAIT   -               

Sample lsof output:

httpd      4092   nobody   25u     IPv4          711278095         0t0        TCP 198-57-162-52.unifiedlayer.com:http->4.sub-70-193-66.myvzw.com:12471 (ESTABLISHED)
httpd      4092   nobody   26u     IPv4          711350284         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c75-111-15-253.amrlcmta01.tx.dh.suddenlink.net:51298 (ESTABLISHED)
httpd      4092   nobody   27u     IPv4          711400163         0t0        TCP 198-57-162-52.unifiedlayer.com:http->108-235-145-129.lightspeed.irvnca.sbcglobal.net:56037 (FIN_WAIT2)
httpd      4092   nobody   28u     IPv4          711390092         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-98-239-167-37.hsd1.pa.comcast.net:57988 (ESTABLISHED)
httpd      4092   nobody   29u     IPv4          711206563         0t0        TCP 198-57-162-52.unifiedlayer.com:http->252.sub-70-199-129.myvzw.com:6179 (ESTABLISHED)
httpd      4092   nobody   30u     IPv4          711271036         0t0        TCP 198-57-162-52.unifiedlayer.com:https->5.71.20.78:62114 (ESTABLISHED)
httpd      4092   nobody   31u     IPv4          711385050         0t0        TCP 198-57-162-52.unifiedlayer.com:http->64.128.135.115:17547 (ESTABLISHED)
httpd      4092   nobody   32u     IPv4          711400374         0t0        TCP 198-57-162-52.unifiedlayer.com:https->mbb5336d0.tmodns.net:36277 (ESTABLISHED)
httpd      4092   nobody   33u     IPv4          711378965         0t0        TCP 198-57-162-52.unifiedlayer.com:http->us-chi-exchange.brightstarcorp.com:47707 (ESTABLISHED)
httpd      4092   nobody   34u     IPv4          711394997         0t0        TCP 198-57-162-52.unifiedlayer.com:https->209-152-137-170.dsl.mcmo.net:64370 (ESTABLISHED)
httpd      4092   nobody   35u     IPv4          711391122         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-76-125-136-230.hsd1.oh.comcast.net:57093 (ESTABLISHED)
httpd      4092   nobody   36u     IPv4          711398994         0t0        TCP 198-57-162-52.unifiedlayer.com:http->lookup.eatright.org:60021 (ESTABLISHED)
httpd      4092   nobody   37u     IPv4          711400972         0t0        TCP 198-57-162-52.unifiedlayer.com:http->modemcable126.224-130-66.mc.videotron.ca:55970 (ESTABLISHED)
httpd      4092   nobody   38u     IPv4          711379727         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-75-65-146-83.hsd1.ms.comcast.net:50309 (ESTABLISHED)
httpd      4092   nobody   39u     IPv4          711390427         0t0        TCP 198-57-162-52.unifiedlayer.com:https->66-87-130-177.pools.spcsdns.net:41127 (ESTABLISHED)
httpd      4092   nobody   40u     IPv4          711401179         0t0        TCP 198-57-162-52.unifiedlayer.com:http->8.225.196.132:20608 (ESTABLISHED)
httpd      4092   nobody   41u     IPv4          711142701         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-24-15-33-77.hsd1.il.comcast.net:54631 (ESTABLISHED)

Best Answer

@ShaneN Your server is reaching choke of resources point-of-failure. Maybe you're getting without ephemeral port range, or without system file-descriptors, which limits the number of sockets your server can use to answer requests. A reading state request means that the socket was opened, but the request line never reached the server, it's a known form of DoS called SlowLoris. To mitigate this kind of DoS, I'd first:

Add to /etc/sysctl.conf, the following entries:

net.ipv4.tcp_fin_timeout = 10 which sets the timeout of FIN sockets to 10seconds.

net.ipv4.ip_local_port_range = "15000 61000" which sets the ephemeral port range to about 46000 possible open ports per IP.

net.core.somaxconn = 1024 - default 128 - The net.core.somaxconn value has an important role. It limits the maximum number of requests queued to a listen socket.

net.core.netdev_max_backlog = 2000 - default 1000 - which increases the TCP queue of incoming requests.

After adding those lines to the file, issue sysctl -p to load the kernel parameters at the current runtime and check your rush hour to see if your problem's still there.