Apache not responding to all requests


Setup: I have more than 1 million phones hitting my server.

The server looks fine, with plenty of CPU and RAM: the CPU is idle around 90% of the time (1).

The database is not under much load: fewer than 100 requests a second (2).

When I hit the server through an Apache proxy like “Android Lost”, I get a timeout.

When I hit the application server directly on port 8080 I get a reply right away.

What I have done so far is:

  1. Restarted all services: database, Apache, Jetty
  2. Rebooted the server
  3. Tried installing nginx instead of Apache (3)
  4. Tried running Jetty on port 80 and skipping Apache
  5. Tried tweaking the server settings (4)

To me it sounds like a huge number of requests is trying to hit the server, and somewhere in Apache there is a limit that needs to be raised.

So, any hints or suggestions would be greatly appreciated.

Ad. 1:

top - 20:44:33 up 44 min,  2 users,  load average: 2.44, 1.86, 2.80
Tasks: 165 total,   2 running, 163 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.4%sy,  0.0%ni, 90.6%id,  7.5%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  12296928k total, 12154152k used,   142776k free,    83228k buffers
Swap:  6287292k total,        0k used,  6287292k free, 10461776k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                               
  447 root      20   0 7587m 841m  14m S    9  7.0   0:39.81 java                                                                   
 1287 mongodb   20   0  120g 272m 247m S    3  2.3   1:38.12 mongod                                                                 
   10 root      20   0     0    0    0 S    0  0.0   0:07.57 rcu_sched                                                              
  364 root       0 -20     0    0    0 S    0  0.0   0:00.96 kworker/0:1H                                                           
  381 www-data  20   0 1966m 8188 2164 S    0  0.1   0:00.72 apache2                                                                
15562 root      20   0 7706m 105m  11m S    0  0.9   0:13.56 java                                                                   
32636 www-data  20   0 1966m 8012 2236 S    0  0.1   0:00.72 apache2   

Ad. 2:

insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
     3     17      2      0       0       6       0  58.2g   120g   293m     11      1.7          0       0|0     0|0     3k     9k    43   20:49:40 
    11     46      8      0       0      24       0  58.2g   120g   295m      6      5.1          0       0|0     0|0    12k    21k    43   20:49:41 
    12     63     13      0       0      26       0  58.2g   120g   294m      3      1.3          0       0|0     0|0    17k    35k    43   20:49:42 
     5     45      6      0       0      12       0  58.2g   120g   296m      6      0.9          0       0|1     2|1    13k    22k    43   20:49:43 
     5     49      5      0       0      11       0  58.2g   120g   298m      5      0.1          0       0|0     0|0    13k    22k

Ad. 3:

From nginx error log:

2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough

Ad. 4:


Best Answer

This is due to nginx not having enough worker connections. You can see it in the nginx error log:

2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough 
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough

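The maximum number of clients nginx can serve is roughly given by this formula:

max_clients = worker_processes * worker_connections - keepalive connections

Note that worker_connections counts every connection a worker holds, including connections to the proxied backend, so when nginx runs as a reverse proxy each client can occupy two connections.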

In nginx.conf you can set the number of worker_processes and worker_connections. These are usually in the main configuration file, near the top (before the http block):

worker_processes 1;
events {
    worker_connections 128;
}

You will most likely already have these set. I recommend setting worker_processes to the number of CPU cores you have, and increasing worker_connections while monitoring the server's performance until you find the value your server needs and can handle.
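
As a rough sketch of that recommendation, the top of nginx.conf might look something like the following. The numbers are illustrative assumptions, not values taken from the question or tuned for this server, so treat them as a starting point to adjust under load. The worker_rlimit_nofile line is my own addition: the number of simultaneous connections cannot exceed the process's open-file limit, so it is usually raised together with worker_connections.

# One worker per CPU core; "auto" lets nginx detect the core count
# (supported in reasonably recent nginx versions; otherwise write the number).
worker_processes auto;

# Assumed value: raise the per-worker open-file limit above worker_connections,
# leaving headroom for log files and upstream sockets.
worker_rlimit_nofile 8192;

events {
    # Assumed starting point, well above the 768 reported in the error log;
    # increase gradually while watching the error log and server load.
    worker_connections 4096;
}

After changing these values, check the configuration with nginx -t and reload nginx, then watch whether the "worker_connections are not enough" alerts stop appearing in the error log.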