Setup: I have around 1 million phones hitting my server.
The server looks fine – plenty of both CPU and RAM, and the CPU is idle around 90% of the time (1).
The database is not getting much load either – fewer than 100 requests a second (2).
When I hit the server through the Apache proxy, the way the “Android Lost” clients do, I get a timeout.
When I hit the application server directly on port 8080 I get a reply right away.
What I have done so far is:
- Restarted all services: database, Apache, Jetty
- Rebooted the server
- Tried installing nginx instead of Apache (3)
- Tried running Jetty on port 80 and skipping Apache
- Tried tweaking the server settings (4)
To me it sounds like a huge number of requests is hitting the server, and somewhere there is a throttle in Apache that needs to be raised.
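If it is an Apache throttle, it would presumably be the MPM limits; something like this (directive names are stock Apache 2.2 – MaxClients became MaxRequestWorkers in 2.4 – and the values are purely illustrative, not tuned for this load):

<IfModule mpm_worker_module>
    ServerLimit         16
    ThreadsPerChild     64
    MaxClients        1024   # capped at ServerLimit * ThreadsPerChild
</IfModule>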
So, any hints or suggestions would be greatly appreciated.
Ad. 1:
top - 20:44:33 up 44 min, 2 users, load average: 2.44, 1.86, 2.80
Tasks: 165 total, 2 running, 163 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 0.4%sy, 0.0%ni, 90.6%id, 7.5%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 12296928k total, 12154152k used, 142776k free, 83228k buffers
Swap: 6287292k total, 0k used, 6287292k free, 10461776k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
447 root 20 0 7587m 841m 14m S 9 7.0 0:39.81 java
1287 mongodb 20 0 120g 272m 247m S 3 2.3 1:38.12 mongod
10 root 20 0 0 0 0 S 0 0.0 0:07.57 rcu_sched
364 root 0 -20 0 0 0 S 0 0.0 0:00.96 kworker/0:1H
381 www-data 20 0 1966m 8188 2164 S 0 0.1 0:00.72 apache2
15562 root 20 0 7706m 105m 11m S 0 0.9 0:13.56 java
32636 www-data 20 0 1966m 8012 2236 S 0 0.1 0:00.72 apache2
Ad. 2:
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
3 17 2 0 0 6 0 58.2g 120g 293m 11 1.7 0 0|0 0|0 3k 9k 43 20:49:40
11 46 8 0 0 24 0 58.2g 120g 295m 6 5.1 0 0|0 0|0 12k 21k 43 20:49:41
12 63 13 0 0 26 0 58.2g 120g 294m 3 1.3 0 0|0 0|0 17k 35k 43 20:49:42
5 45 6 0 0 12 0 58.2g 120g 296m 6 0.9 0 0|1 2|1 13k 22k 43 20:49:43
5 49 5 0 0 11 0 58.2g 120g 298m 5 0.1 0 0|0 0|0 13k 22k
Ad. 3:
From nginx error log:
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
2014/05/12 19:45:51 [alert] 9800#0: 768 worker_connections are not enough
Ad. 4:
http://www.eclipse.org/jetty/documentation/current/high-load.html#d0e14090
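The tweaks in that guide revolve around, among other things, the size of Jetty's server thread pool. A minimal embedded-Jetty sketch of that part (assuming the Jetty 9 API; the pool sizes are illustrative, not tuned values):

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class TunedServer {
    public static void main(String[] args) throws Exception {
        // QueuedThreadPool(maxThreads, minThreads): the pool that serves
        // requests; if it is too small, clients queue up and time out.
        QueuedThreadPool pool = new QueuedThreadPool(500, 50);
        Server server = new Server(pool);
        // connectors and handlers omitted ...
        server.start();
        server.join();
    }
}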
Best Answer
This is due to nginx not having enough worker connections – you can see it in the nginx error log quoted above (Ad. 3).
The maximum number of clients nginx can serve is calculated with this formula:
max_clients = worker_processes * worker_connections - keepalive connections
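As a rough illustration (the 4 worker processes are an assumption; the 768 is the default seen in the error log above):

max_clients = 4 * 768 = 3072 concurrent connections at most

which is nowhere near a million phones checking in.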
In nginx.conf you can set both worker_processes and worker_connections. They usually sit near the top of the main configuration file, before the http directive, and you will most likely have them set already. I recommend setting worker_processes to the number of CPU cores you have, and increasing worker_connections while checking the server's performance until you find the number your server can/needs to handle.
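For what it's worth, a minimal sketch of the relevant part of nginx.conf (values are illustrative for a 4-core box, not a tested recommendation for this load):

# top of /etc/nginx/nginx.conf -- illustrative values, tune under real load
worker_processes  4;               # one per CPU core

events {
    worker_connections  8192;      # raised from the 768 default in the log
}

http {
    # ... existing proxy configuration for the Jetty backend ...
}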