Nginx – Reducing gunicorn CPU usage on tiny requests

Tags: gunicorn, http, nginx, python

I'm writing an event aggregation server in Python, using Nginx + Gunicorn. The system scales to about 300 requests per second before the CPU maxes out on an AWS c4.large (2 vCPUs, i.e. one physical core with two hyperthreads). Adding another Gunicorn worker, or switching to eventlet workers, helps only marginally (about 10%). Responses take 1-2 ms (each event is written to disk).
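For context, the per-request work described above looks roughly like the following minimal WSGI sketch. It assumes each event arrives as a small POST body and is appended to a local file; `make_app`, the `EVENT_LOG` variable, and the log path are hypothetical names for illustration, not the asker's actual code.

```python
import os


def make_app(log_path):
    """Return a hypothetical WSGI app that appends each POSTed event to log_path."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"POST only\n"]

        # Read exactly Content-Length bytes; events are assumed to be ~200 bytes.
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        event = environ["wsgi.input"].read(length).strip()

        # One small disk write per request: the "written to disk" step.
        if event:
            with open(log_path, "ab") as f:
                f.write(event + b"\n")

        start_response("204 No Content", [])
        return []

    return app


# Hypothetical module-level entry point for Gunicorn.
application = make_app(os.environ.get("EVENT_LOG", "/tmp/events.log"))
```

If this were saved as a hypothetical `events.py`, Gunicorn would serve it with `gunicorn events:application`. Every event pays the full cost of one HTTP request plus one small write, which is why per-request overhead dominates at this payload size.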

From my profiling, Gunicorn appears to spend nearly all of its CPU time reading from the socket (inside the select() call in sync.py). Nginx, meanwhile, uses only about 2-3% of the CPU. Switching from TCP to UNIX domain sockets did not change the performance profile.
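The variations tried above (worker count, worker class, UNIX vs. TCP socket) correspond to a few Gunicorn settings. A hedged sketch of a `gunicorn.conf.py` expressing them; the socket path and the chosen values are assumptions, not the asker's actual configuration.

```python
# gunicorn.conf.py: a hypothetical config mirroring the variations tried above.
import multiprocessing

# UNIX domain socket; swapping to TCP ("127.0.0.1:8000") did not change
# the CPU profile in the measurements described above.
bind = "unix:/run/gunicorn.sock"

# Gunicorn's docs suggest (2 x cores) + 1 workers as a starting point.
# cpu_count() reports logical CPUs, i.e. 2 on a c4.large, giving 5.
workers = multiprocessing.cpu_count() * 2 + 1

# The default synchronous worker; "eventlet" (which needs the eventlet
# package installed) was the alternative tried, reportedly gaining ~10%.
worker_class = "sync"
```

It would be loaded with `gunicorn -c gunicorn.conf.py events:application`, where `events:application` is the hypothetical app module from the earlier sketch.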

Since these events are so small (~200 bytes), Gunicorn seems to spend an inordinate amount of effort just getting each request off the socket. I would love to batch these request payloads somehow before they reach Gunicorn, but I don't know how. Is there any way to reduce the CPU that Gunicorn consumes and increase throughput per box?
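The batching wished for here would happen upstream of Gunicorn. A related technique, sketched below, assumes clients can be changed to send several newline-delimited events per POST, so each socket read carries many events, and buffers them inside the worker so the disk sees one write per batch. `EventBatcher`, `add_body`, and `max_events` are hypothetical names chosen for illustration.

```python
import threading


class EventBatcher:
    """Hypothetical buffer that collects small events and writes them in bulk.

    add_body() is cheap; a disk write happens only once max_events have
    accumulated, or when flush() is called explicitly.
    """

    def __init__(self, path, max_events=100):
        self.path = path
        self.max_events = max_events
        self._buffer = []
        self._lock = threading.Lock()  # guards the buffer under threaded workers

    def add_body(self, body):
        """Accept one request body holding one or more newline-delimited events."""
        events = [line for line in body.split(b"\n") if line.strip()]
        with self._lock:
            self._buffer.extend(events)
            if len(self._buffer) >= self.max_events:
                self._flush_locked()
        return len(events)

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if not self._buffer:
            return
        # One write call for the whole batch instead of one per event.
        with open(self.path, "ab") as f:
            f.write(b"\n".join(self._buffer) + b"\n")
        self._buffer.clear()
```

The trade-off is durability: buffered events are lost if a worker dies before flushing, and with several worker processes each one holds its own buffer.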

Best Answer