Nginx cache reverse proxy: how to keep app server alive during the 5 second window when cache expires


I have an nginx server sitting in front of Apache running Django.

Most of my site is static content.

My app server can handle the pieces that need to be dynamic just fine (POST, cart, order status, FAQ, etc.).

The vast majority of hits are to static pages: product pages, about pages, and AJAX GET requests.

Nginx handles these like a champ, serving pages straight out of memcached. I actually asked SF for help determining my bottleneck here ("Does this prove a network bandwidth bottleneck?"), and it turns out it's limited by my host's outbound traffic limiting, so it's 100% capable of anything I need. I was all excited because of the figures, but the figures don't mean crap when the app server is being bombarded for 40 pages at once with "Hey, make me a new one for the cache!" "Hey, me too!"

The only problem is that during heavy traffic, the second my cache expires for a particular page, my app server gets hit with thousands of requests, which bogs everything down and can potentially crash the server. What I'm imagining is something more like one of these:

A: The app server pushes content automatically to memcached at its own pace (because it can certainly populate the cache, just not hundreds of pages simultaneously). The front-end server works with what it has and never attempts to proxy GET requests. Hell, it can throw a blank page for all I care; at least the app server will be alive, accepting orders, and able to populate that broken page at some point in the future, instead of a downward death spiral where I can't even pick up my shield (memcached). The problem is that I'd have to build a system that determines every page that's supposed to be cached, from multiple places. Django knows which pages to cache; I suppose that part will be easy. But for nginx -> Django, I wouldn't want it to proxy everything (or else I'd be in the same situation), so I'd have to code more logic in a separate location. Meh.

B: Nginx can limit connections to the app server. But how would it differentiate between requests that are supposed to get queued to the app server and the kind where I only want it to make a single connection? After all, the app decides whether a page should get cached or not. I wouldn't want it to drop connections while it's waiting on my app server for simple dynamic content like pulling order details. Do I build a request/response cycle between the cache and the app which communicates that a page is being built and that subsequent requests should be ignored?


So considering 99% of traffic is met by nginx, and my app server really is only interesting to the X% who convert and make it to a dynamic page, what should I do to prevent my app server from being completely inundated in the few seconds where it responds:

Hey you! Let me return this page for you. Oh, 1000 more people want it? Okay, I'll do that too... if I could.

Real world issue/example: today, we got a huge spike in hits. We typically don't get much traffic, but we released our product today and pre-order customers came in like crazy. The server is handling the traffic @ 20% capacity now, but there were some very very sketchy times where I almost lost it when it tried to generate one puny page that everybody was clicking on when the hourly cache timer expired.

I frantically picked the most important cached pages and tried to keep them in cache. That was /not/ fun! I also manually pulled the HTML from apache running on :8080 and threw them into memcached.

If my Apache process ran out of memory and completely crashed, say, memcached, or it took long enough that the keys expired, there'd be a point of... "difficult return", where so many more people would be bypassing the cache that my server would be even more overloaded than usual, and thus not recoverable without blocking all requests and filling the cache manually.

What's typically done to make this work?

Sorry for the stream of thought kind of post. I haven't slept due to this one..

# grove urls
# ----------
location / {
    # Only GET requests are candidates for the memcached lookup
    set $use_memcached no;
    if ($request_method = GET) {
        set $use_memcached yes;
    }
    if ($host ~ "^cached") {
        set $use_memcached no;
    }
    # memcached keys are limited in length, so skip very long URIs
    if ($request_uri ~ '.{240,}') {
        set $use_memcached no;
    }
    if ($args ~ nginx_bypass_cache=true) {
        set $use_memcached no;
    }
    if ($use_memcached = yes) {
        set $memcached_key "nginx.$request_uri";
        memcached_pass localhost:11211;
    }
    default_type text/html;
    client_max_body_size 50m;
    # A missing key (404) or an unreachable memcached (502) falls through to the app
    error_page 404 502 = @cache_miss;
}

location @cache_miss {
    # Not in the posted config: assumes Apache on local port 8080, as mentioned above
    proxy_pass http://127.0.0.1:8080;
    proxy_redirect off;

    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

    client_max_body_size       50m;
    client_body_buffer_size    128k;

    proxy_connect_timeout      60; # time to connect to upstream server
    proxy_send_timeout         300; # time to wait for upstream to accept data
    proxy_read_timeout         300; # time to wait for upstream to return data

    proxy_buffer_size          4k;
    proxy_buffers              4 32k;
    proxy_busy_buffers_size    64k;
    proxy_temp_file_write_size 64k;
}

Best Answer

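If you can post your nginx config, I may be able to help you better.

Generally, I use Nginx's fastcgi_cache or proxy_cache together with fastcgi_cache_use_stale or proxy_cache_use_stale.

I mention both options because if you can run the backend app through Nginx's fastcgi or another module, it's better to do it that way.

If Apache on 8080 cannot be removed, it's better to use proxy_cache with a "proxy_cache_use_stale updating;" line. With the updating parameter, Nginx keeps serving the expired copy of a page while that page is being refreshed from the backend, instead of passing every request through.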

Please provide your config so we can try to improve it.


Added a sample config based on yours (very raw, will most likely need tweaking):

# IMPORTANT: these directives go outside the server { ... } block (http context)
proxy_cache_path /var/run/nginx-cache levels=1:2 keys_zone=GROVE:500m inactive=60m;
proxy_cache_key "$scheme$request_method$host$request_uri";
# Serve the stale copy while an expired entry is being refreshed
proxy_cache_use_stale updating;

server {

    # other stuff

    set $no_cache 0;

    # POST requests should always go to the backend
    if ($request_method = POST) {
        set $no_cache 1;
    }

    # grove urls
    # ----------
    location / {
        default_type text/html;
        client_max_body_size 50m;

        # Not in the original sample: assumes Apache on local port 8080, as in the question
        proxy_pass http://127.0.0.1:8080;
        proxy_redirect off;

        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

        client_body_buffer_size    128k;

        proxy_connect_timeout      60; # time to connect to upstream server
        proxy_send_timeout         300; # time to wait for upstream to accept data
        proxy_read_timeout         300; # time to wait for upstream to return data

        proxy_buffer_size          4k;
        proxy_buffers              4 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;

        # Skip the cache (both reading and storing) when $no_cache is set
        proxy_cache_bypass $no_cache;
        proxy_no_cache $no_cache;

        proxy_cache GROVE;
        proxy_cache_valid 60m;
    }
}
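
The "thousand requests for one page at the same moment" case from the question can probably be narrowed further with proxy_cache_lock, which lets only one request per cache key through to the backend while an entry is being populated. The following is a minimal sketch, not a tested config: the backend address, the 10s lock timeout, and the X-Cache-Status header name are assumptions chosen for illustration, and GROVE is the cache zone from the sample above.

```nginx
# Sketch only: these directives would sit in the same location / as the sample above.
location / {
    proxy_pass http://127.0.0.1:8080;   # assumed backend address (Apache on :8080)

    proxy_cache       GROVE;
    proxy_cache_valid 60m;

    # While a new cache entry is being populated, let a single request per
    # cache key reach the backend; the others wait for that response.
    proxy_cache_lock         on;
    # Illustrative value: waiting requests are passed to the backend after this long.
    proxy_cache_lock_timeout 10s;

    # Serve the expired copy while a refresh is in flight, and also when the
    # backend errors out or times out.
    proxy_cache_use_stale updating error timeout http_502 http_503;

    # Debugging aid: reports HIT, MISS, EXPIRED, UPDATING, STALE, etc.
    add_header X-Cache-Status $upstream_cache_status;
}
```

With something like this in place, requesting a hot page repeatedly and watching the X-Cache-Status response header should show whether requests are being answered from cache (HIT, UPDATING, STALE) or are still reaching Apache (MISS, EXPIRED). Note that proxy_cache_lock covers entries that are not yet in the cache, while the updating parameter of proxy_cache_use_stale covers entries that have expired, so the two are meant to be used together.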