I have a Node.js app running under forever behind nginx. When I deploy new code I just run forever restart,
but I can't afford to serve 502s even during that short window.
How can I configure nginx to keep retrying this one upstream server when it returns 502? I tried setting proxy_connect_timeout, proxy_read_timeout and proxy_send_timeout to e.g. 30s, but I immediately get a 502 no matter what 🙁
My site conf is:
upstream my_server {
    server 127.0.0.1:3000 fail_timeout=0;
    keepalive 1024;
}

server {
    listen 3333;
    server_name myservername.com;

    access_log /var/log/nginx/my_server.log;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        # retry upstream on 502 error
        proxy_connect_timeout 30s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;

        proxy_pass http://my_server;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_redirect off;
    }
}
Is it possible to buffer requests for this short time when upstream is not available?
Best Answer
This ultimately sounds like a problem with your backend: nginx makes a request to your backend, the connection is refused right away, so nginx has no option other than to pass an error downstream to the user, since no other upstream is specified. The timeout values you set have no effect here, because nginx doesn't have to wait for anything at all.

I don't know what forever is or how it works, but a couple of possible solutions come to mind.

There are two possibilities for what is happening on the upstream side:
1. Forever might be accepting the connection and returning an error immediately. In that case, the real question is how to make it not mishandle the connection, but instead wait until your app has finished deploying and then process the request. (The opengrok app on the tomcat server has this issue.)

2. No one is listening on the port where your app is supposed to run, so the kernel immediately drops the packet and returns a TCP RST right away.
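The second case is easy to observe from the command line. Here is a minimal Python sketch (the port is picked on the fly, not your actual app's port) showing that a connection attempt to a closed port fails in milliseconds with "connection refused", rather than waiting out any client-side timeout, which is exactly why your 30s nginx timeouts never get a chance to run:

```python
import socket
import time

# Find a port that nothing is listening on: bind to an ephemeral port,
# note its number, and close it again.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
closed_port = probe.getsockname()[1]
probe.close()

start = time.monotonic()
try:
    # A generous 30 s timeout, mirroring the nginx proxy_* settings.
    socket.create_connection(("127.0.0.1", closed_port), timeout=30)
except ConnectionRefusedError:
    # The kernel answered with TCP RST immediately; the timeout was irrelevant.
    elapsed = time.monotonic() - start
    print(f"refused after {elapsed:.3f}s")
```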
If a TCP RST is the cause, you can solve it either by having forever keep the listening socket open across restarts, or by configuring the kernel to queue incoming packets for a certain time, in anticipation of someone picking them up later, so that when forever does start back up it has a whole queue ready for servicing. Alternatively, configure the kernel not to issue a TCP RST when no one is listening; then your timeouts in nginx will take effect. Subsequently, configure nginx to make a second request to another upstream.

If you address either one of the above cases, you're done.
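For the "second request" part, one sketch (assuming nginx accepts the duplicate server entry, which current versions do) is to list the same backend twice and let proxy_next_upstream retry the other peer:

```nginx
upstream my_server {
    # The same backend twice gives nginx a "second" peer to retry,
    # instead of giving up and returning 502 after the first failure.
    server 127.0.0.1:3000 fail_timeout=0;
    server 127.0.0.1:3000 fail_timeout=0;
    keepalive 1024;
}

server {
    location / {
        proxy_pass http://my_server;
        # Retry the next peer on connection errors, timeouts, and 502s.
        proxy_next_upstream error timeout http_502;
        proxy_connect_timeout 30s;
    }
}
```

Note this only helps once the kernel-level RST problem is fixed; with an instant "connection refused", both attempts fail in milliseconds.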
Else, you have to try to configure nginx to fix the issue:
You could try proxy_cache with proxy_cache_use_stale.
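A minimal sketch of that idea (paths and zone name are placeholders; it only helps for responses nginx is allowed to cache, so typically GETs):

```nginx
# Define a cache zone; /var/cache/nginx must exist and be writable by nginx.
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    location / {
        proxy_cache app_cache;
        proxy_cache_valid 200 1m;
        # While the backend is erroring out or timing out, serve whatever
        # (stale) copy is already in the cache instead of a 502.
        proxy_cache_use_stale error timeout http_502;
        proxy_pass http://my_server;
    }
}
```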
You could also employ an error handler: see proxy_intercept_errors (which probably applies only if the 502 you're getting is passed from your backend) and error_page. You would want the error handler to waste time until your app comes back up, then make a request to your app: sleep() for however long the redeploy takes, then either issue an HTTP redirect or quit without a response. Heck, you could even implement the stall by proxying to a TCP port which you block drop in the firewall, which would activate your timeouts in nginx. Subsequently, configure nginx to make a second request.

If you implement one of the approaches that relies on timeout activation, an extra request to the backend will then have to be made. You could use the upstream directive for that, either specifying the same server multiple times or, if that's not accepted, mirroring the port through your firewall; or, better yet, you could run multiple independent app servers in the first place.

Which brings us back to your app server: if it cannot handle clean redeployment, maybe you should run two such app servers and use nginx to load-balance between them. Or deploy a fresh copy, and then switch nginx over to it once it's actually ready. Otherwise, how could you be sure your clients would be willing to wait even 30 s for your API to respond?
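The two-app-server idea could look something like this (the second port, 3001, is an assumption; any free port your app can bind works):

```nginx
upstream my_server {
    # Two independent copies of the app; deploy them one at a time so
    # the other keeps serving traffic during the restart.
    server 127.0.0.1:3000 fail_timeout=0;
    server 127.0.0.1:3001 fail_timeout=0;
}
```

With this, a normal rolling deploy (restart 3000, wait until it's up, restart 3001) never leaves nginx without a live backend.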