I was actually unable to reproduce this on:
2011/08/20 20:08:43 [notice] 8925#0: nginx/0.8.53
2011/08/20 20:08:43 [notice] 8925#0: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-48)
2011/08/20 20:08:43 [notice] 8925#0: OS: Linux 2.6.39.1-x86_64-linode19
I set this up in my nginx.conf:
proxy_connect_timeout 10;
proxy_send_timeout 15;
proxy_read_timeout 20;
I then set up two test servers: one that would just time out on the SYN, and one that would accept connections but never respond:
upstream dev_edge {
    server 127.0.0.1:2280 max_fails=0 fail_timeout=0s; # accepts but never responds
    server 10.4.1.1:22 max_fails=0 fail_timeout=0s; # SYN timeout
}
Then I sent in one test connection:
[m4@ben conf]$ telnet localhost 2480
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost
HTTP/1.1 504 Gateway Time-out
Server: nginx
Date: Sun, 21 Aug 2011 03:12:03 GMT
Content-Type: text/html
Content-Length: 176
Connection: keep-alive
Then I watched the error_log, which showed this:
2011/08/20 20:11:43 [error] 8927#0: *1 upstream timed out (110: Connection timed out) while connecting to upstream, client: 127.0.0.1, server: ben.dev.b0.lt, request: "GET / HTTP/1.1", upstream: "http://10.4.1.1:22/", host: "localhost"
then:
2011/08/20 20:12:03 [error] 8927#0: *1 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: ben.dev.b0.lt, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:2280/", host: "localhost"
And then the access.log, which has the expected 30s of upstream timeouts (10+20):
504:32.931:10.003, 20.008:.:176 1 127.0.0.1 localhost - [20/Aug/2011:20:12:03 -0700] "GET / HTTP/1.1" "-" "-" "-" dev_edge 10.4.1.1:22, 127.0.0.1:2280 -
Here is the log format I'm using, which includes the individual upstream timeouts:
log_format edge '$status:$request_time:$upstream_response_time:$pipe:$body_bytes_sent $connection $remote_addr $host $remote_user [$time_local] "$request" "$http_referer" "$http_user_agent" "$http_x_forwarded_for" $edge $upstream_addr $upstream_cache_status';
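When nginx tries more than one upstream for a request, it writes one time per attempt into $upstream_response_time, comma-separated (colon-separated for internal redirects), with "-" where no time was recorded. A rough parser for that field (my own helper, not part of nginx):

```python
def parse_upstream_times(field):
    """Split an nginx $upstream_response_time value into floats.
    Attempts on different upstreams are comma-separated, internal
    redirects colon-separated, and '-' means no time was recorded."""
    times = []
    for part in field.replace(":", ",").split(","):
        part = part.strip()
        if part and part != "-":
            times.append(float(part))
    return times

# The 504 above recorded "10.003, 20.008": a 10s connect timeout on one
# upstream plus a 20s read timeout on the other.
print(round(sum(parse_upstream_times("10.003, 20.008")), 3))  # → 30.011
```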
Key points:
- Don't bother with upstream blocks for failover, if pinging one server will bring another one up - there's no way to tell nginx (at least, not the FOSS version) that the first server is up again. nginx will try the servers in order on the first request, but not on follow-up requests, despite any backup, weight, or fail_timeout settings.
- You must enable recursive_error_pages when implementing failover using error_page and named locations.
- Enable proxy_intercept_errors to handle error codes sent from the upstream server.
- The = syntax (e.g. error_page 502 = @handle_502;) is required to correctly handle error codes in the named location. If = is not used, nginx will use the error code from the previous block.
Here is a summary:
server {
    listen ...;
    server_name $DOMAINS;
    recursive_error_pages on;

    # First, try "Upstream A"
    location / {
        error_page 418 = @backend;
        return 418;
    }

    # Define "Upstream A"
    location @backend {
        proxy_pass http://$IP:81;
        proxy_set_header X-Real-IP $remote_addr;
        # Add your proxy_* options here
    }

    # On error, go to "Upstream B"
    error_page 502 = @handle_502;

    # Fallback static error page, in case "Upstream B" fails
    root /home/nginx/www;
    location = /_static_error.html {
        internal;
    }

    # Define "Upstream B"
    location @handle_502 { # What to do when the backend server is not up
        proxy_pass ...;
        # Add your proxy_* options here
        proxy_intercept_errors on; # Look at the error codes returned from "Upstream B"
        error_page 502 /_static_error.html; # Fall back to the error page if "Upstream B" is down
        error_page 451 = @backend; # Try "Upstream A" again
    }
}
Original answer / research log follows:
Here's a better workaround I found, which is an improvement since it doesn't require a client redirect:
upstream aba {
    server $BACKEND-IP;
    server 127.0.0.1:82 backup;
    server $BACKEND-IP backup;
}
...
location / {
    proxy_pass http://aba;
    proxy_next_upstream error http_502;
}
Then, just get the control server to return 502 on "success" and hope that code is never returned by backends.
Update: nginx keeps marking the first entry in the upstream block as down, so it does not try the servers in order on successive requests. I've tried adding weight=1000000000 fail_timeout=1 to the first entry with no effect. So far I have not found any solution which does not involve a client redirect.
Edit: One more thing I wish I knew - to get the error status from the error_page handler, use this syntax: error_page 502 = @handle_502; - that equals sign will cause nginx to get the error status from the handler.
Edit: And I got it working! In addition to the error_page fix above, all that was needed was enabling recursive_error_pages!
Best Answer
There's no option for proxy_next_upstream to implement the behavior you describe. Your application should not return an HTTP 200 if it couldn't actually process the request. Have the application return a more appropriate error, such as 500 or 503.