I have a simple two-node server cluster, running on localhost:8001 and localhost:8002, load-balanced using NGINX. Below is the http context of my nginx.conf.
http {
    include       mime.types;
    default_type  application/octet-stream;

    upstream backend {
        ip_hash;
        server localhost:8001;
        server localhost:8002;
    }

    log_format upstreamlog 'upstream: $upstream_addr: $request upstream-response-status: $upstream_status';

    server {
        listen       80;
        listen       [::]:80;
        server_name  localhost;

        access_log   logs/access.log upstreamlog;

        location / {
            proxy_pass http://backend/;
        }
    }
}
Initially, all requests to http://localhost/ were forwarded to the upstream server running at port 8001.
Logs:
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
----
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: 200
upstream: [::1]:8001: GET /favicon.ico HTTP/1.1 upstream-response-status: 200
To test the failover of this setup, I then stopped the server running at port 8001. But failover did not happen: all subsequent requests were still forwarded to the server at port 8001.
Logs:
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
----
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
----
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001: GET / HTTP/1.1 upstream-response-status: -
upstream: [::1]:8001, 127.0.0.1:8001, [::1]:8002: GET / HTTP/1.1 upstream-response-status: 504, 504, 200
NGINX took a long time, approximately 3 minutes, to switch over to the other node at port 8002. What am I missing in the configuration? I know that the default max_fails is 1 and fail_timeout is 10 seconds (spelled out explicitly in the sketch after the note below). How can I make NGINX switch over to the other server node with zero downtime?
(NOTE: ip_hash had to be used for session affinity and other purposes.)
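For reference, my understanding is that those defaults are equivalent to writing the parameters out on each server line, something like the following (the values shown are the documented defaults, not tuning I have tried):

    upstream backend {
        ip_hash;
        # documented defaults: after 1 failed attempt, the server
        # is considered unavailable for the next 10 seconds
        server localhost:8001 max_fails=1 fail_timeout=10s;
        server localhost:8002 max_fails=1 fail_timeout=10s;
    }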
Best Answer
I think you need to add the proxy_next_upstream directive to your location block. This directive specifies the cases in which a request should be passed to the next server. Add http_503 to it, because when you stop an instance it can return a 503 Service Unavailable response. If your problem is caused by timeouts, you can also lower proxy_connect_timeout and proxy_read_timeout.
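Example configuration (a minimal sketch, assuming the backend upstream from the question; the specific timeout values are illustrative, not mandated):

    location / {
        proxy_pass http://backend/;
        # retry the next upstream on connection errors, timeouts,
        # and 502/503/504 responses from a dead or stopped node
        proxy_next_upstream error timeout http_502 http_503 http_504;
        # fail fast instead of waiting the 60s connect default
        proxy_connect_timeout 2s;
        proxy_read_timeout 10s;
    }

Note that error and timeout are already part of nginx's default proxy_next_upstream setting, so the http_5xx values are the additions here.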
The documentation for all the proxy directives is at http://nginx.org/en/docs/http/ngx_http_proxy_module.html.