Nginx: special behaviour for upstream “Host not found” errors

gateway kubernetes nginx

When nginx's proxy_pass returns a 502, there can be many different causes. I want to detect when the 502 was returned because the upstream host was not found (that is, its name failed to resolve).

I know of proxy_intercept_errors, but it doesn't seem to be helpful in my case.

What I have

I have an nginx gateway server running in a Kubernetes pod. It routes each request to the appropriate Kubernetes service according to the first part of the hostname (the label before the first dot; e.g. service-name.example.com should route to a service called service-name).

Here is a simplified config section responsible for this logic:

server {
  listen 80;
  resolver 172.16.2.3; # Pod IP address
  server_name "~^(?<svc>[\w-]+)\.";

  location / {
    # Each Kubernetes service has an internal domain name matching the following pattern
    proxy_pass "http://$svc.default.svc.cluster.local";
    proxy_set_header Host $host;
    # Proxy `X-Forwarded` headers sent by ELB: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/x-forwarded-headers.html
    proxy_set_header X-Forwarded-For $http_x_forwarded_for;
    proxy_set_header X-Forwarded-Port $http_x_forwarded_port;
    proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
  }
}

Problem

No matter why the upstream is not accessible (it refuses connections, fails internally, or simply does not exist), nginx returns 502. The actual cause is visible only in the nginx error log.

Since the gateway is publicly available through AWS ELB, it is often accessed by IP address or by random hostnames, which creates noise in monitors set up to react to 5XX error spikes.

What I want to do

Set up nginx to return a less alarming error (say, 404) if the service's hostname can't be resolved by the Kubernetes resolver.

For example, I send the following request:

curl -H "Host: non-existent-service.example.com" http://gateway.example.com

I want nginx to be able to detect the fact that the hostname corresponding to the service could not be internally resolved, and then return a 404 instead of 502.
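One possible approach, offered as an untested sketch rather than a confirmed solution: hand the 502s that nginx itself generates to a named location and inspect `$upstream_addr` there. The sketch assumes that `$upstream_addr` stays empty when name resolution fails (no upstream address was ever selected) and is set once a connection has at least been attempted. The location name `@bad_gateway` is made up for illustration. Both locations belong inside the `server` block shown earlier, and the behaviour is worth verifying against the nginx version in use.

```nginx
location / {
    proxy_pass "http://$svc.default.svc.cluster.local";
    proxy_set_header Host $host;

    # Hand nginx-generated 502s to the named location below; the bare "="
    # lets that location decide the final status code.
    error_page 502 = @bad_gateway;
}

location @bad_gateway {
    # Assumption: an empty $upstream_addr means no upstream was contacted,
    # i.e. the service name could not be resolved.
    if ($upstream_addr = "") {
        return 404;
    }

    # An upstream address was tried but the request failed: keep the 502.
    return 502;
}
```

Because `proxy_intercept_errors` is left off, a 502 produced by the upstream application itself would still be passed through unchanged.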

Currently the logs look as follows:

  • error log:

    2017/11/10 16:03:58 [error] 22#22: *482894 non-existent-service.default.svc.cluster.local could not be resolved (3: Host not found), client: 172.16.1.2, server: ~^(?<svc>[\w-]+)\., request: "GET / HTTP/1.1", host: "non-existent-service.example.com"
    
  • access log:

    172.16.1.2 - - [10/Nov/2017:16:03:58 +0000] "non-existent-service.example.com" "GET / HTTP/1.1" 502 173 "-" "curl/7.43.0" "194.126.122.250" "EE"
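To make the cause visible in the access log as well, the log format could include the upstream variables. The following is a minimal sketch: the format name `upstream_debug` and the log path are assumptions, not part of the original setup. I would expect `$upstream_addr` and `$upstream_status` to be logged as `-` when the name could not be resolved, since no upstream was contacted, but that should be checked on a real request.

```nginx
http {
    # Hypothetical format name; appends upstream details to each entry.
    log_format upstream_debug '$remote_addr [$time_local] "$host" "$request" '
                              '$status upstream_addr=$upstream_addr '
                              'upstream_status=$upstream_status';

    server {
        listen 80;
        # Assumed log path; adjust to the pod's actual layout.
        access_log /var/log/nginx/access.log upstream_debug;
    }
}
```

A monitor could then filter 502 entries on the `upstream_addr` field instead of treating every 5XX alike.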
    

UPDATE

I should have mentioned this in the first place: a "catch-all" default server block was the first thing I tried. It turned out that this block is never reached, because virtually any hostname (including a bare IP address) matches the regexp.

Best Answer

Just re-enable the default virtual host and ignore anything that hits it (as such requests are querying the IP directly, or are malicious).

For example, as seen in the nginx 1.12.x nginx.conf:

    server {
        listen       80 default_server;
        listen       [::]:80 default_server;
        server_name  _;
        root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
        }

        error_page 404 /404.html;
        location = /40x.html {
        }

        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
        }
    }
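For the default server to be reached at all, the regexp in the proxying server has to stop matching arbitrary names. A minimal sketch, assuming all legitimate services live directly under `example.com` (the domain used in the question's examples): anchor the pattern to that suffix and let everything else fall through to a default server that answers 404.

```nginx
# Only hostnames of the form <service>.example.com are proxied.
server {
    listen 80;
    resolver 172.16.2.3;
    server_name "~^(?<svc>[\w-]+)\.example\.com$";

    location / {
        proxy_pass "http://$svc.default.svc.cluster.local";
        proxy_set_header Host $host;
    }
}

# Bare IPs and unrelated hostnames no longer match the regexp and land here.
server {
    listen 80 default_server;
    server_name _;
    return 404;
}
```

This only filters out requests by IP or by unrelated names. A request for non-existent-service.example.com still matches the first server and would still produce a 502 when the name fails to resolve.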