HAProxy nbsrv seems to give wrong value

haproxy

I'm using nbsrv in an ACL rule to determine whether a request should be permitted (1).

Example configuration:

defaults 
  mode http 

frontend front_www
  bind 0.0.0.0:80

  acl acl_backend_down nbsrv(back_www) lt 1 

  http-request deny if acl_backend_down 

  default_backend back_www

backend back_www

  server s1 1.2.3.1:80 check 
  server s2 1.2.3.2:80 check 
  server haproxy-dc2 1.3.4.1:80 check backup 

With this configuration, I would expect that when s1 and s2 are down, nbsrv would be 0 and HAProxy would therefore return an HTTP 403.

Unfortunately, it doesn't send a 403 until I also take down the backup server (haproxy-dc2).

If I change the ACL rule to less than two, i.e.:

acl acl_backend_down nbsrv(back_www) lt 2 

Then it operates as I would perhaps expect: when there's only one primary available in the backend, it sends 403.

I thought perhaps there was something odd with the less-than operator, so I changed it to eq 0, but this doesn't work either.

Is there some way to force nbsrv to work like I'm expecting, or some other way of detecting that all the primaries for the backend are down?

(1) This ACL is part of a larger set of rules; I'm simplifying it down to the smallest reproducible example. If you have an alternative solution that would let me detect whether we're using a backup server, please let me know. The tl;dr of the larger picture is to permit two HAProxy instances to fail over to each other while preventing infinite loops (e.g. if dc01's backend is down, the response to dc02's health check of dc01 should be a 403/down).

Edit:

To add some context to my comments on @gf_'s solution below: our production environment is rather complex. There are multiple backups and multiple DCs, and each instance has many backends (multiple applications, plus content switching).

For monitoring, there are internal HAProxy health checks between instances, but also external monitoring via collectd+zabbix, which sends us alerts if all of a backend's servers are unhealthy. Knowing that there are no backups available means we can bump the priority of the alert.

So, this is why I simplified the example down to just showing nbsrv(back_www) lt 1 as the thing I was trying to fix.

At the moment I'm leaning towards going with lt 2 and assuming that if there's only the one box up, we don't want the other LBs to fail over here anyway.

Best Answer

I'm not entirely sure if the following will work for you, and can't test right now, but maybe it's helpful, still:

HAProxy config of dc01:

defaults 
  mode http 

frontend front_www
  bind 0.0.0.0:80

  acl acl_backend_down nbsrv(back_www) lt 1 

  monitor-uri /health
  monitor fail if acl_backend_down

  use_backend dc02 if acl_backend_down
  default_backend back_www

backend back_www
  server s1 1.2.3.1:80 check 
  server s2 1.2.3.2:80 check

backend dc02
  server haproxy-dc2 1.3.4.1:80 check

Use the monitor-uri and monitor fail directives, and check /health from dc02 (and vice versa). Since haproxy-dc2 sits in its own backend here, nbsrv(back_www) only reflects s1 and s2. /health should return HTTP 200 if at least one server in back_www is up and healthy, and HTTP 503 otherwise.
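
For the other direction, here is a rough, untested sketch of what the dc02 side might look like when it health-checks dc01 through that monitor URI. The backend name dc01, the server name haproxy-dc1 and the address 1.3.4.2 are placeholders I made up for illustration, not values from the question:

```
# dc02 side (sketch): probe dc01's monitor URI instead of a plain TCP check
backend dc01
  # send "GET /health" as the health check; a 503 from "monitor fail" marks dc01 down
  option httpchk GET /health
  # 1.3.4.2 is a hypothetical address for the dc01 HAProxy instance
  server haproxy-dc1 1.3.4.2:80 check
```

With this in place, dc02 should stop failing over to dc01 as soon as dc01's own servers are all down, which is what avoids the loop between the two instances.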