Keepalived: Highest weight only scheduling

keepalived

I have a keepalived setup with three servers behind one IP. One is set up as a sorry server and only serves the maintenance pages; the other two are actual app servers. We would like all traffic routed to one app server until it goes down, at which point the other app server takes the traffic until the primary one comes back online.

Leaving out lb_algo causes the following error, and keepalived refuses to start:

Jan 23 17:15:22 fw001 kernel: IPVS: Scheduler module ip_vs_ not found

And the only options for lb_algo are:

rr|wrr|lc|wlc|lblc|sh|dh

All of these load-balance across the active servers in some fashion.

Config example

virtual_server 203.0.113.0 80 {
    delay_loop 60
    lb_algo wrr
    lb_kind NAT
    nat_mask 255.255.255.0
    persistence_timeout 50
    protocol TCP

    sorry_server 10.0.0.3 8080

    real_server 10.0.0.1 8080 {
        weight 100

        HTTP_GET {
            url {
                path /alive
                digest 7a13a825b31584fe9b135ab53974d893
            }
            connect_timeout 30
            nb_get_retry 30
            delay_before_retry 10
        }
    }

    real_server 10.0.0.2 8080 {
        weight 0

        HTTP_GET {
            url {
                path /alive
                digest 7a13a825b31584fe9b135ab53974d893
            }
            connect_timeout 30
            nb_get_retry 30
            delay_before_retry 10
        }
    }
}

Is there any way to do this?

Best Answer

From the LVS mailing list:

None of the current IPVS schedulers know "highest weight" balancing.

With the "weighted" schedulers, you can e.g. give your primary server 
a weight of max. 65535 and your secondary server a weight of 1. This way,
you've "almost" reached the point you're asking for - however, one out
of 64k of incoming connections will go for the "secondary" server even
while the primary server is still up and running.
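
As a rough sketch of the weighting described above, the virtual server from the question could be adjusted as follows. The addresses, ports, and health checks are copied from the question; only the two weight values change. Note that the question's original weight of 0 tells IPVS not to schedule any new connections to that server, which is why the secondary needs a weight of at least 1 here.

```
virtual_server 203.0.113.0 80 {
    delay_loop 60
    lb_algo wrr                  # a weighted scheduler, so the weights below matter
    lb_kind NAT
    nat_mask 255.255.255.0
    persistence_timeout 50
    protocol TCP

    sorry_server 10.0.0.3 8080

    # Primary: the maximum weight mentioned above
    real_server 10.0.0.1 8080 {
        weight 65535

        HTTP_GET {
            url {
                path /alive
                digest 7a13a825b31584fe9b135ab53974d893
            }
            connect_timeout 30
            nb_get_retry 30
            delay_before_retry 10
        }
    }

    # Secondary: minimal non-zero weight, so it still sees
    # roughly one in 64k new connections while the primary is up
    real_server 10.0.0.2 8080 {
        weight 1

        HTTP_GET {
            url {
                path /alive
                digest 7a13a825b31584fe9b135ab53974d893
            }
            connect_timeout 30
            nb_get_retry 30
            delay_before_retry 10
        }
    }
}
```

When the primary fails its health check, it is removed from the pool and the secondary receives all new connections until the primary is added back.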

If your application is balancing-ready, this behaviour may be a good thing.
For example, by automatically using the secondary system for a few live 
requests, you ensure your secondary system is actually working.
By sending some live traffic, you may also "warm up" application-specific
caches, so upon a "real" failover, the application will perform much better
than with empty caches.

If you really don't need (or your applications can't handle) the 
"balancing" part (distribute traffic to different servers at the same time),
you'd probably better run "typical" high availability/failover software 
like Pacemaker or some VRRP daemon.

For example, you might put all three boxes into the same VRRP instance
and assign them different VRRP priorities, and VRRP will sort out which box 
has the "best" priority and is going to be the only live system. This results
in some kind of "cascading" failover.
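
A minimal sketch of such a VRRP instance in keepalived, as it might look on the primary app server. Everything specific here is an assumption for illustration and does not come from the question: the interface name eth0, the virtual_router_id, the priority values, the password, and the floating address 10.0.0.100. The health-check script reuses the /alive path and port 8080 from the question and assumes curl is installed at /usr/bin/curl.

```
# Hypothetical check: the instance enters FAULT state if the local app stops answering
vrrp_script chk_app {
    script "/usr/bin/curl -fs -o /dev/null http://127.0.0.1:8080/alive"
    interval 2                   # run every 2 seconds
    fall 2                       # 2 consecutive failures mark the check as failed
    rise 2                       # 2 consecutive successes mark it as healthy again
}

vrrp_instance VI_APP {
    state BACKUP                 # let the priorities decide who becomes master
    interface eth0               # assumed interface name
    virtual_router_id 51         # assumed ID, must be identical on all three boxes
    priority 150                 # primary: 150, secondary: 100, maintenance box: 50 (assumed values)
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass changeme       # placeholder, must be identical on all three boxes
    }

    virtual_ipaddress {
        10.0.0.100/24            # assumed floating IP on the servers' own subnet
    }

    track_script {
        chk_app
    }
}
```

The other two boxes would carry the same block with a lower priority. Since preemption is enabled by default, the box with the highest priority takes the address back as soon as it is healthy again, which gives the cascading behaviour described above.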


If you need balancing to distribute traffic among different servers,
and you'd still like to have this "cascading" failover, you'll need to run 
at least two balancer (pairs): one for the "primary" server farm, with the 
VIP of the other balancer being set as sorry server. The second balancer 
in turn balances to the "secondary" server farm and also has the maintenance 
server set as a sorry server.
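
The chained setup might be sketched like this, with one fragment per balancer. The second balancer's VIP 10.0.0.10 is a hypothetical address, and each farm is reduced to the single server from the question to keep the example short; a real farm would list several real_server blocks.

```
# --- Balancer A: primary farm ---
virtual_server 203.0.113.0 80 {
    delay_loop 60
    lb_algo wrr
    lb_kind NAT
    protocol TCP

    # Used only when every primary real server is down:
    # hand the traffic to balancer B's VIP (hypothetical address)
    sorry_server 10.0.0.10 80

    real_server 10.0.0.1 8080 {
        weight 100
        HTTP_GET {
            url {
                path /alive
                digest 7a13a825b31584fe9b135ab53974d893
            }
            connect_timeout 30
            nb_get_retry 30
            delay_before_retry 10
        }
    }
}

# --- Balancer B: secondary farm ---
virtual_server 10.0.0.10 80 {
    delay_loop 60
    lb_algo wrr
    lb_kind NAT
    protocol TCP

    # Last resort: the maintenance server from the question
    sorry_server 10.0.0.3 8080

    real_server 10.0.0.2 8080 {
        weight 100
        HTTP_GET {
            url {
                path /alive
                digest 7a13a825b31584fe9b135ab53974d893
            }
            connect_timeout 30
            nb_get_retry 30
            delay_before_retry 10
        }
    }
}
```

Whether the NAT forwarding between the two balancers works as-is depends on the network layout, so treat the lb_kind and addressing here as placeholders rather than a tested design.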

One use case for such scenarios is web farms with slightly different content: 
if the primary farm drops out of service (e.g. due to overload or some
bleeding-edge feature malfunctioning), the secondary farm may serve a less 
feature-rich version of the same service.