This can actually be caused by a bug. I know because I've had to fix it myself.
According to the RFC, when priorities are equal on both nodes:
If the Priority in the ADVERTISEMENT is equal to the local
Priority and the primary IP Address of the sender is greater
than the local primary IP Address, then:
o Cancel Adver_Timer
o Set Master_Down_Timer to Master_Down_Interval
o Transition to the {Backup} state
So, he who has the biggest IP address will win.
In keepalived, the way this is done is basically wrong. Endianness is not considered properly when doing this comparison.
Let's imagine we have two routers, (A) 10.1.1.200 and (B) 10.1.1.201.
The code should perform the following comparison.
On A:
if (10.1.1.201 > 10.1.1.200) // True
be_backup();
On B:
if (10.1.1.200 > 10.1.1.201) // False
be_master();
However, because the endianness is not correctly handled, the following comparison is made instead.
On A:
if (10.1.1.201 > 200.1.1.10) // False
be_master();
On B:
if (10.1.1.200 > 201.1.1.10) // False
be_master();
This patch should work, but I've remade it from my original patch and have not tested it. Not even tested that it compiles! So no refunds!
--- vrrp/vrrp.c.old 2013-10-13 17:39:29.421000176 +0100
+++ vrrp/vrrp.c 2013-10-13 18:07:57.360000966 +0100
@@ -923,7 +923,7 @@
} else if (vrrp->family == AF_INET) {
if (hd->priority > vrrp->effective_priority ||
(hd->priority == vrrp->effective_priority &&
- ntohl(saddr) > ntohl(VRRP_PKT_SADDR(vrrp)))) {
+ ntohl(saddr) > VRRP_PKT_SADDR(vrrp))) {
log_message(LOG_INFO, "VRRP_Instance(%s) Received higher prio advert"
, vrrp->iname);
if (proto == IPPROTO_IPSEC_AH) {
From the LVS mailing list
None of the current IPVS schedulers implement "highest weight" balancing.
With the "weighted" schedulers, you can e.g. give your primary server
a weight of max. 65535 and your secondary server a weight of 1. This way,
you've "almost" reached the point you're asking for - however, one out
of 64k incoming connections will go to the "secondary" server even
while the primary server is still up and running.
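The heavy/light weighting described above might look like this with ipvsadm; the VIP and real-server addresses are placeholders, and wrr (weighted round-robin) stands in for whichever weighted scheduler you prefer:

```shell
# Hypothetical addresses; -m = NAT forwarding, -w = weight.
ipvsadm -A -t 192.0.2.10:80 -s wrr
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.1:80 -m -w 65535   # primary: max weight
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.2:80 -m -w 1       # secondary: ~1 in 64k connections
```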
If your application is balancing-ready, this behaviour may be a good thing.
For example, by automatically using the secondary system for a few live
requests, you ensure your secondary system is actually working.
By sending some live traffic, you may also "warm up" application-specific
caches, so upon a "real" failover, the application will perform much better
than with empty caches.
If you really don't need (or your applications can't handle) the
"balancing" part (distribute traffic to different servers at the same time),
you'd probably better run "typical" high availability/failover software
like Pacemaker or some VRRP daemon.
For example, you might put all three boxes into the same VRRP instance
and assign them different VRRP priorities, and VRRP will sort out which box
has the "best" priority and is going to be the only live system. This results
in some kind of "cascading" failover.
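In keepalived terms, the priority ladder could be sketched like this (interface, router ID, and addresses are made-up examples; each box gets its own copy with a different priority value):

```
vrrp_instance VI_CASCADE {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 150        # box A: highest priority, normally MASTER
                        # box B would use e.g. 100, box C 50
    virtual_ipaddress {
        192.0.2.10/24
    }
}
```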
If you need balancing to distribute traffic among different servers,
and you'd still like to have this "cascading" failover, you'll need to run
at least two balancers (or balancer pairs): one for the "primary" server farm, with the
VIP of the other balancer being set as sorry server. The second balancer
in turn balances to the "secondary" server farm and also has the maintenance
server set as a sorry server.
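With keepalived as the director, the "VIP of the other balancer as sorry server" idea can be expressed with the sorry_server directive; all addresses below are placeholders for illustration:

```
virtual_server 192.0.2.10 80 {          # VIP of the first balancer
    lb_algo wrr
    lb_kind NAT
    protocol TCP
    sorry_server 192.0.2.20 80          # VIP of the second balancer (secondary farm)
    real_server 10.0.0.1 80 {           # primary farm member
        weight 1
        TCP_CHECK { connect_timeout 3 }
    }
}
```

The second balancer would mirror this, pointing its sorry_server at the maintenance box.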
One usecase for such scenarios are web farms with slightly different content:
if the primary farm drops out of service (e.g. due to overload or some
bleeding-edge feature malfunctioning), the secondary farm may serve a less
feature-rich version of the same service.
Best Answer
We have a similar setup, but using kamailio instead of haproxy. Anyway, we were seeing messages like that, so we changed the way we were performing the checks (our checks have nothing to do with yours; we were checking that kamailio responds to OPTIONS requests).
You can try to add "fall 3", which means the check script must fail 3 times before changing state. Also, "weight" is useless in the "vrrp_script" section. Good luck!
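For reference, a vrrp_script block using fall might look like the sketch below; the script command and numbers are examples, not taken from the original poster's setup:

```
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # example check: succeed if the process exists
    interval 2                    # run every 2 seconds
    fall 3                        # require 3 consecutive failures before FAULT
    rise 2                        # and 2 consecutive successes to recover
}
```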