Both servers running keepalived become master


After a network failure, both servers running keepalived become MASTER.

When the network is re-established, both remain in the MASTER state.

What could be causing it?

Edit: another piece of information that might be relevant is that each server has two NICs.

Here is the virtual instance configuration:

vrrp_instance VGAPP {
    interface eth0
    virtual_router_id 61
    state BACKUP
    nopreempt
    priority 50
    advert_int 3
    virtual_ipaddress {
        10.26.57.61/24
    }
    track_interface {
        eth0
    }
    track_script {
        jboss_check
        #tomcat_check
        #interface_check
        #interface_check02
    }
    notify_master "/opt/keepalived/scripts/set_state.sh MASTER"
    notify_backup "/opt/keepalived/scripts/set_state.sh BACKUP"
    notify_fault  "/opt/keepalived/scripts/set_state.sh FAULT"
    notify_stop   "/opt/keepalived/scripts/set_state.sh STOPPED"
}

Best Answer

This can actually be caused by a bug. I know because I've had to fix it myself.

According to the VRRP RFC, when the priorities on both nodes are equal:

        If the Priority in the ADVERTISEMENT is equal to the local
        Priority and the primary IP Address of the sender is greater
        than the local primary IP Address, then:

         o Cancel Adver_Timer
         o Set Master_Down_Timer to Master_Down_Interval
         o Transition to the {Backup} state

In other words, the node with the higher primary IP address wins.

keepalived implements this incorrectly: it does not take byte order (endianness) into account when making this comparison.

Let's imagine we have two routers, (A) 10.1.1.200 and (B) 10.1.1.201.

The code should perform the following comparisons:

On A:

if (10.1.1.201 > 10.1.1.200) // True
  be_backup();

On B:

if (10.1.1.200 > 10.1.1.201) // False
  be_master();

However, because endianness is handled incorrectly, the following comparisons are made instead:

On A:

if (10.1.1.201 > 200.1.1.10) // False
  be_master();

On B:

if (10.1.1.200 > 201.1.1.10) // False
  be_master();

This patch should work, but I've recreated it from my original patch and have not tested it. I haven't even checked that it compiles! So, no refunds!

--- vrrp/vrrp.c.old 2013-10-13 17:39:29.421000176 +0100
+++ vrrp/vrrp.c 2013-10-13 18:07:57.360000966 +0100
@@ -923,7 +923,7 @@
    } else if (vrrp->family == AF_INET) {
        if (hd->priority > vrrp->effective_priority ||
            (hd->priority == vrrp->effective_priority &&
-            ntohl(saddr) > ntohl(VRRP_PKT_SADDR(vrrp)))) {
+            ntohl(saddr) > VRRP_PKT_SADDR(vrrp))) {
            log_message(LOG_INFO, "VRRP_Instance(%s) Received higher prio advert"
                        , vrrp->iname);
            if (proto == IPPROTO_IPSEC_AH) {
