This can actually be caused by a bug. I know because I've had to fix it myself.
According to the RFC, when priorities are equal on both nodes:
If the Priority in the ADVERTISEMENT is equal to the local
Priority and the primary IP Address of the sender is greater
than the local primary IP Address, then:
o Cancel Adver_Timer
o Set Master_Down_Timer to Master_Down_Interval
o Transition to the {Backup} state
So, he who has the biggest IP address will win.
In keepalived, this comparison is implemented incorrectly: byte order (endianness) is not handled the same way on both sides of the comparison.
Let's imagine we have two routers, (A) 10.1.1.200 and (B) 10.1.1.201.
The code should perform the following comparison.
On A:
if (10.1.1.201 > 10.1.1.200) // True
be_backup();
On B:
if (10.1.1.200 > 10.1.1.201) // False
be_master();
However, because the endianness is handled incorrectly, the following comparison is made instead.
On A:
if (10.1.1.201 > 200.1.1.10) // False
be_master();
On B:
if (10.1.1.200 > 201.1.1.10) // False
be_master();
This patch should work, but I've remade it from my original patch and have not tested it. I haven't even checked that it compiles! So no refunds!
--- vrrp/vrrp.c.old 2013-10-13 17:39:29.421000176 +0100
+++ vrrp/vrrp.c 2013-10-13 18:07:57.360000966 +0100
@@ -923,7 +923,7 @@
} else if (vrrp->family == AF_INET) {
if (hd->priority > vrrp->effective_priority ||
(hd->priority == vrrp->effective_priority &&
- ntohl(saddr) > VRRP_PKT_SADDR(vrrp))) {
+ ntohl(saddr) > ntohl(VRRP_PKT_SADDR(vrrp)))) {
log_message(LOG_INFO, "VRRP_Instance(%s) Received higher prio advert"
, vrrp->iname);
if (proto == IPPROTO_IPSEC_AH) {
On host 1:
vrrp_instance VI_1 {
state MASTER
interface eth0
dont_track_primary
virtual_router_id 1
priority 150
advert_int 5
mcast_src_ip 172.16.40.1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.0.0.1/24 dev eth0
}
}
vrrp_instance VI_2 {
state BACKUP
interface eth0
dont_track_primary
virtual_router_id 5
priority 100
advert_int 5
mcast_src_ip 172.16.40.1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.0.0.2/24 dev eth0
}
}
On host 2:
vrrp_instance VI_1 {
state BACKUP
interface eth0
dont_track_primary
virtual_router_id 1
priority 100
advert_int 5
mcast_src_ip 172.16.40.2
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.0.0.1/24 dev eth0
}
}
vrrp_instance VI_2 {
state MASTER
interface eth0
dont_track_primary
virtual_router_id 5
priority 150
advert_int 5
mcast_src_ip 172.16.40.2
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.0.0.2/24 dev eth0
}
}
For the no-auto-failback, look at nopreempt in the keepalived.conf man page but also see:
http://article.gmane.org/gmane.linux.keepalived.devel/1537
Best Answer
In what kind of environment are you running these keepalived instances? I've seen similar issues in environments that do not support multicast. Keepalived uses multicast for VRRP advertisements by default, so try using unicast instead. This is an example for the MASTER instance; for the BACKUP instance, just swap the unicast_src_ip and unicast_peer addresses.
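A minimal sketch of what such a MASTER instance could look like follows. Everything except the `unicast_src_ip` and `unicast_peer` keywords is an assumption made for illustration: the instance name, interface, router id, password and the 192.0.2.x addresses are placeholders, and this has not been tested against any particular keepalived version.

```
vrrp_instance VI_1 {
    state MASTER
    interface eth0                  # placeholder interface
    virtual_router_id 1
    priority 150
    advert_int 5
    unicast_src_ip 192.0.2.1        # this node's own address (placeholder)
    unicast_peer {
        192.0.2.2                   # the other node's address (placeholder)
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.0.2.100/24 dev eth0     # placeholder virtual IP
    }
}
```

On the BACKUP node the two addresses trade places: `unicast_src_ip` becomes 192.0.2.2 and the `unicast_peer` block lists 192.0.2.1.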