This can actually be caused a bug. I know because I've had to fix it myself.
According to the RFC, when priorities are equal on both nodes;
If the Priority in the ADVERTISEMENT is equal to the local
Priority and the primary IP Address of the sender is greater
than the local primary IP Address, then:
o Cancel Adver_Timer
o Set Master_Down_Timer to Master_Down_Interval
o Transition to the {Backup} state
So, he who has the biggest IP address will win.
In keepalived, the way this is done is basically wrong. Endianness is not considered properly when doing this comparison.
Lets imagine we have two routers, (A)10.1.1.200 and (B)10.1.1.201.
The code should perform the following comparison.
On A:
if (10.1.1.201 > 10.1.1.200) // True
be_backup();
On B:
if (10.1.1.200 > 10.1.1.201) // False
be_master();
However because the endianness is not incorrectly handled, the following comparison is made instead.
On A:
if (10.1.1.201 > 200.1.1.10) // False
be_master();
On B:
if (10.1.1.200 > 201.1.1.10) // False
be_master();
This patch should work, but i've remade it from my original patch and have not tested it. Not even tested it compiles! So no refunds!
--- vrrp/vrrp.c.old 2013-10-13 17:39:29.421000176 +0100
+++ vrrp/vrrp.c 2013-10-13 18:07:57.360000966 +0100
@@ -923,7 +923,7 @@
} else if (vrrp->family == AF_INET) {
if (hd->priority > vrrp->effective_priority ||
(hd->priority == vrrp->effective_priority &&
- ntohl(saddr) > ntohl(VRRP_PKT_SADDR(vrrp)))) {
+ ntohl(saddr) > VRRP_PKT_SADDR(vrrp))) {
log_message(LOG_INFO, "VRRP_Instance(%s) Received higher prio advert"
, vrrp->iname);
if (proto == IPPROTO_IPSEC_AH) {
Some guess:
Are you sure that the vrrp traffic gets through? Could you sniff (e.g. ngrep, tcpdump) on port 112 if packets are received? (You should see one each second.) See this link.
If not, it could be a firewall issue.
Best Answer
You could use the notify command to write out a state file.
The create a notify script like:
And a get state script like: