I have two nodes running Keepalived v1.2.13 in an Amazon AWS VPC.
I'm trying to achieve this scenario, assuming Node1 is MASTER:
If I stop HAProxy, stop keepalived or stop the node, fail over to Node2.
If I start HAProxy, keepalived or the node again on Node1, do not fail back to Node1 (no flapping).
With the following configuration, failover only works when I stop keepalived or stop the node. The priority change caused by the track_script doesn't seem to affect the MASTER election.
Node1
vrrp_script chk_haproxy {        # Requires keepalived-1.1.13
    script "killall -0 haproxy"  # cheaper than pidof
    interval 2                   # check every 2 seconds
    fall 2
    weight 2                     # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    advert_int 2
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 51
    priority 101
    unicast_peer {
        172.17.16.10
    }
    notify_master "/etc/keepalived/randomscript.sh"
    track_script {
        chk_haproxy
    }
}
Node2
vrrp_script chk_haproxy {        # Requires keepalived-1.1.13
    script "killall -0 haproxy"  # cheaper than pidof
    interval 2                   # check every 2 seconds
    fall 2
    weight 2                     # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    advert_int 2
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 51
    priority 100
    unicast_peer {
        172.17.16.11
    }
    notify_master "/etc/keepalived/randomscript.sh"
    track_script {
        chk_haproxy
    }
}
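To make the intended priority arithmetic explicit, here is a minimal Python model of how the chk_haproxy weight should shift the two nodes' priorities with the values above (priority 101/100, weight 2, fall 2). It is a simplified sketch, not keepalived's own logic: it assumes a single successful check marks the script healthy again, and it deliberately ignores nopreempt, advertisement timing and VRRP tie-breaking. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Simplified model of one keepalived node (not keepalived's own code)."""
    name: str
    priority: int      # `priority` from vrrp_instance VI_1
    weight: int        # `weight` from vrrp_script chk_haproxy
    fall: int          # `fall` from vrrp_script chk_haproxy
    failures: int = 0  # consecutive failed checks seen so far

    def record_check(self, ok: bool) -> None:
        # Assumption: one success is enough to mark the check healthy again.
        self.failures = 0 if ok else self.failures + 1

    def check_ok(self) -> bool:
        # The check only counts as failed after `fall` failures in a row.
        return self.failures < self.fall

    def effective_priority(self) -> int:
        # Positive weight: added to the base priority while the check passes.
        return self.priority + (self.weight if self.check_ok() else 0)


def expected_master(nodes: list[Node]) -> str:
    # Highest effective priority wins; nopreempt, advert timing and
    # tie-breaking are deliberately ignored in this sketch.
    return max(nodes, key=lambda n: n.effective_priority()).name


if __name__ == "__main__":
    node1 = Node("Node1", priority=101, weight=2, fall=2)
    node2 = Node("Node2", priority=100, weight=2, fall=2)
    print(expected_master([node1, node2]))  # Node1 (103 vs 102)
    node1.record_check(False)
    node1.record_check(False)  # HAProxy down for two check intervals
    print(expected_master([node1, node2]))  # Node2 (101 vs 102)
```

On paper, then, a dead HAProxy on Node1 should drop it to 101, below Node2's 102. Note that in this plain model Node1 would win again (103 vs 102) as soon as its check recovers; that fail-back is exactly the flapping the nopreempt setting is meant to prevent.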
Best Answer
I ended up using on both nodes:
The race condition turned out to be caused by the instances' security group, so this issue is specific to AWS.
For an unknown reason, VRRP unicast worked even though it was not explicitly allowed in the security group. Once I explicitly allowed it (Custom Protocol 112), the issue was fixed. It seems that, while a stack is initializing, it takes some time before these packets are let through.
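For anyone scripting this instead of using the console, the following Python sketch builds the ingress rule that "Custom Protocol 112" corresponds to. It only constructs the data structure, in the shape of the IpPermissions parameter of the EC2 AuthorizeSecurityGroupIngress API; the function name, the use of /32 rules per peer (taken from the unicast_peer addresses above) and the description text are assumptions for illustration.

```python
import json


def vrrp_ingress_permissions(peer_ips: list[str]) -> list[dict]:
    """Build an IpPermissions list allowing VRRP (IP protocol 112) from each peer.

    Hypothetical helper: nothing is sent to AWS here.
    """
    return [
        {
            # VRRP is its own IP protocol with no ports, so FromPort/ToPort are omitted.
            "IpProtocol": "112",
            "IpRanges": [
                {"CidrIp": f"{ip}/32", "Description": "keepalived VRRP peer"}
                for ip in peer_ips
            ],
        }
    ]


if __name__ == "__main__":
    perms = vrrp_ingress_permissions(["172.17.16.10", "172.17.16.11"])
    print(json.dumps(perms, indent=2))
```

With boto3, the result could then be passed as `IpPermissions` to `boto3.client("ec2").authorize_security_group_ingress(GroupId=..., IpPermissions=perms)`, using your own security group ID.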