I am trying to achieve a simple failover using Keepalived, Postgres and Gluster.
Using CentOS 7.
I have mounted a replicated gluster volume on both nodes at '/var/lib/pgsql'.
Shared IP (Keepalived): 192.168.1.20
node01: 192.168.1.11
node02: 192.168.1.12
pgsql-check script contents:
#!/usr/bin/python
import subprocess
import sys

try:
    subprocess.check_call(['/usr/bin/systemctl', 'status', 'postgresql.service'])
    sys.exit(0)
except subprocess.CalledProcessError:
    sys.exit(3)
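For comparison, here is a minimal sketch of the same check written around `systemctl is-active --quiet`, which prints nothing and exits 0 only when the unit is active. The function name `postgres_is_active` and the injectable `run` parameter are illustrative assumptions, not part of the original script; the exit code 3 is kept only to match the script above.

```python
import subprocess


def postgres_is_active(run=subprocess.call):
    """Return True when systemd reports postgresql.service as active."""
    try:
        # subprocess.call returns the exit status instead of raising on non-zero.
        status = run(['/usr/bin/systemctl', 'is-active', '--quiet',
                      'postgresql.service'])
    except OSError:
        # systemctl could not be executed at all; treat that as "not active".
        return False
    return status == 0

# As the check script's entry point one could then write:
#     sys.exit(0 if postgres_is_active() else 3)
```

Keepalived only distinguishes between a zero and a non-zero exit status from a vrrp_script, so the exact non-zero value should not matter.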
Notify script contents:
#!/usr/bin/python
import sys
import subprocess

# keepalived calls notify scripts as: <type> <name> <state>,
# so sys.argv[3] is the state being entered.
if sys.argv[3] == "MASTER":
    try:
        # Each argument must be its own list element.
        subprocess.check_call(['/usr/bin/systemctl', 'start', 'postgresql.service'])
    except subprocess.CalledProcessError:
        pass
    sys.exit(0)

if sys.argv[3] == "BACKUP":
    try:
        subprocess.check_call(['/usr/bin/systemctl', 'stop', 'postgresql.service'])
    except subprocess.CalledProcessError:
        pass
    sys.exit(0)

if sys.argv[3] == "FAULT":
    try:
        subprocess.check_call(['/usr/bin/systemctl', 'stop', 'postgresql.service'])
    except subprocess.CalledProcessError:
        pass
    sys.exit(0)

sys.exit(1)
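The three branches differ only in the systemctl verb, so the state-to-action mapping can be sketched as a lookup table. This is only an illustration: the names `ACTIONS` and `handle` and the injectable `run` parameter are assumptions made for the example, and it assumes the notify script receives the type, the instance name and the new state as its first three arguments.

```python
import subprocess

SYSTEMCTL = '/usr/bin/systemctl'
SERVICE = 'postgresql.service'

# systemctl verb to apply for each VRRP state the node can enter.
ACTIONS = {'MASTER': 'start', 'BACKUP': 'stop', 'FAULT': 'stop'}


def handle(argv, run=subprocess.call):
    """Translate one notify invocation into a systemctl call; return an exit code."""
    # argv[1] = "GROUP" or "INSTANCE", argv[2] = its name, argv[3] = new state
    if len(argv) < 4 or argv[3] not in ACTIONS:
        return 1
    try:
        # One list element per argument, so no shell parsing is involved.
        run([SYSTEMCTL, ACTIONS[argv[3]], SERVICE])
    except OSError:
        # systemctl could not be executed; ignored, like failures in the original.
        pass
    return 0

# As the notify script's entry point one could then write:
#     sys.exit(handle(sys.argv))
```

Using `subprocess.call` instead of `check_call` means a non-zero exit status from systemctl is simply returned rather than raised, which matches the original's intent of ignoring failed starts and stops.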
keepalived.conf:
vrrp_script chk_pgsql {
    script "/etc/keepalived/pgsql-check"
    interval 2    # check every 2 seconds
    fall 2        # require 2 failures for KO
    rise 2        # require 2 successes for OK
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.20
    }
    track_script {
        chk_pgsql
    }
    notify "/etc/keepalived/notify"
}
When the machines boot, they both enter the FAULT state, but the main machine needs to enter the MASTER state. When I start Postgres manually and restart Keepalived on the master, everything is fine. When I try a failover, both machines end up in the FAULT state and don't recover.
Can anyone help with the config/scripts? Am I misunderstanding the notify or check mechanism?
Best Answer
Setting a weight of 1 for the check script (adding 'weight 1' to the vrrp_script chk_pgsql block) made everything work as expected. The default weight is 0.
I found this out after reading this link: http://comments.gmane.org/gmane.linux.keepalived.devel/2586
It is not the answer, but it pointed me in the right direction.
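As far as I understand the keepalived.conf man page, the reason the weight matters is this: with a weight of 0, a failing tracked script puts the instance into FAULT, whereas a non-zero weight only adjusts the priority. With weight 0 and Postgres stopped at boot, the check fails, the node goes to FAULT, and the notify script stops Postgres again, so it can never recover by itself. A rough model of that rule, where the function name `evaluate` is made up for illustration and the clamping of the effective priority is not modelled:

```python
def evaluate(base_priority, weight, script_ok):
    """Return (fault, effective_priority) for one tracked script.

    Assumed model of the documented behaviour:
      weight == 0: a failing script forces the instance into FAULT
      weight  > 0: a passing script adds the weight to the priority
      weight  < 0: a failing script subtracts abs(weight) from the priority
    """
    if weight == 0:
        return (not script_ok, base_priority)
    if weight > 0 and script_ok:
        return (False, base_priority + weight)
    if weight < 0 and not script_ok:
        return (False, base_priority + weight)
    return (False, base_priority)


# Postgres not running yet (check fails), priority 100 as in the config above.
print(evaluate(100, 0, False))  # (True, 100): FAULT, notify never starts Postgres
print(evaluate(100, 1, False))  # (False, 100): no FAULT, node can become MASTER
print(evaluate(100, 1, True))   # (False, 101): check passes, priority raised by 1
```

With weight 1 the failing check no longer blocks the election, so the node can enter MASTER and the notify script gets the chance to start Postgres.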
Current configuration :