Failover Pacemaker Cluster with Two Network Interfaces – Setup Guide

Tags: cluster, corosync, failover, pacemaker

I have two test servers in one VLAN.

srv1
  eth1 10.10.10.11
  eth2 10.20.10.11

srv2
  eth1 10.10.10.12
  eth2 10.20.10.12

Cluster VIP - 10.10.10.100

Corosync config with two interfaces:

  rrp_mode: passive

  interface {
    ringnumber: 0
    bindnetaddr: 10.10.10.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
  }

  interface {
    ringnumber: 1
    bindnetaddr: 10.20.10.0
    mcastaddr: 226.94.1.1
    mcastport: 5407
  }

Pacemaker config:

# crm configure show
node srv1
node srv2
primitive cluster-ip ocf:heartbeat:IPaddr2 \
    params ip="10.10.10.100" cidr_netmask="24" \
    op monitor interval="5s"
primitive ha-nginx lsb:nginx \
    op monitor interval="5s"
location prefer-srv-2 ha-nginx 50: srv2
colocation nginx-and-cluster-ip +inf: ha-nginx cluster-ip
property $id="cib-bootstrap-options" \
    dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    stonith-enabled="false"

Status:

# crm status
============
Last updated: Thu Jan 29 13:40:16 2015
Last change: Thu Jan 29 12:47:25 2015 via crmd on srv1
Stack: openais
Current DC: srv2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ srv1 srv2 ]

 cluster-ip (ocf::heartbeat:IPaddr2):   Started srv2
 ha-nginx   (lsb:nginx):    Started srv2

Rings:

# corosync-cfgtool -s
Printing ring status.
Local node ID 185207306
RING ID 0
    id  = 10.10.10.11
    status  = ring 0 active with no faults
RING ID 1
    id  = 10.20.10.11
    status  = ring 1 active with no faults

If I run srv2# ifconfig eth1 down, Pacemaker keeps communicating over eth2, which is fine.
But nginx is no longer reachable on 10.10.10.100 (because eth1 is down), and Pacemaker still reports that everything is OK.

I want nginx to move to srv1 when eth1 dies on srv2.

How can I achieve that?

Best Answer

The ocf:pacemaker:pingd resource was designed precisely to fail resources over to another node upon loss of connectivity. You can find a very brief example of this on the ClusterLabs wiki: http://clusterlabs.org/wiki/Example_configurations#Set_up_pingd
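
As a rough sketch of how this could look for the configuration in the question (not a tested setup): the commands below use ocf:pacemaker:ping, the agent that superseded pingd, which by default publishes its result in a node attribute named "pingd". The ping target 10.10.10.1 and the names ping-eth1, ping-eth1-clone and cluster-ip-needs-net are hypothetical. Pick a host that is reachable only through eth1, otherwise the check will not notice that eth1 is down.

    # Assumption: 10.10.10.1 is a host on the eth1 network (for example its gateway)
    crm configure primitive ping-eth1 ocf:pacemaker:ping \
        params host_list="10.10.10.1" multiplier="1000" dampen="5s" \
        op monitor interval="10s" timeout="60s"
    # Run the connectivity check on both nodes
    crm configure clone ping-eth1-clone ping-eth1
    # Keep cluster-ip off any node whose "pingd" attribute is undefined or 0
    crm configure location cluster-ip-needs-net cluster-ip \
        rule -inf: not_defined pingd or pingd lte 0

The constraint is placed on cluster-ip rather than ha-nginx because the existing colocation already makes ha-nginx follow cluster-ip, so moving the IP should move nginx with it.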

Somewhat unrelated, but I have seen issues in the past when using ifconfig down to test loss of connectivity. I would strongly encourage you to use iptables to drop traffic instead.
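
A minimal way to do that, assuming eth1 on srv2 is the interface under test and that no existing firewall rules conflict with these, is to drop everything entering and leaving that interface and then delete the same rules once the test is done. Run as root on srv2:

    # Simulate loss of connectivity on eth1 (the link itself stays up)
    iptables -A INPUT -i eth1 -j DROP
    iptables -A OUTPUT -o eth1 -j DROP

    # Restore connectivity by deleting the two rules added above
    iptables -D INPUT -i eth1 -j DROP
    iptables -D OUTPUT -o eth1 -j DROP

Because the interface keeps its address and routes, this is closer to a cable or switch failure than ifconfig eth1 down, which also removes the address from the node.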