I am trying to make a load-balanced gateway for a group of NATed machines.
I have three CentOS nodes. Initially only one node was supposed to hold the internal gateway IP, and that works well: traffic flows.
Now I'm trying to load-balance the gateway via the clusterip_hash/clone option. At the bottom there is the resource creation with pcs, my small location constraint (don't move the IP to a machine that has no "internet"), and lastly the clone command.
Once I clone the resource, I can see the instances running correctly on two hosts, and each one gets an iptables rule added:
Chain INPUT (policy DROP)
target prot opt source destination
CLUSTERIP all -- anywhere gateway CLUSTERIP hashmode=sourceip-sourceport clustermac=81:48:85:71:7F:47 total_nodes=2 local_node=2 hash_init=0
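Because the clustermac is a multicast MAC (first octet odd), every node receives every frame and the CLUSTERIP rule decides which node keeps a given flow. Two things worth verifying on each gateway node (the /proc path assumes the legacy ipt_CLUSTERIP module; adjust the IP to your cluster address):

```shell
# Confirm the kernel actually joined the multicast cluster MAC on the
# internal interface; if it is missing, frames to the clustermac are
# silently discarded by the NIC before iptables ever sees them.
ip maddr show dev enp1s0

# CLUSTERIP exposes its per-address state here: which local node
# numbers this machine currently claims responsibility for.
cat /proc/net/ipt_CLUSTERIP/10.10.0.250
```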
The problem is that as soon as the ARP entry changes from the real physical MAC of either gateway machine to the clustermac shown in iptables, all of the NATed machines lose internet connectivity.
I added iptables logging for dropped packets, but nothing seems to be dropped. At the same time, nothing seems to get through. (10.10.0.52 is a randomly picked NATed host trying to ping Google; if the virtual-ip clone is removed and changed back to a single floating IP, the traffic flows again.)
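One detail that can hide drops here (an assumption about where the logging was added, not something from the config above): traffic being NATed for other hosts traverses the FORWARD chain, not INPUT, so a LOG rule attached only to INPUT never sees it. A sketch of logging the forwarded path instead:

```shell
# Log forwarded traffic from the NATed subnet before any drop can
# occur; NATed flows go through FORWARD, so an INPUT-only LOG rule
# never matches them.
iptables -I FORWARD -s 10.10.0.0/24 -j LOG --log-prefix "FWD: " --log-level 4

# Watch the result (log path may vary with your syslog configuration):
tail -f /var/log/messages | grep 'FWD: '
```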
[root@three ~]# tcpdump -nni enp1s0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp1s0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:40:36.898612 IP 10.10.0.52 > 8.8.8.8: ICMP echo request, id 18875, seq 188, length 64
16:40:37.906651 IP 10.10.0.52 > 8.8.8.8: ICMP echo request, id 18875, seq 189, length 64
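To see whether those echo requests arrive addressed to the clustermac or to a physical MAC, and whether any frame ever leaves the box in reply, the same capture can be run with link-level headers printed:

```shell
# -e prints the Ethernet header (src/dst MAC) for each packet, which
# shows which MAC the NATed hosts are actually sending to and whether
# replies are emitted at all.
tcpdump -nnei enp1s0 icmp
```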
Pacemaker config, done via pcs:
pcs resource create ip_internal_gw ocf:heartbeat:IPaddr2 params ip="10.10.0.250" cidr_netmask="24" nic="enp1s0" clusterip_hash="sourceip-sourceport" op start interval="0s" timeout="60s" op monitor interval="5s" timeout="20s" op stop interval="0s" timeout="60s"
pcs resource clone ip_internal_gw meta globally-unique=true master-max="2" master-node-max="2" clone-max="2" clone-node-max="1" notify="true" interleave="true"
pcs constraint location ip_internal_gw rule id=ip_internal_gw_needs_internet score=-INFINITY not_defined pingd or pingd lte 0
[root@three ~]# pcs status
Cluster name:
Last updated: Wed May 25 16:51:15 2016 Last change: Wed May 25 16:35:53 2016 by root via cibadmin on two.gateway.shire
Stack: corosync
Current DC: two.gateway.shire (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 5 resources configured
Online: [ one.gateway.shire three.gateway.shire two.gateway.shire ]
Full list of resources:
Clone Set: ping-clone [ping]
Started: [ one.gateway.shire three.gateway.shire two.gateway.shire ]
Clone Set: ip_internal_gw-clone [ip_internal_gw] (unique)
ip_internal_gw:0 (ocf::heartbeat:IPaddr2): Started three.gateway.shire
ip_internal_gw:1 (ocf::heartbeat:IPaddr2): Started two.gateway.shire
What is blocking the traffic? I'm sure I'm missing something basic.
Best Answer
It seems that:
helped to get it running.