Linux – Can’t change source IP address (to floating IP) for outgoing UDP packets

high-availability iptables linux networking udp

I'm having trouble getting a cluster to work using pacemaker and corosync. Here is my hardware configuration:

  • Network: 192.168.3.0/255.255.255.0
  • Gateway: 192.168.3.1
  • node1 (Ubuntu Server 12.04 x64) Static IP: 192.168.3.34
  • node2 (Ubuntu Server 12.04 x64) Static IP: 192.168.3.35

This network is behind a firewall.

I have two resources that communicate with external devices through udp:

  • resource1 -> udp port 16500
  • resource2 -> udp port 16501

The Pacemaker commands that I used:

crm configure primitive res1-srv upstart:resource1 \
op monitor interval=10s timeout=120 on-fail="restart" \
meta is-managed="true" failure-timeout=300 migration-threshold=5 allow-migrate=true

crm configure primitive res2-srv upstart:resource2 \
op monitor interval=10s timeout=120 on-fail="restart" \
meta is-managed="true" failure-timeout=300 migration-threshold=5 allow-migrate=true

resource1 and resource2 are not related to each other and there must be only one active instance of each one of them on the cluster (either on node1 or node2).

I created 2 floating IP addresses, one for each process:

  • res1-ipin -> 192.168.3.130
  • res2-ipin -> 192.168.3.131

I did this using:

crm configure primitive res1-ipin ocf:heartbeat:IPaddr2 \
params ip="192.168.3.130" cidr_netmask="24" op monitor interval="10s" meta is-managed="true"

crm configure primitive res2-ipin ocf:heartbeat:IPaddr2 \
params ip="192.168.3.131" cidr_netmask="24" op monitor interval="10s" meta is-managed="true"

On the firewall, the administrator configured 2 NAT rules:

  • [PUBLIC IP]:16500 –> 192.168.3.130:16500
  • [PUBLIC IP]:16501 –> 192.168.3.131:16501

I made a group for each one of them:

crm configure group resource1 res1-ipin res1-srv
crm configure group resource2 res2-ipin res2-srv

That way each resource can run on a different node. Pacemaker manages these resources without problems.

The incoming udp packets work perfectly, both resources process them without trouble.

However, the outgoing udp packets don't pass the firewall because the source ip address is the static one of the node. Here is an example:

  • resource1 is running on node1 -> the outgoing ip address and port is 192.168.3.34:16500
  • resource2 is running on node2 -> the outgoing ip address and port is 192.168.3.35:16501

Both of them are blocked by the firewall, and I do not have privileges to configure new rules on it (I can't ask the administrator to configure them either; he argues that the firewall doesn't allow that).

I tried to configure NAT on each node using iptables:

iptables -t nat -A POSTROUTING -p udp --sport 16500 -j SNAT --to-source 192.168.3.130:16500
iptables -t nat -A POSTROUTING -p udp --sport 16501 -j SNAT --to-source 192.168.3.131:16501

When I did that, resource1 and resource2 wrote in their logs that they could not send the packet:

Client ERROR *** Terminal nro:XXX writing 1Operation not permitted

I don't know what to do. I also tried to add the resource ocf:heartbeat:IPsrcaddr, but it crashes with:

IPsrcaddr[6200]: ERROR: command 'ip route replace 192.168.3.0/24 dev eth0 src 192.168.3.130' failed

I understand that using 2 floating IP addresses won't do any good.

If anyone can point out what I'm doing wrong, I will be very grateful.

Thanks in advance

Best Answer

The IPsrcaddr resource agent should work for this. Configure the IPsrcaddr resources as shown below:

# crm configure
crm(live)configure# primitive res1_srcaddr IPsrcaddr \
    params ipaddress=192.168.3.130
crm(live)configure# primitive res2_srcaddr IPsrcaddr \
    params ipaddress=192.168.3.131

Then jump into your editor and add the new IPsrcaddr resources to the appropriate groups AFTER the IPaddr2 resources:

crm(live)configure# edit
...snip...
group resource1 res1-ipin res1_srcaddr res1-srv
group resource2 res2-ipin res2_srcaddr res2-srv
...snip...
crm(live)configure# verify
crm(live)configure# commit
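
Once the change is committed, one way to check which source address the kernel would pick for outgoing traffic is `ip route get`. The snippet below is only a minimal sketch: the helper name `route_src` is invented here, and 203.0.113.10 is a documentation placeholder that you would replace with the address of one of your external devices.

```shell
#!/bin/sh
# Hypothetical helper: print the "src" field from `ip route get` output on stdin.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# On the node currently running resource1 (placeholder destination):
#   ip route get 203.0.113.10 | route_src
# Parsing a sample line of the shape `ip route get` prints:
echo "203.0.113.10 via 192.168.3.1 dev eth0 src 192.168.3.130" | route_src
```

Whether the real command reports the floating address depends on which route the agent modified on your nodes, so treat the output as a diagnostic rather than a pass/fail check.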

If you still see the error pertaining to the IPsrcaddr, you could try running the command manually, to get a better understanding of why it's failing.
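
As a rough sketch of what running it manually might look like, the script below replays the command from the error message in the question, with a look at the interface before and the routes after. The `run` wrapper and the `DRY_RUN` switch are made up for illustration: by default the script only prints the commands, and it executes them only when started as root with `DRY_RUN=0`. The network, interface, and address are copied from the log line, so adjust them if your nodes differ.

```shell
#!/bin/sh
# Values copied from the IPsrcaddr error message in the question.
NETWORK="192.168.3.0/24"
IFACE="eth0"
SRC_IP="192.168.3.130"

# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Is the floating IP actually configured on this node right now?
run ip -4 addr show dev "$IFACE"
# The command the resource agent reported as failing.
run ip route replace "$NETWORK" dev "$IFACE" src "$SRC_IP"
# Inspect the resulting routes on the interface.
run ip route show dev "$IFACE"
```

Running the middle command by hand shows the kernel's own error message, which the agent's log line hides. One possible cause worth ruling out is that the floating address is not present on the node at the moment the route is changed, which is why the first command lists the interface's addresses.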

You could even try modifying the IPsrcaddr resource agent to better suit your nodes; it's a relatively simple resource agent:

# vi /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr