iptables – Source NAT with Multiple NIC & Public IP Addresses Not Working


I have a problem on a server with multiple NICs and IP addresses. I've configured my CentOS 7 server with an additional public IP address assigned to eth1. When iptables source NAT is enabled, only the additional IP address can't be used.

In this post, the two public IP addresses are identified by their interfaces, eth0 and eth1. The subnet mask is configured, and the default gateway is set properly for each of the two addresses.

Below are the contents of /etc/sysconfig/network-scripts/ifcfg-eth1 (HWADDR is censored):


Below is the output of ip route:

default via dev eth0 proto static metric 100
dev eth0 proto kernel scope link src metric 100
dev eth1 proto kernel scope link src

Below is the output of ip rule:

0:      from all lookup local
32765:  from lookup subroute-eth1
32766:  from all lookup main
32767:  from all lookup default

Below are the contents of /etc/sysconfig/network-scripts/route-eth1:

default via table subroute-eth1

Below are the contents of /etc/sysconfig/network-scripts/rule-eth1:

from table subroute-eth1
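For reference, the two files above correspond roughly to the following runtime commands. This is a minimal sketch, not the initscripts' exact behavior: the addresses are hypothetical RFC 5737 documentation addresses standing in for the elided ones, and it assumes the table name subroute-eth1 is already declared in /etc/iproute2/rt_tables (ip refuses an undeclared table name). By default it only prints the commands; set DRY_RUN=0 as root to execute them.

```shell
#!/bin/sh
# Hypothetical stand-ins for the elided addresses (RFC 5737 ranges).
ETH1_ADDR=203.0.113.10   # public IP on eth1 (assumption)
ETH1_GW=203.0.113.1      # eth1's default gateway (assumption)
TABLE=subroute-eth1

# Print each command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

setup_eth1_table() {
    # Warn on stderr if the named table is not declared.
    if ! grep -qw "$TABLE" /etc/iproute2/rt_tables 2>/dev/null; then
        echo "warning: $TABLE not found in /etc/iproute2/rt_tables" >&2
    fi
    # route-eth1: a default route that lives only in the eth1 table
    run ip route add default via "$ETH1_GW" dev eth1 table "$TABLE"
    # rule-eth1: traffic sourced from eth1's address uses that table
    run ip rule add from "$ETH1_ADDR" table "$TABLE"
}

setup_eth1_table
```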

Then the following requests were sent successfully, and each returned the correct public IP address as a string:

# curl --interface eth0 api.ipify.org  # outputs
# curl --interface eth1 api.ipify.org  # outputs

Here, I want each user of the system to use a different public IP address, so I applied the following iptables source NAT rules, where uid 1000 is user A and uid 1001 is user B:

# iptables -t nat -m owner --uid-owner 1000 -A POSTROUTING -j SNAT --to-source
# iptables -t nat -m owner --uid-owner 1001 -A POSTROUTING -j SNAT --to-source

In this situation, eth0 works correctly, but eth1 times out:

# curl api.ipify.org  # outputs
# sudo -u A curl api.ipify.org  # outputs
# sudo -u B curl api.ipify.org  # **time out**

Due to my lack of networking experience, I have no idea what is wrong. If anyone has any ideas, please help.
Note that in this situation, ssh root@ from the Internet still works properly.

Edit: below is the output of ip route show table subroute-eth1:

default via dev eth1

Best Answer

The uid should trigger a routing change to use eth1 instead of eth0. nat/POSTROUTING's SNAT (while still needed) comes too late for this because, as the name POSTROUTING implies, it happens after the routing decision: the network interface has already been chosen and won't change anymore.

For locally generated traffic, the rule has to be placed in the mangle/OUTPUT chain to alter the route, using marks that can trigger an ip rule lookup:

# iptables -t mangle -A OUTPUT -m owner --uid-owner 1000 -j MARK --set-mark 1000
# iptables -t mangle -A OUTPUT -m owner --uid-owner 1001 -j MARK --set-mark 1001

The marks are then used by ip rule. Even uid 1000 has to be handled (to cover the specific case uid1000$ curl --interface eth1 http://api.ipify.org/, which should not use entry 32765):

ip rule add fwmark 1000 lookup main
ip rule add fwmark 1001 lookup subroute-eth1
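Without an explicit priority, ip rule add assigns one just below the first non-local rule, so the resulting numbers depend on the order in which commands (and any other scripts) run. A variant that pins the priorities keeps these fwmark rules ahead of rule 32765 and main, and makes them easy to remove with ip rule del priority N. The values 1000 and 1001 are an arbitrary choice for this sketch, not part of the original answer; commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Print each command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

add_mark_rules() {
    # Explicit priorities (arbitrary values below 32765) so the fwmark
    # rules are always evaluated before "from ... lookup subroute-eth1".
    run ip rule add priority 1000 fwmark 1000 lookup main
    run ip rule add priority 1001 fwmark 1001 lookup subroute-eth1
}

remove_mark_rules() {
    run ip rule del priority 1000
    run ip rule del priority 1001
}

add_mark_rules
```

If these rules need to survive a reboot, the same arguments could presumably go into /etc/sysconfig/network-scripts/rule-eth1, whose lines the CentOS 7 initscripts hand to ip rule add.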

Note that the initial default source IP was already chosen before this point, so SNAT is still needed to fix it to the new, correct one. Then why use this mark, when rule 32765 already sends traffic from eth1's address to subroute-eth1? Again, that's because of the order of the various evaluations in the network stack, as summarized by the Packet flow in Netfilter and General Networking schematic: only a change in mangle/OUTPUT can trigger the reroute check seen in the schematic, and at that point the source IP hasn't been corrected to eth1's address yet.

On the reverse path, some network elements won't know about those routing tables in time anyway, so loose mode has to be set on eth1's rp_filter, or return packets will be dropped:

# echo 2 > /proc/sys/net/ipv4/conf/eth1/rp_filter

The same is needed on eth0 to handle the specific uid 1000 case above (uid1000$ curl --interface eth1 http://api.ipify.org/):

# echo 2 > /proc/sys/net/ipv4/conf/eth0/rp_filter
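One subtlety worth checking: according to the kernel's ip-sysctl documentation, the value actually used for an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so writing 2 per interface gives loose mode whatever conf/all holds. The sketch below applies the same setting with sysctl -w and includes a small helper to compute the effective value; the helper names are hypothetical. It prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Print each command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# Larger of two rp_filter values (0 = off, 1 = strict, 2 = loose).
max_rp() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Effective rp_filter for interface $1: max(conf/all, conf/$1).
effective_rp_filter() {
    all=$(cat /proc/sys/net/ipv4/conf/all/rp_filter 2>/dev/null || echo 0)
    dev=$(cat "/proc/sys/net/ipv4/conf/$1/rp_filter" 2>/dev/null || echo 0)
    max_rp "$all" "$dev"
}

for ifc in eth0 eth1; do
    run sysctl -w "net.ipv4.conf.$ifc.rp_filter=2"
done

# To persist across reboots, the same keys can go in a file under
# /etc/sysctl.d/ (file name of your choosing), for example:
#   net.ipv4.conf.eth0.rp_filter = 2
#   net.ipv4.conf.eth1.rp_filter = 2
```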

With this, it should mostly work. But a few packets aren't actually owned by uid 1001: TCP RSTs, the last ACK once the process has ended, or, for UDP, related ICMP errors are owned only by the kernel. They won't match uid 1001 and will be sent on the wrong interface (with the wrong source IP for that interface), which will cause some timeouts at the end of connections, more likely on the remote side than locally. This can be observed by running kill -KILL on a client that holds a connection made as user 1001 via eth1, and watching several retried ACK packets, sent via eth0 instead of eth1, acknowledging the FIN that the remote side keeps resending on eth1.
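The observation described above could be started with a capture like the one below. This is a sketch under stated assumptions: tcpdump is installed, and 203.0.113.10 is a hypothetical stand-in for eth1's elided address. With only the owner-based MARK rules in place, the retried ACKs should show up here; the command is printed unless DRY_RUN=0.

```shell
#!/bin/sh
ETH1_ADDR=203.0.113.10   # hypothetical stand-in for eth1's address

# Print each command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# Capture at most 20 TCP packets on eth0 whose source is eth1's
# address. Start this, then kill -KILL a client holding a connection
# opened as user B (uid 1001) via eth1.
watch_leak() {
    run tcpdump -c 20 -ni eth0 "tcp and src host $ETH1_ADDR"
}

watch_leak
```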

So all this has to be reworked using CONNMARK to track flows initiated by the user, instead of packets initiated by the user. The nat rules don't have to change, since only the first packet of a flow (in NEW state) goes through the nat table anyway, so it already tracks flows. Let's start over:

# iptables -t mangle -F
# iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
# iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
# iptables -t mangle -A OUTPUT -m owner --uid-owner 1000 -j MARK --set-mark 1000
# iptables -t mangle -A OUTPUT -m owner --uid-owner 1001 -j MARK --set-mark 1001
# iptables -t mangle -A OUTPUT -j CONNMARK --save-mark
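Putting the pieces of this answer together, a consolidated sketch of the final configuration might look like the following. Assumptions: the addresses are hypothetical RFC 5737 stand-ins for the elided ones, subroute-eth1 is already declared, and nothing else (firewalld, for instance) manages the mangle table, since the first command flushes it. It prints the commands unless DRY_RUN=0, and it is not idempotent: running it twice for real would duplicate the ip rule and nat entries.

```shell
#!/bin/sh
# Hypothetical stand-ins for the elided addresses (RFC 5737 ranges).
ETH0_ADDR=198.51.100.10   # public IP on eth0 (assumption)
ETH1_ADDR=203.0.113.10    # public IP on eth1 (assumption)
TABLE=subroute-eth1       # assumed declared in /etc/iproute2/rt_tables

# Print each command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

setup_per_user_routing() {
    # 1. Mark flows, not just packets: restore the connection mark first
    #    so kernel-owned packets (RST, last ACK, ICMP errors) inherit it.
    run iptables -t mangle -F
    run iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
    run iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
    run iptables -t mangle -A OUTPUT -m owner --uid-owner 1000 -j MARK --set-mark 1000
    run iptables -t mangle -A OUTPUT -m owner --uid-owner 1001 -j MARK --set-mark 1001
    run iptables -t mangle -A OUTPUT -j CONNMARK --save-mark

    # 2. Route marked traffic with the matching table.
    run ip rule add fwmark 1000 lookup main
    run ip rule add fwmark 1001 lookup "$TABLE"

    # 3. SNAT is still needed: the source address was picked before
    #    the reroute triggered by the mark.
    run iptables -t nat -A POSTROUTING -m owner --uid-owner 1000 -j SNAT --to-source "$ETH0_ADDR"
    run iptables -t nat -A POSTROUTING -m owner --uid-owner 1001 -j SNAT --to-source "$ETH1_ADDR"

    # 4. Loose reverse-path filtering so replies are not dropped.
    run sysctl -w net.ipv4.conf.eth0.rp_filter=2
    run sysctl -w net.ipv4.conf.eth1.rp_filter=2
}

setup_per_user_routing
```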

Everything should behave correctly now.
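To verify, a quick sketch can repeat the question's test for each user with a timeout, so a broken path fails fast instead of hanging, and then list the tracked flows carrying mark 1001. It assumes the user names A and B from the question, curl, and conntrack-tools (for conntrack -L --mark). Commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Print each command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

check_users() {
    # Ask api.ipify.org which public IP each user appears to come from.
    for u in A B; do
        printf '%s: ' "$u"
        run sudo -u "$u" curl -s --max-time 10 http://api.ipify.org/ \
            || printf 'request failed'
        echo
    done
    # Flows opened by B should carry connmark 1001.
    run conntrack -L --mark 1001
}

check_users
```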

Final note: while it wasn't needed in your setup, the link local route for eth1 should be duplicated in the table subroute-eth1; otherwise, accessing eth1's LAN from uid 1001 will give strange results. Because rule 32765 is evaluated before rule 32766, the lookup never reaches the link local route defined in the main table, as the result of this command shows:

# ip route get mark 1001
via dev eth1 table subroute-eth1 src mark 0x3e9 uid 0 \    cache

It might prevent access to all of the LAN except the gateway.

You should add (or put as appropriate in the system configuration files):

ip route add table subroute-eth1 dev eth1 scope link src
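Whether the table already holds such a route can be checked from the output of ip route show table subroute-eth1. The helper below is a hypothetical sketch that reads that output on stdin and succeeds only if a scope link route on the given interface is present. For persistence on CentOS 7, the same ip route add arguments could presumably be added as an extra line in /etc/sysconfig/network-scripts/route-eth1, next to the existing default route; the addresses in the comment are hypothetical placeholders.

```shell
#!/bin/sh
# Succeed if stdin (output of "ip route show table <name>") contains a
# route on interface $1 with "scope link".
has_link_route() {
    grep -E "(^| )dev $1( |\$)" | grep -q "scope link"
}

# Usage:
#   ip route show table subroute-eth1 | has_link_route eth1 \
#       || echo "no link route for eth1 in subroute-eth1"
#
# Hypothetical persistent line for route-eth1 (placeholder addresses):
#   203.0.113.0/24 dev eth1 scope link src 203.0.113.10 table subroute-eth1
```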

The previous ip route get command will then give the correct result instead:

dev eth1 table subroute-eth1 src mark 0x3e9 uid 0 \    cache