DNATed IP – Reply on the Same Interface as Incoming

dnat iproute2 iptables openvpn

A server has three interfaces: two internal (eth1 and eth2) on different public networks, and one external (eth0).

There is a service (OpenVPN) that can't bind to a subset of IPs/interfaces, only to all of them or to exactly one, but I need it to accept UDP connections on the internal interfaces only. The default gateway is reached via the external interface.

I have a working setup with 2 instances of the service, bound to the IPs of each internal iface and routing set with iproute2 (ip route add xxx table x, ip rule add from <IP> table x).
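For illustration, the per-interface source routing described above might look roughly like the sketch below. The addresses (taken from documentation ranges), the table numbers 101 and 102, and the helper name print_source_routing are placeholders, not the real values. The function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: prints the iproute2 commands for one internal interface.
# All addresses and table numbers below are placeholders.
print_source_routing() {
    src_ip=$1   # IP the service instance is bound to
    gw=$2       # gateway of that interface's network
    table=$3    # routing table reserved for this interface
    echo "ip route add default via $gw table $table"
    echo "ip rule add from $src_ip table $table"
}

print_source_routing 192.0.2.10 192.0.2.1 101        # eth1
print_source_routing 198.51.100.10 198.51.100.1 102  # eth2
```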

Is it possible to DNAT the incoming connection on the second internal interface (eth2) to the IP of the first internal interface (eth1) and still make the reply leave via the same interface (eth2)? That would remove the need to run a second instance of the service and maintain two configs that differ only in the IP to listen on.

The problem is that if I change (with DNAT) the destination IP of a connection arriving on eth2 to the IP of eth1, the ip rule based on from <IP> no longer does what I want. More precisely, the service replies from the eth1 address, so the rule sends the reply out via eth1 and its gateway instead of eth2.

Is it possible to efficiently set a mark on all outgoing packets of the DNATed "session" (UDP), so that I could use fwmark in an ip rule? Or is there any other solution to the main problem?

Best Answer

Found a solution. It should work for any Linux service that can't listen on a chosen set of interfaces, only on all of them (0.0.0.0) or on one particular address, such as MySQL, OpenVPN and many others. The idea is to make the service listen on one interface and add netfilter/iproute2 rules that redirect requests for the same protocol and port arriving on another interface to the service on the first one.

The "session" (even though OpenVPN uses UDP here) is actually tracked by netfilter, and the conntrack match module lets a rule refer to packets that belong to a specific tracked connection. So I added a rule to the OUTPUT chain of the mangle table that marks all packets belonging to the DNATed sessions, and then I use this mark to route those packets.


So, the commands are:

Define the variables

iface_int2=eth2         # the second internal iface
ip_int2=xx.xx.xx.xx     # the IP of the second internal iface
proto=udp               # the protocol of the connection
service_port=1194       # the incoming service port
ip_int1=yy.yy.yy.yy     # the IP of the first internal iface
ip_gw2=xx.xx.xx.1       # the IP of the default gateway for the second internal iface



This command instructs netfilter to rewrite the destination IP of connections arriving on the second interface, so that they reach the service listening on the first one.

iptables -t nat -A PREROUTING -i $iface_int2 -d $ip_int2 -p $proto --dport \
$service_port -j DNAT --to-destination $ip_int1



This command instructs netfilter to set a mark on the outgoing packets (the service's replies) that belong to a rewritten (DNATed) incoming connection. --ctorigdst matches the original (pre-DNAT) destination IP of the incoming connection.

iptables -t mangle -A OUTPUT -p $proto --sport $service_port -m conntrack \
--ctstate DNAT --ctorigdst $ip_int2 -j MARK --set-mark 0x75



This command instructs iproute2 to route marked packets according to the routes in table 100. The explicit prio gives this rule precedence over the other policy rules (rules with lower prio values are evaluated first). That is safe because the rule is very specific and won't interfere with other rules. If prio is not specified, the routing rules for the first internal interface may get higher priority.

ip rule add prio 10 fwmark 0x75 table 100



This command adds a default route via the gateway of the second internal interface to table 100.

ip route add default via $ip_gw2 table 100
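The four steps above can be gathered into one script. What follows is only a sketch: the helper names run and setup_dnat_reply_routing are hypothetical, the addresses are placeholders from documentation ranges, and DRY_RUN defaults to 1 so that the commands are printed rather than executed. --to-destination is the full spelling of the DNAT target option.

```shell
#!/bin/sh
# Sketch: the four steps gathered into one function.
# DRY_RUN=1 (default here) prints the commands; DRY_RUN=0 runs them (needs root).
# "run" and "setup_dnat_reply_routing" are hypothetical helper names.

# Placeholder values from documentation address ranges; replace with real ones.
iface_int2=eth2
ip_int2=198.51.100.10
proto=udp
service_port=1194
ip_int1=192.0.2.10
ip_gw2=198.51.100.1

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"      # print only
    else
        "$@"           # execute
    fi
}

setup_dnat_reply_routing() {
    mark=0x75
    table=100
    # 1. Rewrite the destination of connections arriving on the second iface.
    run iptables -t nat -A PREROUTING -i "$iface_int2" -d "$ip_int2" \
        -p "$proto" --dport "$service_port" -j DNAT --to-destination "$ip_int1"
    # 2. Mark replies that belong to a DNATed connection.
    run iptables -t mangle -A OUTPUT -p "$proto" --sport "$service_port" \
        -m conntrack --ctstate DNAT --ctorigdst "$ip_int2" \
        -j MARK --set-mark "$mark"
    # 3. Route marked packets through the dedicated table.
    run ip rule add prio 10 fwmark "$mark" table "$table"
    # 4. Default route of that table points at the second iface's gateway.
    run ip route add default via "$ip_gw2" table "$table"
}

setup_dnat_reply_routing
```

After a real run (DRY_RUN=0, as root), ip rule show and ip route show table 100 should list the new rule and the default route.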



For all this to work, it's necessary to loosen the reverse path filter (rp_filter) on the second internal interface.

# rp_filter - INTEGER
#   0 - No source validation.
#   1 - Strict mode as defined in RFC3704 Strict Reverse Path
#       Each incoming packet is tested against the FIB and if the interface
#       is not the best reverse path the packet check will fail.
#       By default failed packets are discarded.
#   2 - Loose mode as defined in RFC3704 Loose Reverse Path
#       Each incoming packet's source address is also tested against the FIB
#       and if the source address is not reachable via any interface
#       the packet check will fail.

echo 2 > /proc/sys/net/ipv4/conf/$iface_int2/rp_filter
# -OR-
sysctl -w "net.ipv4.conf.$iface_int2.rp_filter=2"
# -OR-
echo "net.ipv4.conf.$iface_int2.rp_filter=2" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
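One detail worth checking afterwards: according to the kernel's ip-sysctl documentation, the value used for source validation on an interface is the maximum of conf/all/rp_filter and conf/<iface>/rp_filter. The sketch below reports the value actually in effect. The helper names effective_rp_filter and show_rp_filter are hypothetical.

```shell
#!/bin/sh
# Sketch: report the rp_filter value actually applied to an interface.
# The kernel uses the maximum of the "all" and the per-interface setting.

effective_rp_filter() {
    all_val=$1     # value of net.ipv4.conf.all.rp_filter
    iface_val=$2   # value of net.ipv4.conf.<iface>.rp_filter
    if [ "$all_val" -gt "$iface_val" ]; then
        echo "$all_val"
    else
        echo "$iface_val"
    fi
}

show_rp_filter() {
    ifname=$1
    a=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
    i=$(cat "/proc/sys/net/ipv4/conf/$ifname/rp_filter")
    echo "$ifname: all=$a iface=$i effective=$(effective_rp_filter "$a" "$i")"
}
```

Calling show_rp_filter "$iface_int2" on the server should report effective=2 once the setting above has been applied, even if conf/all/rp_filter is still 1.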