Iptables NAT – Why NAT Fails in Network Namespace Transparent Proxy

iptablesmasqueradepolicy-routingtransparent-proxy

I'm trying to setup transparent proxying networks on my host.

Real Client and Proxy targets are containters but in this experiment
I use netns (network namespace) separated envinroment.

To redirect client traffic to proxy transparently, I use policy routing.

 Client (C)         Proxy (P)
 10.10.1.1/24      10.10.2.1/24
     veth0             veth0
      |                 |
   veth pair         veth pair
      |                 |
  -----------(HOST)--------------
 client-veth0       proxy-veth0
 10.10.1.2/24      10.10.2.2/24
      |                 |            172.16.202.30
      +-----------------+-------------- enp4s0 ---- INTERNET

# Policy Routing on Host
# [Client->Proxy]
# ip rule:  from 10.10.1.0/24 iif client-veth0 lookup 100
# ip route: (100) default via 10.10.2.1 dev proxy-veth0
# [Proxy->Internet]
# ip route: (master) default via 172.16.202.1 dev enp4s0 proto static metric 100
# iptables: -t nat -A POSTROUTING -s 10.10.1.1/32 -o enp4s0 -j MASQUERADE
# [Internet->Proxy]
# ip rule:  from all to 10.10.1.0/24 iif enp4s0 lookup 100
# ip route: (100) default via 10.10.2.1 dev proxy-veth0
# [Proxy->Client]
# ip rule:  from all to 10.10.1.0/24 iif proxy-veth0 lookup 101
# ip route: (101) default via 10.10.1.1 dev client-veth0

Problem is, When I ping 8.8.8.8 from Client, within client netns, source ip masquerading does not happen.
iptables masquerade rule does not match and defaults to ACCEPT .
I expect that tcpdump on enp4s0 shows 172.16.202.30 --> 8.8.8.8, but it shows 10.10.1.1 --> 8.8.8.8, without source IP masquerading.

I recorded pcap on internet line to clarify SNAT does not occur. client_to_goolge is recorded from separate machine outside enp4s0:

$ tcpdump -r client_to_google -n
reading from file client_to_google, link-type EN10MB (Ethernet)
23:35:40.852257 IP 10.10.1.1 > 8.8.8.8: ICMP echo request, id 14867, seq 1, length 64
23:35:41.865269 IP 10.10.1.1 > 8.8.8.8: ICMP echo request, id 14867, seq 2, length 64

When I checked on iptables mangle table, packets flows by given policy:

  PREROUTING: client-veth0, 10.10.1.1 --> 8.8.8.8
  POSTROUTING: proxy-veth0, 10.10.1.1 --> 8.8.8.8
  PREROUTING: proxy-veth0, 10.10.1.1 --> 8.8.8.8
  POSTROUTING: enp4s0, 10.10.1.1 --> 8.8.8.8

However, when I change masquerade rule on proxy-veth0 out interface, like this iptables: -t nat -A POSTROUTING -s 10.10.10.1/32 -o proxy-veth0 -j MASQUERADE, masquerading happens. That is
10.10.2.2 --> 8.8.8.8 packets are captured.

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
...
11       0     0 MASQUERADE  all  --  *      enp4s0  10.10.1.1            0.0.0.0/0
12       1    84 MASQUERADE  all  --  *      proxy-veth0  10.10.1.1            0.0.0.0/0

Above table shows that rule #11 enp4s0 output condition did not trigger. Rule #12 was inserted after several test with rule #11. Rule #12 shows that proxy-veth0 output condition did trigger. Are there any differences between enp4s0 master nic and proxy-veth0 virthual interface with iptables?

Any comments will be appreciated deeply,
Thank you.

Best Answer

I have to assume that the transparent proxy is acting as a router, at least for ICMP, so will route back the ICMP echo where it came from (veth0).

Finding the problem

When reproducing your setup and witnessing your problem, I added a TRACE on the host using iptables (legacy, which might have slight differences with iptables-nft's version) like this (I also forced the creation of the filter table (iptables -S) to have it in the traces):

iptables -t raw -A PREROUTING -j TRACE

And a single ping shows in kernel logs (hint, if host isn't the actual initial host: sysctl -w net.netfilter.nf_log_all_netns=1):

TRACE: raw:PREROUTING:policy:2 IN=client-veth0 OUT= MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: nat:PREROUTING:policy:1 IN=client-veth0 OUT= MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=client-veth0 OUT=proxy-veth0 MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: nat:POSTROUTING:policy:2 IN=client-veth0 OUT=proxy-veth0 MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: raw:PREROUTING:policy:2 IN=proxy-veth0 OUT= MAC=16:c9:3c:d4:ad:8c:8a:84:06:5d:88:e2:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=enp4s0 MAC=16:c9:3c:d4:ad:8c:8a:84:06:5d:88:e2:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 

At the same time, having conntrack -E running on the host shows the matching:

# conntrack -E
    [NEW] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=3508 [UNREPLIED] src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=3508

What happened:

  • conntrack (which handles NAT) doesn't care about routes (eg: there's no interface in the conntrack database), only about addresses,
  • the nat table will only see packets in NEW states,
  • the time when conntrack added a NEW entry in its database was when the packet was routed from client-veth0 to proxy-veth0: not matching the POSTROUTING rule,
  • the second round when routing from proxy-veth0 to enp4s0 the packet matched an entry in conntrack and the nat table was not called again,
  • packet leaves to Internet non-NATed.

Since this conntrack's limitation hindered some use cases in the past, like yours, an additional feature was added:

conntrack zones

A zone is simply a numerical identifier associated with a network device that is incorporated into the various hashes and used to distinguish entries in addition to the connection tuples.

[...]

This is mainly useful when connecting multiple private networks using the same addresses (which unfortunately happens occasionally) to pass the packets through a set of veth devices and SNAT each network to a unique address, after which they can pass through the "main" zone and be handled like regular non-clashing packets and/or have NAT applied a second time based f.i. on the outgoing interface.

It allows to sort-of duplicate the conntrack facility, including NAT handling, but has to be done manually and match the problem: here the routing topology.

So here the client <-> proxy traffic, in conntrack's point of view, must be split from other traffic.

I would have preferred to also split the proxy <-> Internet traffic from the generic host traffic, but this is too difficult, because the raw table, where zones must be assigned to a packet, sees only the non-de-NATed traffic, so Internet replies will all arrive with destination 172.16.202.30). Anyway There's no duplicated flow here between both like with the client <-> proxy flow, so that's not really needed.

  • zone 0 (0 means no special zone): generic host traffic along with proxy <-> Internet traffic.

    Nothing special to do, this is the default.

  • zone 1: client <-> proxy traffic. The CT --zone target is used. The value here is chosen arbitrarily and not needed anywhere else for this case.

    iptables -t raw -A PREROUTING -i client-veth0 -j CT --zone 1
    iptables -t raw -A PREROUTING -i proxy-veth0 -d 10.10.1.0/24 -j CT --zone 1
    

The correct results (I merged both tools' outputs) are now:

TRACE: raw:PREROUTING:rule:2 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: raw:PREROUTING:policy:4 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:PREROUTING:policy:1 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=client-veth0 OUT=proxy-veth0 MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:POSTROUTING:policy:2 IN=client-veth0 OUT=proxy-veth0 MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
    [NEW] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 [UNREPLIED] src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=4079 zone=1
TRACE: raw:PREROUTING:policy:4 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:PREROUTING:policy:1 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=enp4s0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:POSTROUTING:rule:1 IN=proxy-veth0 OUT=enp4s0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
    [NEW] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 [UNREPLIED] src=8.8.8.8 dst=172.16.202.30 type=0 code=0 id=4079
TRACE: raw:PREROUTING:policy:4 IN=enp4s0 OUT= MAC=5e:e8:0c:bf:96:d9:b2:e7:bc:df:1f:8e:08:00 SRC=8.8.8.8 DST=172.16.202.30 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=enp4s0 OUT=proxy-veth0 MAC=5e:e8:0c:bf:96:d9:b2:e7:bc:df:1f:8e:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
 [UPDATE] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 src=8.8.8.8 dst=172.16.202.30 type=0 code=0 id=4079
TRACE: raw:PREROUTING:rule:3 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=60 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
TRACE: raw:PREROUTING:policy:4 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=60 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=client-veth0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=59 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
 [UPDATE] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=4079 zone=1

Here a single first packet from a new flow triggers twice iptables' nat table, the first time with no effect. Actually conntrack considers there are two flows, because the first flow has the additional attribute zone=1.