I've been struggling with this problem for two days.
Assumptions:
- Docker network (and interface) named docknet, type bridge, subnet 172.18.0.0/16
- Two interfaces: eth0 (Gateway IP: 192.168.1.1, Interface Static IP: 192.168.1.100) and eth1 (Gateway IP: 192.168.2.1, Interface Static IP: 192.168.2.100)
- Default routing goes through eth0
What I want:
- Outgoing traffic from a container attached to docknet must go out through eth1
What I tried:
- Default iptables rule created by Docker, left untouched:
-A POSTROUTING -s 172.18.0.0/16 ! -o docknet -j MASQUERADE
- My rules:
iptables -t mangle -I PREROUTING -s 172.18.0.0/16 -j MARK --set-mark 1
ip rule add from all fwmark 1 table 2
Where table 2 is:
default via 192.168.2.1 dev eth1 proto static
With this setup when I try to ping 8.8.8.8 from a container (172.18.0.2) attached to docknet the following happens:
- 172.18.0.2 gets translated to 192.168.2.1
- the packet goes through eth1
- the packet returns to eth1 with src addr 8.8.8.8 and dst 192.168.2.1
From here a reverse translation from 192.168.2.1 to 172.18.0.2 should happen, but running tcpdump -i any host 8.8.8.8 shows no trace of this translation.
I also checked conntrack -L and this is the result:
icmp 1 29 src=172.18.0.2 dst=8.8.8.8 type=8 code=0 id=9 src=8.8.8.8 dst=192.168.2.1 type=0 code=0 id=9 mark=0 use=1
Useful info:
- eth1 is actually a 4G usb dongle
- ip forwarding is active
- curl --interface eth1 ipinfo.io works as expected
EDIT:
output from ip -d link show eth1
eth1: mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 00:b0:d6:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Best Answer
I will also assume rp_filter is activated and is causing trouble. It doesn't behave as expected in the presence of a mark. Some references are in this Q/A: Advanced routing with firewall marks and rp_filter.
So while there's a mark set for outgoing packets which selects table 2, no such mark exists for incoming packets. Those packets are therefore considered not to be using table 2 and are dropped by the kernel's routing stack through the reverse path forwarding filter (rp_filter), because they have no reverse outgoing route when looked up in the main table.
The fix should be:
ip rule add iif eth1 table 2
But because rp_filter doesn't behave as expected, a second fix must be added in addition: set rp_filter to loose mode:
sysctl -w net.ipv4.conf.eth1.rp_filter=2
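To see which reverse path filtering mode is really applied to eth1, keep in mind that the kernel uses the higher of the conf.all value and the per-interface value. A minimal sketch, assuming a hypothetical helper named effective_rp_filter whose optional second argument (an invention for testing, not part of any tool) points at an alternative sysctl directory:

```shell
#!/bin/sh
# effective_rp_filter IFACE [CONF_DIR]
# Prints the rp_filter value the kernel applies to IFACE, i.e. the maximum
# of conf/all/rp_filter and conf/IFACE/rp_filter.
# CONF_DIR is a hypothetical override used only for testing; it defaults
# to the real sysctl tree under /proc.
effective_rp_filter() {
    iface=$1
    conf_dir=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf_dir/all/rp_filter") || return 1
    own=$(cat "$conf_dir/$iface/rp_filter") || return 1
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}

# Example (0 = off, 1 = strict, 2 = loose):
# effective_rp_filter eth1
```

Because the maximum wins, setting only net.ipv4.conf.eth1.rp_filter=2 is enough to get loose mode on eth1 even when conf.all is set to 1.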
Now, the part I don't have an explanation for: it appears the host doesn't find the 172.18.0.0/16 entry when looking up table 2, and the container's outgoing packets are dropped on the host. It doesn't have this problem of not finding 192.168.2.0/24 in table 2 before its default route. So, while not knowing exactly why (it works for 192.168.2.0/24), the final fix is to duplicate the missing route from the main table:
ip route add table 2 172.18.0.0/16 dev docknet src 172.18.0.1
I usually duplicate all of them and don't think about it anymore. Now the ping from the container should be working and going through eth1.
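Duplicating every route from the main table can be scripted. A minimal sketch, assuming a hypothetical helper named dup_routes_cmds that reads the output of ip route show table main on stdin and only prints the ip route add commands it would run; default routes are skipped so that table 2 keeps its own default via eth1:

```shell
#!/bin/sh
# dup_routes_cmds TABLE
# Reads routes (as printed by "ip route show table main") on stdin and
# prints one "ip route add table TABLE ..." command per non-default route.
# Nothing is applied here; the output is meant to be reviewed first.
dup_routes_cmds() {
    table=$1
    while IFS= read -r route; do
        case $route in
            ''|default*) continue ;;   # skip blanks and main's default route
        esac
        printf 'ip route add table %s %s\n' "$table" "$route"
    done
}

# Usage:
# ip route show table main | dup_routes_cmds 2          # review
# ip route show table main | dup_routes_cmds 2 | sh     # apply (as root)
```

Routes printed with status flags such as linkdown may need those flags removed before ip route add accepts them, so reviewing the generated commands before piping them to sh is the safer path.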
UPDATE:
Actually there's no need to involve iptables at all in this case: ip rule can do it on its own, and everything behaves better without a mark, because table 2 is looked up when needed, while with the mark it wouldn't always be (e.g. iif eth1 is no longer needed here). So here's a simpler answer. This supersedes OP's settings and the previous answer (so don't add the mangle rule):
This makes the container use eth1, without even having to change rp_filter.
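A minimal sketch of what such a mark-free setup might look like. The exact commands are an assumption built from the addresses in the question: the rule selects table 2 by the containers' source subnet, and run is a hypothetical dry-run wrapper that only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: these commands are an assumption, not the answer's own.
# run() is a hypothetical dry-run wrapper: it prints each command unless
# APPLY=1 is set, in which case it executes it (root required).
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_container_routing() {
    # Table 2: default route through the 4G dongle's gateway.
    run ip route replace table 2 default via 192.168.2.1 dev eth1
    # Table 2 also needs the route back to the containers.
    run ip route replace table 2 172.18.0.0/16 dev docknet src 172.18.0.1
    # Select table 2 by source address instead of by firewall mark.
    run ip rule add from 172.18.0.0/16 table 2
}

setup_container_routing
```

Selecting on the source address means the reverse path check for replies, which runs after the destination has been translated back to the container address, can also consult table 2. That would be consistent with rp_filter not needing to change.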
Now for this to also work from the host in my test, rp_filter must be loosened again (and of course oif must be used):
Also, contrary to OP, in my tests, to be able to use ping -I eth1 8.8.8.8 or, for example, curl --interface eth1 8.8.8.8 from the "host", I also had to do this in addition to the previous commands:
Which is for locally generated packets going out through eth1. Without it, when forcing interface eth1, the host does direct ARP requests for 8.8.8.8 (I don't have a good explanation for this, other than that routes are missing), which won't work unless the 4G card is doing proxy ARP.
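A minimal sketch of host-side settings along these lines. The oif rule is an assumption inferred from the remarks that oif must be used and that the extra command concerns locally generated packets leaving through eth1; run is again a hypothetical dry-run wrapper that prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: these commands are an assumption, not the answer's own.
# run() is a hypothetical dry-run wrapper: it prints each command unless
# APPLY=1 is set, in which case it executes it (root required).
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_host_via_eth1() {
    # Loose reverse path filtering on eth1, as in the first fix.
    run sysctl -w net.ipv4.conf.eth1.rp_filter=2
    # Assumed rule: sockets bound to eth1 (ping -I eth1, curl --interface
    # eth1) look up table 2 and so find the default route via 192.168.2.1.
    run ip rule add oif eth1 table 2
}

setup_host_via_eth1
```

With a rule of this kind, a lookup forced onto eth1 finds a gateway route in table 2 instead of treating 8.8.8.8 as directly reachable on the link.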
BONUS: mockup reproducer script
Even knowing what to look for (rp_filter, missing routes, missing rules...), finding a working solution was mostly trial and error. I made a script to reproduce a whole mockup internet with a multihomed setup, including two internet providers and Google's 8.8.8.8 IP. Using the script below I get those results from the (real) host:
Script I made to create the mockup internet network parts (I ran out of test nets, so I used the ip address peer syntax for "LAN-less" address+routing at the end):