docker routing – Docker Symmetric/Policy-Based Routing Guide

Background

I have a Debian server with three network interfaces:

  • eno1 (10.0.0.35/24)
  • eno1.10 (10.0.10.65/24)
  • eno1.40 (10.0.40.40/24)

Between those interfaces is a firewall. The multiple routes on the server led to asymmetric routing, which the firewall blocked as invalid traffic.

Because of that, I added policy-based routing rules so that replies leave through the interface that owns their source address and the destination/source IP addresses stay the same. I did this by editing my /etc/network/interfaces like this:

# The primary network interface
allow-hotplug eno1
iface eno1 inet dhcp
  post-up ip route add 10.0.0.0/24 dev eno1 table 1
  post-up ip route add default via 10.0.0.1 table 1
  post-up ip rule add from 10.0.0.35/32 table 1 priority 100
  post-up ip route flush cache
  pre-down ip rule del from 10.0.0.35/32 table 1 priority 100
  pre-down ip route flush table 1
  pre-down ip route flush cache

# VLANS
auto eno1.10
iface eno1.10 inet dhcp
  post-up ip route add 10.0.10.0/24 dev eno1.10 table 2
  post-up ip route add default via 10.0.10.1 table 2
  post-up ip rule add from 10.0.10.65/32 table 2 priority 110
  post-up ip route flush cache
  pre-down ip rule del from 10.0.10.65/32 table 2 priority 110
  pre-down ip route flush table 2
  pre-down ip route flush cache

auto eno1.40
iface eno1.40 inet dhcp
  post-up ip route add 10.0.40.0/24 dev eno1.40 table 3
  post-up ip route add default via 10.0.40.1 table 3
  post-up ip rule add from 10.0.40.40/32 table 3 priority 120
  post-up ip route flush cache
  pre-down ip rule del from 10.0.40.40/32 table 3 priority 120
  pre-down ip route flush table 3
  pre-down ip route flush cache

After that, all the services running directly on the server worked as expected.

Additionally, Docker runs on the server and hosts some containers that are bound to the different interfaces on the server.
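The source-based rules above can be checked with `ip route get ... from ...`, which asks the kernel which route it would pick for a given local source address. The following is a minimal sketch: it only prints the commands, so nothing runs until the output is piped to `sh` on the server. The destination 192.0.2.1 is an assumption (a documentation address standing in for any off-link host), and `verify_cmds` is a hypothetical helper name.

```shell
#!/bin/sh
# Print one route lookup per source address from the question.
# Nothing is executed here; pipe the output to `sh` on the server to run it.
verify_cmds() {
    # Assumption: 192.0.2.1 stands in for an arbitrary remote destination.
    dst=192.0.2.1
    for src in 10.0.0.35 10.0.10.65 10.0.40.40; do
        echo "ip route get $dst from $src"
    done
    # List the policy rules so the "from ... lookup N" entries can be inspected.
    echo "ip rule show"
}

verify_cmds
```

If the rules are in place, each lookup should report the gateway and device of the interface that owns the source address.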

Problem

The problem is that the rules I created apparently don't apply to traffic coming from the Docker containers, and I can't reach the containers because their traffic is blocked as invalid.

What do I need to do so that traffic from the Docker containers uses the correct route according to the source IP?

Best Answer

The quick solution:

  • Add routing rules that match on the firewall mark. Packets carrying a given mark are routed through the corresponding routing table.
ip rule add fwmark 0x1 lookup 1 pref 10001
ip rule add fwmark 0x2 lookup 2 pref 10002
ip rule add fwmark 0x3 lookup 3 pref 10003
  • The mark of an incoming connection depends on its input interface. The CONNMARK target saves the mark value inside the conntrack entry.
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -i eno1 -j CONNMARK --set-mark 0x1
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -i eno1.10 -j CONNMARK --set-mark 0x2
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -i eno1.40 -j CONNMARK --set-mark 0x3
  • Copy the mark value from the conntrack entry to the firewall mark. After this, reply packets are routed by the additional routing rules added above. Restrict this rule with an -i match or a source-address match; otherwise you need to add the directly connected routes to the additional tables.
iptables -t mangle -A PREROUTING -i docker0 -j CONNMARK --restore-mark
  • You can also match by source address instead of by input interface.
iptables -t mangle -A PREROUTING -s <container-subnet> -j CONNMARK --restore-mark
  • This solution also works with DNAT.
  • Use tcpdump and the conntrack tool to troubleshoot issues.
  • Also check rp_filter, which can drop packets in some cases. It is better to set it to loose mode (sysctl -w net.ipv4.conf.all.rp_filter=2).
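For the troubleshooting steps in the list above, a small helper along these lines may be useful. It is only a sketch: `show_state` and the `RUN` variable are hypothetical names, and by default every command is printed instead of executed. Running it with `RUN=""` as root executes the commands, assuming the iptables, conntrack and tcpdump tools are installed.

```shell
#!/bin/sh
# Dry-run by default: commands are printed. Use RUN="" as root to execute them.
RUN="${RUN-echo}"

# Usage: show_state <uplink-iface> <mark>
show_state() {
    iface=$1
    mark=$2
    # Per-rule packet counters show whether the mangle rules match at all.
    $RUN iptables -t mangle -L PREROUTING -v -n
    # Conntrack entries that carry the mark assigned to this uplink.
    $RUN conntrack -L --mark "$mark"
    # Reverse-path filter values; the /proc path avoids trouble with the
    # dot in VLAN interface names such as eno1.10.
    $RUN cat /proc/sys/net/ipv4/conf/all/rp_filter "/proc/sys/net/ipv4/conf/$iface/rp_filter"
    # Policy rules, to confirm that the fwmark lookups are installed.
    $RUN ip rule show
    # Capture a few packets on the uplink to see which way replies leave.
    $RUN tcpdump -n -i "$iface" -c 20
}

show_state eno1.10 2
```

The interface and mark in the last line are taken from the example above (eno1.10 with mark 0x2, written in decimal here).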

Update

After some tests in the lab I've found a cleaner rule set. It requires only one mark value and one additional routing rule per uplink. It also handles complex cases, such as public addresses on several interfaces.

  • For every uplink, create an additional routing table and assign it a firewall mark.
ip route add <uplink-subnet> dev <uplink-iface> table <uplink-table>
ip route add 0/0 via <uplink-gw> dev <uplink-iface> table <uplink-table>

ip rule add fwmark <uplink-mark> table <uplink-table>
  • For every uplink interface, add a single rule to mark incoming connections:
iptables -t mangle -A PREROUTING -i <uplink-iface> -m conntrack --ctstate NEW --ctdir ORIGINAL -j CONNMARK --set-mark <uplink-mark>
...
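  • Add two rules, shared by all uplinks, to mark reply packets. The PREROUTING rule covers forwarded replies (for example from containers); the OUTPUT rule covers replies generated by the host itself: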
iptables -t mangle -A PREROUTING -m conntrack ! --ctstate NEW --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark

iptables -t mangle -A OUTPUT -m conntrack ! --ctstate NEW --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark
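
Applied to the three interfaces from the question, the rule set above could be scripted roughly as follows. This is a sketch, not a tested deployment: the function names and the `RUN` variable are hypothetical, the marks 0x1 to 0x3 are an assumption carried over from the quick solution, and the script only prints the commands unless it is run with `RUN=""` as root.

```shell
#!/bin/sh
# Dry-run by default: commands are printed. Use RUN="" as root to apply them.
RUN="${RUN-echo}"

# Usage: add_uplink <iface> <subnet> <gateway> <table> <mark>
add_uplink() {
    iface=$1 subnet=$2 gw=$3 table=$4 mark=$5
    # Per-uplink routing table: connected subnet plus a default route.
    $RUN ip route add "$subnet" dev "$iface" table "$table"
    $RUN ip route add default via "$gw" dev "$iface" table "$table"
    # Packets carrying this uplink's mark use this uplink's table.
    $RUN ip rule add fwmark "$mark" table "$table"
    # Mark new connections by the interface they arrive on.
    $RUN iptables -t mangle -A PREROUTING -i "$iface" -m conntrack \
        --ctstate NEW --ctdir ORIGINAL -j CONNMARK --set-mark "$mark"
}

# Shared by all uplinks: copy the connection mark onto reply packets,
# both forwarded ones (PREROUTING) and locally generated ones (OUTPUT).
add_reply_rules() {
    for chain in PREROUTING OUTPUT; do
        $RUN iptables -t mangle -A "$chain" -m conntrack ! --ctstate NEW \
            --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark
    done
}

add_uplink eno1    10.0.0.0/24  10.0.0.1  1 0x1
add_uplink eno1.10 10.0.10.0/24 10.0.10.1 2 0x2
add_uplink eno1.40 10.0.40.0/24 10.0.40.1 3 0x3
add_reply_rules
```

If tables 1 to 3 are already populated by the post-up lines in /etc/network/interfaces, the two `ip route add` commands are redundant and would fail with "File exists"; in that case only the `ip rule` and `iptables` lines are needed.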