Multiple macvlan devices and policy-based routing confusion

linux-networking, packetloss, policy-routing

I have a server (Ubuntu/Debian) with two ISP connections. Both of these WAN connections have multiple public IP addresses.

(big pipe)----eth0-->\
                      > server ---eth2--(internal)
(cable pipe)--eth1-->/

On eth0 I have 4 IPs assigned to me that are part of a broader /24 subnet: 24.xxx.xxx.xxx/24
On eth1 I have 5 IPs assigned to me, but here I am the only one on the /29 (the 6th IP is the gateway I hit): 71.xxx.xxx.xxx/29

My goal is to set up source/policy-based routing so that VMs/clients on the various internal subnets (there are multiple actual VLANs on eth2) can be routed out to the internet via any specified WAN IP.

Here's what I've done so far.

First I have eth0 and eth1 configured in the interfaces file.

auto eth0
iface eth0 inet static
        address 24.xxx.xxx.66
        netmask 255.255.255.0
        network 24.xxx.xxx.0
        broadcast 24.xxx.xxx.255
        gateway 24.xxx.xxx.1
        dns-nameservers 8.8.8.8
        up /etc/network/rt_scripts/i_eth0

auto eth1
iface eth1 inet static
        address 71.xxx.xxx.107
        netmask 255.255.255.248
        network 71.xxx.xxx.104
        broadcast 71.xxx.xxx.111
        up /etc/network/rt_scripts/i_eth1

Then the macvlan devices on the BigPipe side (the i_eth0 script referenced above):

#!/bin/sh

#iface BigPipe67
ip link add mac0 link eth0 address xx:xx:xx:xx:xx:3c type macvlan
ip link set mac0 up
ip address add 24.xxx.xxx.67/24 dev mac0

#iface BigPipe135
ip link add mac1 link eth0 address xx:xx:xx:xx:xx:3d type macvlan
ip link set mac1 up
ip address add 24.xxx.xxx.135/24 dev mac1

#iface BigPipe136
ip link add mac2 link eth0 address xx:xx:xx:xx:xx:3e type macvlan
ip link set mac2 up
ip address add 24.xxx.xxx.136/24 dev mac2

/etc/network/rt_scripts/t_frontdesk
/etc/network/rt_scripts/t_pubwifi
/etc/network/rt_scripts/t_mail1
/etc/network/rt_scripts/t_scansrvc

Then the CBL connection (the i_eth1 script referenced above). The missing 5th IP (71.xxx.xxx.106) is a different router sitting in the building.

#!/bin/sh
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1

#iface CBL108
ip link add mac3 link eth1 address xx:xx:xx:xx:xx:c5 type macvlan
ip link set mac3 up
ip address add 71.xxx.xxx.108/29 dev mac3

#iface CBL109
ip link add mac4 link eth1 address xx:xx:xx:xx:xx:c6 type macvlan
ip link set mac4 up
ip address add 71.xxx.xxx.109/29 dev mac4

#iface CBL110
ip link add mac5 link eth1 address xx:xx:xx:xx:xx:c7 type macvlan
ip link set mac5 up
ip address add 71.xxx.xxx.110/29 dev mac5

/etc/network/rt_scripts/t_jenkins4
/etc/network/rt_scripts/t_skynet
/etc/network/rt_scripts/t_lappy386

You'll probably notice I have a couple of routes specified in the main table when I set up the macvlan interfaces on eth1. I have a couple of other routers on the same cable provider as my main server. They VPN back to the main server, while BigPipe is used for everything else (in the main table).

The "t_" scripts are used to setup the individual rules and tables for the various services/clients that used the IPs setup by the macvlan interfaces.

Simplified, they look a little like this.

#!/bin/sh
# traffic sourced from the scansrvc VM uses the scansrvc table
ip rule add from 172.23.1.6 table scansrvc
# default route out the BigPipe gateway via the mac0 (.67) device
ip route add default via 24.xxx.xxx.1 dev mac0 table scansrvc
# connected routes the table still needs: the WAN /24 and the internal /20
ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc
ip route add 172.23.0.0/20 dev br1 table scansrvc
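
(The named tables themselves are registered in /etc/iproute2/rt_tables so the scripts can refer to them by name; roughly like this, with the numbers being arbitrary:)

# /etc/iproute2/rt_tables (excerpt)
101     frontdesk
102     pubwifi
103     mail1
104     scansrvc
105     jenkins4
106     skynet
107     lappy386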

So putting that all together and as a quick recap, I've got the main server using 8 public IPs (4 on BigPipe and 4 on CBL). One of the BigPipe IPs and one of the CBL IPs are used for VPN services effectively creating a "ghetto internet exchange" if you will. That routing configuration exists on the main table.

Then the remaining 6 IPs are used by various services or clients and those tables are frontdesk, pubwifi, mail1, scansrvc, jenkins4, skynet, and lappy386.

I am masquerading on all public IPs to the various internal subnets.
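
Nothing fancy there; per internal subnet and exit device the rules are roughly along these lines:

# masquerade the internal /20 out whichever public-facing device it leaves on
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac3 -j MASQUERADE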

Here's where I'm just dumbfounded… it all works until it doesn't. Meaning, when I start up the server everything gets set up correctly and I can see that the routing policies are doing what they're supposed to be doing.

So, on scansrvc, which is a VM on the main server but with an internal IP (172.23.1.6/20):

waffle@scansrvc:~$ dig +short myip.opendns.com @resolver1.opendns.com
24.xxx.xxx.67
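
And on the main server the rule and table look right at that point too; I check with something like:

# is the rule there, and does a lookup for the VM actually hit the scansrvc table?
ip rule show | grep scansrvc
ip route show table scansrvc
ip route get 8.8.8.8 from 172.23.1.6 iif br1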

However, after a while packets stop making it back to the VM behind the main server. I could see in the iptables firewall stats that they'd leave my network but not make it back.

When it's working and I scan from the outside I can see the service port, but after it dies iptables doesn't even see the packets come in anymore.

Also, through my searching I started reading about martian packets, so I turned on logging of those through sysctl. Wow. I'm logging a ton of martians from BigPipe but none from CBL, perhaps because on BigPipe I'm not the only one on that subnet?
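
(For reference, the logging knob I flipped was just the standard sysctl, something like:)

sysctl -w net.ipv4.conf.all.log_martians=1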

Here's a snippet

Nov 22 08:59:03 srv3 kernel: [  271.747016] net_ratelimit: 497 callbacks suppressed
Nov 22 08:59:03 srv3 kernel: [  271.747027] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [  271.747035] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.747046] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac2
Nov 22 08:59:03 srv3 kernel: [  271.747052] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.747061] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac1
Nov 22 08:59:03 srv3 kernel: [  271.747066] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.796429] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [  271.796440] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.796450] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac2

From what I understand so far about martians, my hypothesis is that having multiple interfaces on the same subnet could be causing packets not meant for an interface to be delivered to that interface… somehow… (I thought that since they've each got different MAC addresses this would be avoided, though the ll headers above are broadcast ARP frames, which every macvlan device will see anyway.)

What would cause this? Why does the setup work when I freshly boot the system and the VMs, and then all of a sudden die after a while? (E.g. if I leave a ping running to 8.8.8.8 on the scansrvc VM I'll get 100-1000 responses back before it dies.) Could this be something with the ARP cache? It's not like I'm reassigning any IPs to different MAC addresses mid-flight.
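
One thing I can do is dump the neighbour/ARP cache on the WAN devices while it's working and again after it dies, and compare; something like:

ip neigh show dev eth0
ip neigh show dev mac0
# or just the gateway entry on the device scansrvc uses
ip neigh get 24.xxx.xxx.1 dev mac0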

I'm stuck. I'm going to start to learn some tcpdump skills to try and shed some light on whatever I'm missing. If anyone who's better versed in networking setups could point out anything I'm overlooking it'd be a huge help! 🙂
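
The first captures I plan to run are on the physical NIC and on the macvlan device at the same time, something like:

# link-level headers, no name resolution
tcpdump -eni eth0 host 24.xxx.xxx.67
tcpdump -eni mac0 icmp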

Best Answer

The error messages are caused by the kernel's validation of packet source addresses (reverse path validation; see fib_validate_source() in the kernel code).

I assume there is only one route for the directly connected, overlapping subnet in the main routing table of your setup. When a packet from that directly connected subnet is received through another interface (not the one in the main routing table), the packet is recognised as a martian.

How to troubleshoot:

  • Look up the route for the packet's source with the 'ip route get 24.xxx.xxx.1' command and compare the interface of that route with the interface the packet actually arrived on. Most likely they are different, as in the example below.
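
For example (the exact output format varies between iproute2 versions):

# ip route get 24.xxx.xxx.1
24.xxx.xxx.1 dev eth0  src 24.xxx.xxx.66
    cache

Here the lookup says eth0, while the martian log shows the packet arriving on mac0/mac1/mac2; that mismatch is what triggers the message.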

How to solve the issue:

  • If you're using PBR with multiple routing tables, add the directly connected route through the corresponding interface to every one of those routing tables. You may also need to rework your PBR rules to avoid such route mismatches (see the sketch after this list).
  • Check rp_filter and disable it, or better, switch it into loose mode (see the rp_filter sysctl variable).
  • Discard the macvlan interfaces and use multiple addresses on a single interface (this is the hard way, but more ideologically right, I think).
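
A rough sketch of the first two points, reusing names from your scripts (adjust the devices, tables, and addresses to whatever actually maps where in your setup):

# give each PBR table the connected route of its own exit device
ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc        # already present in your t_scansrvc
ip route add 71.xxx.xxx.104/29 dev mac3 table jenkins4      # whichever table uses mac3; .104 assumed as the /29 network address

# loose reverse-path filtering (0 = off, 1 = strict, 2 = loose)
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.eth0.rp_filter=2
sysctl -w net.ipv4.conf.mac0.rp_filter=2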