Multiple macvlan devices and policy-based routing confusion

linux-networking, packetloss, policy-routing

I have a server (Ubuntu/Debian) with two ISP connections. Both of these WAN connections have multiple public IP addresses.

(big pipe)----eth0-->\
                      > server ---eth2--(internal)
(cable pipe)--eth1-->/

On eth0 I have 4 IPs assigned to me that are part of a broader /24 subnet: 24.xxx.xxx.xxx/24
On eth1 I have 5 IPs assigned to me, but here I am the only one on the /29 (the 6th IP is the gateway I hit): 71.xxx.xxx.xxx/29

My goal is to set up source/policy-based routing so that VMs/clients on the various internal subnets (there are multiple actual VLANs on eth2) can be routed out to the internet via any specified WAN IP.

Here's what I've done so far.

First I have eth0 and eth1 configured in the interfaces file.

auto eth0
iface eth0 inet static
        address 24.xxx.xxx.66
        netmask 255.255.255.0
        network 24.xxx.xxx.0
        broadcast 24.xxx.xxx.255
        gateway 24.xxx.xxx.1
        dns-nameservers 8.8.8.8
        up /etc/network/rt_scripts/i_eth0

auto eth1
iface eth1 inet static
        address 71.xxx.xxx.107
        netmask 255.255.255.248
        network 71.xxx.xxx.104
        broadcast 71.xxx.xxx.111
        up /etc/network/rt_scripts/i_eth1

Then the macvlan devices on the BigPipe side (the i_eth0 script referenced above):

#!/bin/sh

#iface BigPipe67
ip link add mac0 link eth0 address xx:xx:xx:xx:xx:3c type macvlan
ip link set mac0 up
ip address add 24.xxx.xxx.67/24 dev mac0

#iface BigPipe135
ip link add mac1 link eth0 address xx:xx:xx:xx:xx:3d type macvlan
ip link set mac1 up
ip address add 24.xxx.xxx.135/24 dev mac1

#iface BigPipe136
ip link add mac2 link eth0 address xx:xx:xx:xx:xx:3e type macvlan
ip link set mac2 up
ip address add 24.xxx.xxx.136/24 dev mac2

/etc/network/rt_scripts/t_frontdesk
/etc/network/rt_scripts/t_pubwifi
/etc/network/rt_scripts/t_mail1
/etc/network/rt_scripts/t_scansrvc

Then the CBL connection (the i_eth1 script referenced above). The missing 5th IP (71.xxx.xxx.106) is a different router sitting in the building.

#!/bin/sh
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1

#iface CBL108
ip link add mac3 link eth1 address xx:xx:xx:xx:xx:c5 type macvlan
ip link set mac3 up
ip address add 71.xxx.xxx.108/29 dev mac3

#iface CBL109
ip link add mac4 link eth1 address xx:xx:xx:xx:xx:c6 type macvlan
ip link set mac4 up
ip address add 71.xxx.xxx.109/29 dev mac4

#iface CBL110
ip link add mac5 link eth1 address xx:xx:xx:xx:xx:c7 type macvlan
ip link set mac5 up
ip address add 71.xxx.xxx.110/29 dev mac5

/etc/network/rt_scripts/t_jenkins4
/etc/network/rt_scripts/t_skynet
/etc/network/rt_scripts/t_lappy386

You'll probably notice I have a couple of routes specified in the main table when I set up the macvlan interfaces on eth1. I have a couple of other routers on the same cable provider as my main server. They VPN back to the main server, while BigPipe is used for everything else (in the main table).

The "t_" scripts are used to setup the individual rules and tables for the various services/clients that used the IPs setup by the macvlan interfaces.

Simplified, they look a little like this.

#!/bin/sh
# traffic sourced from the scansrvc VM uses the scansrvc table
ip rule add from 172.23.1.6 table scansrvc
# default route out the BigPipe gateway via the mac0 (.67) device
ip route add default via 24.xxx.xxx.1 dev mac0 table scansrvc
# connected routes the table still needs: the WAN /24 and the internal /20
ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc
ip route add 172.23.0.0/20 dev br1 table scansrvc
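
(The named tables themselves are registered in /etc/iproute2/rt_tables so the scripts can refer to them by name; roughly like this, with the numbers being arbitrary:)

# /etc/iproute2/rt_tables (excerpt)
101     frontdesk
102     pubwifi
103     mail1
104     scansrvc
105     jenkins4
106     skynet
107     lappy386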

So putting that all together and as a quick recap, I've got the main server using 8 public IPs (4 on BigPipe and 4 on CBL). One of the BigPipe IPs and one of the CBL IPs are used for VPN services effectively creating a "ghetto internet exchange" if you will. That routing configuration exists on the main table.

Then the remaining 6 IPs are used by various services or clients and those tables are frontdesk, pubwifi, mail1, scansrvc, jenkins4, skynet, and lappy386.

I am masquerading on all public IPs to the various internal subnets.
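
Nothing fancy there; per internal subnet and exit device the rules are roughly along these lines:

# masquerade the internal /20 out whichever public-facing device it leaves on
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac3 -j MASQUERADE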

Here's where I'm just dumbfounded… it all works until it doesn't. Meaning, when I start up the server everything gets set up correctly and I can see that the routing policies are doing what they're supposed to be doing.

So, on scansrvc, which is a VM on the main server but with an internal IP (172.23.1.6/20):

waffle@scansrvc:~$ dig +short myip.opendns.com @resolver1.opendns.com
24.xxx.xxx.67
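
And on the main server the rule and table look right at that point too; I check with something like:

# is the rule there, and does a lookup for the VM actually hit the scansrvc table?
ip rule show | grep scansrvc
ip route show table scansrvc
ip route get 8.8.8.8 from 172.23.1.6 iif br1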

However, after a while packets stop making it back to the VM behind the main server. I could see in the iptables firewall stats that they'd leave my network but not make it back.

When it's working and I scan from the outside I can see the service port, but after it dies iptables doesn't even see the packets come in anymore.

Also, through my searching I started reading about martian packets, so I turned on logging of those through sysctl. Wow. I'm logging a ton of martians from BigPipe but none from CBL, perhaps because on BigPipe I'm not the only one on that subnet?
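
(For reference, the logging knob I flipped was just the standard sysctl, something like:)

sysctl -w net.ipv4.conf.all.log_martians=1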

Here's a snippet

Nov 22 08:59:03 srv3 kernel: [  271.747016] net_ratelimit: 497 callbacks suppressed
Nov 22 08:59:03 srv3 kernel: [  271.747027] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [  271.747035] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.747046] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac2
Nov 22 08:59:03 srv3 kernel: [  271.747052] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.747061] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac1
Nov 22 08:59:03 srv3 kernel: [  271.747066] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.796429] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [  271.796440] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.796450] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac2

From what I understand so far about martians, my hypothesis is that having multiple interfaces on the same subnet could be causing packets not meant for an interface to be delivered to that interface… somehow… (I thought that since they've each got different MAC addresses this would be avoided, though the ll headers above are broadcast ARP frames, which every macvlan device will see anyway.)

What would cause this? Why does the setup work when I freshly boot the system and the VMs, and then all of a sudden die after a while? (E.g. if I leave a ping running to 8.8.8.8 on the scansrvc VM I'll get 100-1000 responses back before it dies.) Could this be something with the ARP cache? It's not like I'm reassigning any IPs to different MAC addresses mid-flight.
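
One thing I can do is dump the neighbour/ARP cache on the WAN devices while it's working and again after it dies, and compare; something like:

ip neigh show dev eth0
ip neigh show dev mac0
# or just the gateway entry on the device scansrvc uses
ip neigh get 24.xxx.xxx.1 dev mac0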

I'm stuck. I'm going to start to learn some tcpdump skills to try and shed some light on whatever I'm missing. If anyone who's better versed in networking setups could point out anything I'm overlooking it'd be a huge help! 🙂
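
The first captures I plan to run are on the physical NIC and on the macvlan device at the same time, something like:

# link-level headers, no name resolution
tcpdump -eni eth0 host 24.xxx.xxx.67
tcpdump -eni mac0 icmp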

Best Answer

The error messages are caused by the kernel's validation of packet source addresses (reverse path validation; see fib_validate_source() in the kernel code).

I assume there is only one route for the directly connected, overlapping subnet in the main routing table of your setup. When a packet from that directly connected subnet is received through another interface (not the one in the main routing table), the packet is recognised as a martian.

How to troubleshoot:

  • Look up the route for the packet's source with the 'ip route get 24.xxx.xxx.1' command and compare the interface of that route with the interface the packet actually arrived on. Most likely they are different, as in the example below.
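
For example (the exact output format varies between iproute2 versions):

# ip route get 24.xxx.xxx.1
24.xxx.xxx.1 dev eth0  src 24.xxx.xxx.66
    cache

Here the lookup says eth0, while the martian log shows the packet arriving on mac0/mac1/mac2; that mismatch is what triggers the message.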

How to solve the issue:

  • If you're using PBR with multiple routing tables, add the directly connected route through the corresponding interface to every one of those routing tables. You may also need to rework your PBR rules to avoid such route mismatches (see the sketch after this list).
  • Check rp_filter and disable it, or better, switch it into loose mode (see the rp_filter sysctl variable).
  • Discard the macvlan interfaces and use multiple addresses on a single interface (this is the hard way, but more ideologically right, I think).
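
A rough sketch of the first two points, reusing names from your scripts (adjust the devices, tables, and addresses to whatever actually maps where in your setup):

# give each PBR table the connected route of its own exit device
ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc        # already present in your t_scansrvc
ip route add 71.xxx.xxx.104/29 dev mac3 table jenkins4      # whichever table uses mac3; .104 assumed as the /29 network address

# loose reverse-path filtering (0 = off, 1 = strict, 2 = loose)
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.eth0.rp_filter=2
sysctl -w net.ipv4.conf.mac0.rp_filter=2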