Linux – How broken is routing strategy that causes a martian packet (so far only) during tracepath

linux routing

I believe I've achieved a table that routes packets from and to eth1/192.168.3.x through 192.168.3.1, and packets from and to eth0/192.168.1.x through 192.168.1.1 (helpful source).

Question: when I run tracepath from 192.168.3.20 (from within the vserver), I get kernel: [318535.927489] martian source 192.168.3.20 from 212.47.223.33, on dev eth0 at or near the target IP, while the intermediate hops pass without any such message (log below).

I don't understand why this packet is arriving on eth0, instead of eth1, even after reading this:

Note that you may see packets from non-routable IP addresses when running the traceroute or tracepath commands. While packets cannot be routed to these routers, packets sent between 2 routers only need to know the address of the next hop within the local networks, which could be a non-routable address.

Can someone explain that paragraph in plain language? Based on short initial trials, everything else seems to work without causing martians. Is this confined to the way tracepath operates, or do I have a bigger routing problem that will break production traffic?

Side note: is it possible to inspect a martian packet with tcpdump, wireshark, or anything of the sort? I have not been able to get it to show up on my own.

vserver-20 / # tracepath -n 212.47.223.33
 1:  192.168.3.2                                           0.064ms pmtu 1500
 1:  192.168.3.1                                           1.076ms
 1:  192.168.3.1                                           1.259ms
 2:  90.191.8.2                                            1.908ms
 3:  90.190.134.194                                        2.595ms
 4:  194.126.123.94                                        2.136ms asymm  5
 5:  195.250.170.22                                        2.266ms asymm  6
 6:  212.47.201.86                                         2.390ms asymm  7
 7:  no reply
 8:  no reply
 9:  no reply
^C

Host routing:

$ sudo ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: sit0: <NOARP> mtu 1480 qdisc noop state DOWN 
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:24:1d:de:b3:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global eth0
4: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:46:46:a3:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.2/27 scope global eth1
    inet 192.168.3.20/27 brd 192.168.3.31 scope global secondary eth1  # linux-vserver instance

$ sudo ip route
default via 192.168.1.1 dev eth0  metric 3 
unreachable 127.0.0.0/8  scope host 
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.2 
192.168.3.0/27 dev eth1  proto kernel  scope link  src 192.168.3.2

$ sudo ip rule
0:      from all lookup local 
32764:  from all to 192.168.3.0/27 lookup dmz 
32765:  from 192.168.3.0/27 lookup dmz 
32766:  from all lookup main 
32767:  from all lookup default

$ sudo ip route show table dmz
default via 192.168.3.1 dev eth1  metric 4 
192.168.3.0/27 dev eth1  scope link  metric 4

Gateway routing

# ip route
10.24.0.2 dev tun0  proto kernel  scope link  src 10.24.0.1 
10.24.0.0/24 via 10.24.0.2 dev tun0 
192.168.3.0/24 dev br-dmz  proto kernel  scope link  src 192.168.3.1 
192.168.1.0/24 dev br-lan  proto kernel  scope link  src 192.168.1.1 
$ISP_NET/23 dev eth0.1  proto kernel  scope link  src $WAN_IP 
default via $ISP_GW dev eth0.1

Additional background

Options for non-virtualized network interface isolation?

Best Answer

If you receive the martian packet, wireshark should be able to show it.

I also see you've disabled loopback by setting an unreachable route for 127.0.0.0/8. This isn't standards-compliant and probably isn't very useful, but I doubt it has much to do with this problem.

The documentation paragraph simply means that you may see RFC 1918 addresses, or other addresses you cannot reach, in the traceroute output. Routers can use such addresses on the links between them (e.g. within one AS), and a router reports its own address when a packet's TTL expires there. It doesn't mean you should expect martians, and I doubt it has anything to do with this particular packet.

The martian packet may or may not be related to the traceroute. Martians are often caused by a gateway not doing source NAT when it ought to, so check that your gateway does source NAT on your outbound packets. It's also possible that a broken NAT rule somewhere is translating the destination address of packets outbound from eth1 toward the IP of eth0. The latter seems most likely given the source of the packet.

You should run a wireshark capture on both eth1 and eth0, try to find the packet on eth0, and see whether you can correlate it with one on eth1. Also check your NAT rules.
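To make "the source of the packet" concrete, here is a small sketch that splits the kernel's log line into its fields. As far as I know, the kernel prints the destination address first and the source address second in this message, so the line in the question would describe a packet from 212.47.223.33 to 192.168.3.20 that arrived on eth0. The function name parse_martian is made up for this example.

```shell
#!/bin/sh
# parse_martian: hypothetical helper, reads kernel log lines on stdin.
# Assumes the message format "martian source <dst> from <src>, on dev <if>",
# where the first address is the packet's destination and the second its source.
parse_martian() {
    sed -n 's/.*martian source \([0-9.]*\) from \([0-9.]*\), on dev \([^ ]*\).*/dst=\1 src=\2 dev=\3/p'
}
```

For example, piping the line from the question through it:

echo 'kernel: [318535.927489] martian source 192.168.3.20 from 212.47.223.33, on dev eth0' | parse_martian

prints dst=192.168.3.20 src=212.47.223.33 dev=eth0.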
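If you prefer the command line, something like the following could do the same job with tcpdump. It is only a sketch: the helper names are made up, the interface names and addresses are taken from the question, and the iptables call assumes the gateway does its NAT with iptables, which the question doesn't confirm.

```shell
#!/bin/sh
# martian_filter: hypothetical helper that builds a capture filter
# matching traffic between the two addresses from the log line.
martian_filter() {
    # $1 = local address, $2 = remote address
    printf 'host %s and host %s' "$1" "$2"
}

# capture_on: hypothetical wrapper around tcpdump (needs root).
#   -n  do not resolve names
#   -e  print the link-level header, so you can see which MAC sent the frame
#   -i  interface to listen on
capture_on() {
    # $1 = interface, $2 = filter expression
    tcpdump -n -e -i "$1" "$2"
}

# list_nat_rules: dump the NAT table as rule specifications (needs root).
# Assumption: the box uses iptables for NAT.
list_nat_rules() {
    iptables -t nat -S
}
```

Run capture_on eth0 "$(martian_filter 192.168.3.20 212.47.223.33)" in one terminal and the same with eth1 in another, then repeat the tracepath. The MAC addresses shown by -e should tell you which device put the frame on the eth0 segment. Run list_nat_rules on the gateway to look for a rule that rewrites the wrong address.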
