Ubuntu – Traffic via static route on router does not find its way back

Tags: linux-networking, openstack, static-routes, subnet, ubuntu

This one has been bugging me for a couple of days now and I cannot find a solution.
Sorry for the strange title. This isn't my main field of work and I currently cannot think of a better one.

Some facts:

  • I'm a user on a shared OpenStack environment
  • I cannot see or change the configuration of the underlying OpenStack setup
  • The VMs are configured via cloud-init which installs python-minimal, creates a user and does an apt-get dist-upgrade. Apart from that they are configured via DHCP with a static IP.
  • I configured no iptables rules on the nodes.

So, let me describe the setup:

I created a network and subnet (10.0.30.0/24). The network is attached to a router. I also created two VMs (both Ubuntu 16.04.2 LTS), each of which got a "main" IP (node0: 10.0.30.10/24, node1: 10.0.30.11/24) and a second IP in a different subnet (node0: 10.10.10.2/24, node1: 10.10.20.2/24).

On the router I configured two static routes, which forward everything for 10.10.10.0/24 to node0 and everything for 10.10.20.0/24 to node1.

+----------------------------------------+
|  test-router                           |
|  IP: 10.0.30.1/24                      |
|                                        |
|  Static routes:                        |
|  - destination_cidr = "10.10.10.0/24"  |
|    next_hop         = "10.0.30.10"     |
|  - destination_cidr = "10.10.20.0/24"  |
|    next_hop         = "10.0.30.11"     |
+----------------------------------------+
        |
        |
  +------------------------+
  |  test-network          |
  |  Subnet: 10.0.30.0/24  |
  |  Router: 10.0.30.1     |
  +------------------------+
        |
        |
        |       +---------------------+
        |       |  node0              |
        +-------+  IP: 10.0.30.10/24  |
        |       |      10.10.10.2/24  |
        |       +---------------------+
        |
        |       +---------------------+
        |       |  node1              |
        +-------+  IP: 10.0.30.11/24  |
                |      10.10.20.2/24  |
                +---------------------+

After both VMs have booted, I can observe the following:

Node0

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.0.30.1       0.0.0.0         UG    0      0        0 ens3
10.0.30.0       *               255.255.255.0   U     0      0        0 ens3
169.254.169.254 10.0.30.100     255.255.255.255 UGH   0      0        0 ens3
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:31:67:52 brd ff:ff:ff:ff:ff:ff
    inet 10.0.30.10/24 brd 10.0.30.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet 10.10.10.2/24 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe31:6752/64 scope link
       valid_lft forever preferred_lft forever
$ ping -c10 10.10.20.2
PING 10.10.20.2 (10.10.20.2) 56(84) bytes of data.
From 10.0.30.1: icmp_seq=2 Redirect Host(New nexthop: 10.0.30.11)
From 10.0.30.1: icmp_seq=3 Redirect Host(New nexthop: 10.0.30.11)

--- 10.10.20.2 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 8999ms

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.0.30.1       0.0.0.0         UG    0      0        0 ens3
10.0.30.0       *               255.255.255.0   U     0      0        0 ens3
10.10.10.0      *               255.255.255.0   U     0      0        0 ens3
169.254.169.254 10.0.30.100     255.255.255.255 UGH   0      0        0 ens3

Meanwhile on node1

# tcpdump icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
09:25:55.451876 IP 10.0.30.10 > 10.10.20.2: ICMP echo request, id 1271, seq 1, length 64
09:25:55.451928 IP 10.10.20.2 > 10.0.30.10: ICMP echo reply, id 1271, seq 1, length 64
09:25:56.451467 IP 10.0.30.10 > 10.10.20.2: ICMP echo request, id 1271, seq 2, length 64
09:25:56.451503 IP 10.10.20.2 > 10.0.30.10: ICMP echo reply, id 1271, seq 2, length 64
09:25:57.451185 IP 10.0.30.10 > 10.10.20.2: ICMP echo request, id 1271, seq 3, length 64
09:25:57.451218 IP 10.10.20.2 > 10.0.30.10: ICMP echo reply, id 1271, seq 3, length 64
[..]
09:26:03.450910 IP 10.0.30.10 > 10.10.20.2: ICMP echo request, id 1271, seq 9, length 64
09:26:03.450943 IP 10.10.20.2 > 10.0.30.10: ICMP echo reply, id 1271, seq 9, length 64
09:26:04.450988 IP 10.0.30.10 > 10.10.20.2: ICMP echo request, id 1271, seq 10, length 64
09:26:04.451022 IP 10.10.20.2 > 10.0.30.10: ICMP echo reply, id 1271, seq 10, length 64

So, my conclusion is: node1 receives the traffic but the reply doesn't make its way to node0.

The same happens if I start a webserver on node1 and try to curl it via the statically routed IP. I see traffic coming in on node1 but the response never makes it to node0.

On the other hand: arping from node0 to node1 works:

# arping -c3 -i ens3 10.10.20.2
ARPING 10.10.20.2
42 bytes from fa:16:3e:a9:b4:bc (10.10.20.2): index=0 time=7.933 msec
42 bytes from fa:16:3e:a9:b4:bc (10.10.20.2): index=1 time=2.797 msec
42 bytes from fa:16:3e:a9:b4:bc (10.10.20.2): index=2 time=9.703 msec

--- 10.10.20.2 statistics ---
3 packets transmitted, 3 packets received,   0% unanswered (0 extra)
rtt min/avg/max/std-dev = 2.797/6.811/9.703/2.929 ms

If I use the "main" IP, everything works fine.

Things I tried (on both nodes):

  • Setting /proc/sys/net/ipv4/conf/ens3/rp_filter to 2 and 0 (because I'm desperate). Nothing changed.
  • Setting /proc/sys/net/ipv4/ip_forward to 1. Nothing changed.
  • Setting /proc/sys/net/ipv4/conf/ens3/log_martians to 1 on both nodes. No output via journalctl -f whatsoever.

EDIT: There is output on node0 if I ping node1 via the static IP:

May 03 11:16:01 node0 kernel: IPv4: Redirect from 10.0.30.1 on ens3 about 10.0.30.11 ignored
                                Advised path = 10.0.30.10 -> 10.10.20.2

Since I'm running out of ideas, I need your help. Thanks for taking the time to look into my problem!

Best Answer

Challenges:

You have only one broadcast domain (think physical / layer 2 network) that matters, and on that broadcast domain, you have three IP (logical) networks:

  • 10.0.30.0/24 - A
  • 10.10.10.0/24 - B
  • 10.10.20.0/24 - C

You also have three devices, each on a subset of the logical networks:

  • router - A only
  • node0 - A and B
  • node1 - A and C

To make things fun, you've told router that node0 is in charge of network B, and that node1 is in charge of network C, but you didn't tell node0 that node1 was in charge of C, nor node1 that node0 was in charge of B.

This is a recipe for the kind of excitement that you are experiencing.

When router gets a message from node0 destined for an IP on network C, its response is: "Silly node0, you're going the wrong direction; you should know that you need to go through node1, which you also share a network with, to get there":

node0 kernel: IPv4: Redirect from 10.0.30.1 on ens3 about 10.0.30.11 ignored
                            Advised path = 10.0.30.10 -> 10.10.20.2

If you are playing with subnets and routing for fun, that's great. You've found a less-than-optimal approach, but you can keep playing.

If you are trying to accomplish something specific by having the separate networks, you probably want to configure the router to be directly connected to each separate network (A-C), and you probably want each network to be a separate broadcast domain.

If you just want the computers to be able to talk to each other with the IP addresses configured, you can:

  • Add 10.10.10.3/24 to node1, and
  • Add 10.10.20.3/24 to node0
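
As a rough sketch of that last option, the extra addresses could be added with iproute2 as shown here. The interface name ens3 is taken from the question; the `run` wrapper, the `add_cross_address` function, and the `APPLY`, `IFACE` and `NODE` variables are names made up for this example. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set APPLY=1 (and run as root) to really change the address list.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Assumption: the interface is called ens3 on both nodes, as in the question.
IFACE="${IFACE:-ens3}"

# Give each node an address in the other node's extra network.
add_cross_address() {
    case "$1" in
        node0) run ip addr add 10.10.20.3/24 dev "$IFACE" ;;  # node0 joins network C
        node1) run ip addr add 10.10.10.3/24 dev "$IFACE" ;;  # node1 joins network B
        *) echo "usage: add_cross_address node0|node1" >&2; return 1 ;;
    esac
}

# NODE is a hypothetical switch for this sketch; it defaults to node0.
add_cross_address "${NODE:-node0}"
```

Run it with `NODE=node1` on node1. Addresses added this way do not survive a reboot, so they would still need to go into the persistent network configuration.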

As a general rule, for any network that routers use to communicate with each other, you almost certainly want every one of those routers to be fully informed about the correct route for all neighboring networks. (When you made node0 and node1 responsible for their own networks, B and C, you made them routers.) Routing protocols can do this, but this sample is small enough to configure manually.
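
For this setup, informing the two nodes manually could look roughly like the sketch below. It assumes the addresses and the ens3 interface from the question; `run`, `add_peer_route`, `APPLY`, `IFACE` and `NODE` are hypothetical names for this example, and the script only prints the command unless APPLY=1 is set. Whether these routes alone restore connectivity in the shared environment is not confirmed in this thread.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set APPLY=1 (and run as root) to really change the routing table.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Assumption: the interface is called ens3 on both nodes, as in the question.
IFACE="${IFACE:-ens3}"

# Tell each node which neighbour on network A is responsible for the other network.
add_peer_route() {
    case "$1" in
        node0) run ip route add 10.10.20.0/24 via 10.0.30.11 dev "$IFACE" ;;  # C is behind node1
        node1) run ip route add 10.10.10.0/24 via 10.0.30.10 dev "$IFACE" ;;  # B is behind node0
        *) echo "usage: add_peer_route node0|node1" >&2; return 1 ;;
    esac
}

# NODE is a hypothetical switch for this sketch; it defaults to node0.
add_peer_route "${NODE:-node0}"
```

After applying it, `ip route get 10.10.20.2` on node0 shows which route the kernel picks for that destination, which is a quick way to check that the router is no longer the first hop.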

I hope this is helpful for you / others, despite being a bit out of date.
