Linux is sending ARP requests to hosts in other subnets

arp icmp linux ubuntu-14.04 vlan

Setup

Host B <--> Router <--> Host A
  • Host A: IP = 192.168.1.10, Net = 192.168.1.0/24, VLAN = 1, Default GW = 192.168.1.1 (Router)
  • Host B: IP = 192.168.2.10, Net = 192.168.2.0/24, VLAN = 20, Default GW = 192.168.2.1 (Router)
  • Router: IP = 192.168.1.1, 192.168.2.1, VLAN = 1, 20

All devices are connected to a switch with these VLANs configured.

Ping-Test

Now, if I try to ping Host A from Host B, the following occurs:
Host B sends an ARP request to learn the MAC address of the router and sends the ping request to the router. The router also sends an ARP request to learn the MAC address of the destination, Host A, and forwards the ping request to Host A. That's OK, and it works.

ARP requests for another subnet??

Now the strange part: Host A, of course, tries to answer the ping, but(!) it doesn't send an ARP request for the MAC address of the router so that the router could forward the ping reply to Host B. Instead, it sends an ARP request asking for the MAC address of Host B directly. Of course, that doesn't work: nobody on the local subnet answers, because the broadcast domain is restricted to VLAN 1.
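For comparison, this is the next-hop decision that the routing table alone implies: a destination inside the interface's own prefix is resolved by ARP directly, anything else goes to the gateway. The following is only a minimal sketch of that mask comparison, not what the kernel actually runs; the function names `ip_to_int` and `next_hop_kind` are made up for illustration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Usage: next_hop_kind <own-ip> <prefix-length> <destination-ip>
# Prints "on-link" if the destination is inside own-ip/prefix,
# otherwise "via-gateway".
next_hop_kind() {
    src=$(ip_to_int "$1")
    prefix=$2
    dst=$(ip_to_int "$3")
    # 4294967295 is 0xffffffff; keep only the top <prefix> bits.
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    if [ $(( src & mask )) -eq $(( dst & mask )) ]; then
        echo on-link
    else
        echo via-gateway
    fi
}

next_hop_kind 192.168.1.10 24 192.168.2.10   # prints via-gateway
next_hop_kind 192.168.1.10 24 192.168.1.1    # prints on-link
```

By this logic Host A should ARP for 192.168.1.1, not for 192.168.2.10, so something other than the static routing table must be influencing the decision.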

ARP cache on Host A (192.168.1.10) looks like this:

# arp -an
? (192.168.1.1) at 16:bc:aa:f2:bc:44 [ether] on eth0
? (192.168.2.10) at <incomplete> on eth0

When I try to delete the failed ARP entry, I get this message, and the entry is still in the cache:

# arp -d 192.168.2.10
SIOCDARP(dontpub): Network is unreachable

ICMP-Redirects from router

So no (bidirectional) communication between Host A and Host B is possible. And instead of ping replies, Host B gets an ICMP redirect from the router: Host B should send packets directly to Host A.
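A possible way to inspect how a Linux host treats such redirects is sketched below. It assumes a Linux host with iproute2 installed and the interface name `eth0` from the appendix; the function names are hypothetical, and the exact output of `ip route get` differs between kernel versions.

```shell
#!/bin/sh
# Print the accept_redirects setting of every interface scope.
# Reads /proc directly, so no root privileges are needed.
show_accept_redirects() {
    for f in /proc/sys/net/ipv4/conf/*/accept_redirects; do
        [ -r "$f" ] || continue
        printf '%s = %s\n' "$f" "$(cat "$f")"
    done
}

# Show the next hop the kernel would use right now for a destination.
# On some kernels a route learned from a redirect is flagged in this output.
show_route_to() {
    ip route get "$1"
}

# Stop accepting redirects on one interface (needs root).
# On a host that is not forwarding, a redirect is accepted if either the
# "all" or the per-interface setting is enabled, so both are cleared.
ignore_redirects() {
    sysctl -w net.ipv4.conf.all.accept_redirects=0
    sysctl -w "net.ipv4.conf.$1.accept_redirects=0"
    # Should drop next hops that were already learned from redirects.
    ip route flush cache
}
```

For example, `show_accept_redirects` followed by `show_route_to 192.168.2.10` on Host A would show whether redirects are accepted and which next hop is currently in use. `ignore_redirects eth0` only hides the symptom on that host; it does not change what the router sends.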

My questions

  1. What makes Host A try to send its answer by ARP-resolving a host in another subnet? Why is the ping reply not sent to the router?
  2. Any idea what role the ICMP-Redirect plays?

Appendix

Host A

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:a9:9a:cc:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ab:cd:a9:9a:cc:dd brd ff:ff:ff:ff:ff:ff

# ip r s
default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.10

Host B

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.2.1     0.0.0.0         UG    0      0        0 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     1      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0

# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 40:7d:7a:a3:f5:dd brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.10/24 brd 192.168.2.255 scope global eth0
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 47:5e:33:a6:31:5e brd ff:ff:ff:ff:ff:ff

Router

Routing table:

Destination-IP   Subnet mask      Default gateway   Hop count     Interface
<public-net>     255.255.255.224  *                 0             eth2   
<public-net>     255.255.255.224  *                 0             eth1   
192.168.1.0      255.255.255.0    *                 0             eth0   
192.168.2.0      255.255.255.0    *                 0             eth0   
default          0.0.0.0          <public-router>   15            eth1   
default          0.0.0.0          <public-router>   40            eth2   
default          0.0.0.0          <public-router>   40            eth1

<public-net>: address of the public subnet (internet uplink)

<public-router>: address of the uplink router

The router is a Cisco RV320 with a web interface only, so that's all I can get. PS: It's a load-balancing dual-uplink setup, but that shouldn't make a difference for the ARP problem.

Best Answer

The routing table on the router looks incorrect. It looks as if you are running both VLANs untagged from the router.

I don't know how the switch manages to deliver packets from the router to both A and B, when the router apparently sends all of the packets to the switch with no indication of which VLAN they belong to. The switch I am using wouldn't be able to do that. But perhaps you are using a brand of switch which can somehow correctly guess which VLAN to send the packets to.

However, from the router's point of view, A and B are on the same Ethernet segment, which means the router is expected to instruct A and B to communicate directly without involving the router. And that is where communication breaks down.
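The reasoning can be sketched as a comparison of the interface a packet arrived on with the interface the routing table selects for its destination. This is a deliberately simplified model with a hypothetical function name; real routers apply further conditions before they send a redirect.

```shell
#!/bin/sh
# Usage: router_action <ingress-interface> <egress-interface>
# Simplified model: if a packet leaves through the interface it came in on,
# the router forwards it and also tells the sender to go direct.
router_action() {
    in_if=$1
    out_if=$2
    if [ "$in_if" = "$out_if" ]; then
        echo "forward and send ICMP redirect"
    else
        echo "forward"
    fi
}

# Current table: 192.168.1.0/24 and 192.168.2.0/24 are both on eth0.
router_action eth0 eth0        # prints: forward and send ICMP redirect

# One VLAN sub-interface per subnet: ingress and egress differ.
router_action eth0.20 eth0.1   # prints: forward
```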

The routing table entries that look like this:

192.168.1.0      255.255.255.0    *                 0             eth0   
192.168.2.0      255.255.255.0    *                 0             eth0   

should in fact look like this:

192.168.1.0      255.255.255.0    *                 0             eth0.1     
192.168.2.0      255.255.255.0    *                 0             eth0.20    

The virtual interfaces eth0.1 and eth0.20 can be created with the commands:

vconfig add eth0 1
vconfig add eth0 20
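On systems where vconfig is not available, the same sub-interfaces can usually be created with iproute2. The sketch below assumes a Linux-based router with shell access (which the RV320's web interface does not offer) and the parent interface `eth0`; the `run` and `add_vlan` helpers and the `DRY_RUN` variable are hypothetical and only there so the commands can be previewed without root.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: add_vlan <parent-interface> <vlan-id>
# Creates the 802.1Q sub-interface <parent>.<vlan-id> and brings it up.
add_vlan() {
    parent=$1
    vid=$2
    run ip link add link "$parent" name "$parent.$vid" type vlan id "$vid"
    run ip link set dev "$parent.$vid" up
}
```

Running `DRY_RUN=1 add_vlan eth0 1` and `DRY_RUN=1 add_vlan eth0 20` prints the commands for both VLANs; without `DRY_RUN` they are executed and need root. The router's addresses 192.168.1.1 and 192.168.2.1 would then have to be assigned to eth0.1 and eth0.20, and the switch port facing the router would have to carry both VLANs tagged.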