Linux – ARP reply does not reach the other port of a Linux bridge on Ubuntu 13.10

arp bridge linux ubuntu vmware-vsphere

I recently built a Linux-based bridge for packet monitoring, but I have run into a big problem.

The environment is as follows:

  1. Overall environment
    • The monitoring targets are VMs on vSphere.
    • Two vSwitches are configured on the vSphere host.
    • vSwitch 1 is configured with a NIC, for the connection between the outside network and the bridge.
    • vSwitch 2 is configured without a NIC, for the connection between the bridge and the VMs.
    • Both are set to allow promiscuous mode.
  2. Bridge environment
    • Ubuntu 13.10, installed as a minimal virtual machine.
    • br0 is configured with eth0 (to vSwitch 1) and eth1 (to vSwitch 2).
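
For readers who want to reproduce the setup, a bridge like the one described could be built roughly as follows. This is only a sketch: it assumes bridge-utils (brctl) and iproute2 are installed and that the interface names match the ones above. The run helper and the DRY_RUN variable are hypothetical additions for illustration, so the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: build br0 from eth0 (to vSwitch 1) and eth1 (to vSwitch 2).
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_bridge() {
    run brctl addbr br0          # create the bridge device
    run brctl addif br0 eth0     # outside port
    run brctl addif br0 eth1     # inside (VM) port
    run ip link set dev eth0 up  # ports carry no IP address of their own
    run ip link set dev eth1 up
    run ip link set dev br0 up
}

setup_bridge
```

Running it with DRY_RUN=0 as root would apply the commands for real.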

My problem: when a VM pings the gateway, an ARP request is sent and the gateway replies,
but the reply shows up only on eth0 and br0, not on eth1.

superhero@vim-firewall:~$ sudo tcpdump -i eth0 -n host 192.168.10.172
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:13:45.809949 ARP, Request who-has 192.168.10.1 tell 192.168.10.172, length 46
12:13:45.810060 ARP, Request who-has 192.168.10.1 tell 192.168.10.172, length 46
12:13:45.810742 ARP, Reply 192.168.10.1 is-at 00:00:aa:aa:aa:d9, length 46
...
superhero@vim-firewall:~$ sudo tcpdump -i br0 -n host 192.168.10.172
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:13:51.810928 ARP, Request who-has 192.168.10.1 tell 192.168.10.172, length 46
12:13:51.811031 ARP, Request who-has 192.168.10.1 tell 192.168.10.172, length 46
12:13:51.811579 ARP, Reply 192.168.10.1 is-at 00:aa:aa:aa:aa:d9, length 46
...
superhero@vim-firewall:~$ sudo tcpdump -i eth1 -n host 192.168.10.172
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
12:13:57.812937 ARP, Request who-has 192.168.10.1 tell 192.168.10.172, length 46
12:13:57.813040 ARP, Request who-has 192.168.10.1 tell 192.168.10.172, length 46
...

I need help!

PS. Currently only ARP is a problem. If I add the gateway's MAC address manually, the network connection works, except for access to the local subnet, of course.

Best Answer

Oops! I finally found a working solution!

I spent more than 10 hours spread over three days on this problem, but the solution is not clean; it is closer to a workaround. I still need a real solution or a better workaround, so please help.

Anyway, I found the information below in the VMware community. (The root cause of the problem is related to VMware.)

  1. https://communities.vmware.com/message/1509541#1509541
  2. https://communities.vmware.com/message/2208190#2208190

URL (1) is a comment from the original poster there, who had the same problem, about its cause. The problem comes from the vSwitch's behavior: if two or more physical NICs are attached, the ARP request gets duplicated, and the Linux bridge receives the duplicate back on its outside NIC/port. As a result, the bridge's MAC-to-port map gets confused.

I tested this by detaching the standby NIC from the vSwitch; with a single-NIC vSwitch, networking works fine.
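
One way to check this on the bridge itself is to look at which port the VM's MAC address is currently learned on. The sketch below assumes bridge-utils is installed; port_for_mac is a hypothetical helper name, and VM_MAC is a placeholder for the VM's real MAC address. If the VM's MAC shows up on the port that belongs to eth0 rather than eth1, that would be consistent with the explanation above: the bridge believes the VM is on the outside port, so it has no reason to forward the gateway's reply to eth1.

```shell
#!/bin/sh
# port_for_mac: read `brctl showmacs <bridge>` output on stdin and print the
# bridge port number on which the given MAC address is currently listed.
# The first line of the input is treated as the column header and skipped.
port_for_mac() {
    awk -v mac="$1" 'NR > 1 && tolower($2) == tolower(mac) { print $1 }'
}

# Usage on the bridge VM (VM_MAC is a placeholder, not a real address):
#   brctl showmacs br0 | port_for_mac "$VM_MAC"
# Port numbers can be matched to interfaces with:
#   brctl showstp br0
```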

Another comment, URL (2), describes a workaround. If I set the ageing time of the Linux bridge to 0, it behaves like a dumb hub (sending all packets to all ports), so the ARP reply reaches the VM on the internal network.
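
The workaround can be sketched as below, assuming the bridge is named br0 and bridge-utils is installed. set_hub_mode, run and DRY_RUN are hypothetical names used only for this illustration; by default the command is printed, not executed.

```shell
#!/bin/sh
# Sketch: set the bridge ageing time to 0 so learned MAC entries are not kept.
# DRY_RUN=1 (the default here) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_hub_mode() {
    bridge="${1:-br0}"                 # bridge name, br0 if not given
    run brctl setageing "$bridge" 0    # ageing time of 0 seconds
}

set_hub_mode br0
```

A setting made this way is not kept across reboots. On Ubuntu with bridge-utils, a `bridge_ageing 0` line in the br0 stanza of /etc/network/interfaces should make it persistent, though I have not verified that on 13.10.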

In my case the bridge has only two ports, one internal and one external, so IMHO this is not a big problem. But some things are still not clear.

If there is a way to block the looped-back/duplicated requests from the standby NIC, to ignore duplicate requests, or some other clever way to handle the bridge's MAC table, let me know.

Thanks for reading, and I hope this helps if you have the same problem!
