I'm trying to debug a KVM virtual machine instance that doesn't respond to network requests when the source is another physical machine on the same subnet. The instance is connected to that subnet via Linux bridging.
(This is an OpenStack deployment, Folsom release, on Ubuntu 12.04, configured to use nova-network in FlatDHCP mode, not multi-host. The problem only occurs with CentOS guests, not Ubuntu guests.)
When I ran tcpdump inside the CentOS guest, I discovered that the inbound packets are being tagged with "vlan 0". For example, if I manually configure an IP address of 10.40.0.5/16 inside the guest and then do an "arping -i eth1 10.40.0.5" from another machine, tcpdump shows "vlan 0":
# tcpdump -i eth0 -XX -vv -e
14:29:29.907212 54:78:1a:86:50:c9 (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 64: vlan 0, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.5 (Broadcast) tell 10.40.0.1, length 46
0x0000: ffff ffff ffff 5478 1a86 50c9 8100 0000 ......Tx..P.....
0x0010: 0806 0001 0800 0604 0001 5478 1a86 50c9 ..........Tx..P.
0x0020: 0a28 0001 ffff ffff ffff 0a28 0005 0000 .(.........(....
0x0030: 0000 0000 0000 0000 0000 0000 dac7 07ed ................
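To make the "vlan 0" in that capture concrete, here is a minimal Python sketch that decodes the Ethernet header from the bytes in the dump above. The function name `parse_ethernet_header` and the variable names are illustrative, not part of any tool. It relies on the standard 802.1Q layout: a 0x8100 TPID followed by a 16-bit TCI holding a 3-bit priority (PCP), a 1-bit DEI, and a 12-bit VLAN ID.

```python
import struct

# Bytes copied from the guest-side tcpdump -XX output above (64 bytes).
GUEST_FRAME_HEX = """
ffff ffff ffff 5478 1a86 50c9 8100 0000
0806 0001 0800 0604 0001 5478 1a86 50c9
0a28 0001 ffff ffff ffff 0a28 0005 0000
0000 0000 0000 0000 0000 0000 dac7 07ed
"""


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode destination, source, and an optional single 802.1Q tag."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    info = {
        "dst": format_mac(frame[0:6]),
        "src": format_mac(frame[6:12]),
        "tagged": False,
        "ethertype": ethertype,
    }
    if ethertype == 0x8100:  # 802.1Q TPID: a 4-byte tag is present
        tci, inner_ethertype = struct.unpack("!HH", frame[14:18])
        info.update(
            tagged=True,
            pcp=tci >> 13,          # top 3 bits: priority
            dei=(tci >> 12) & 0x1,  # next bit: drop eligible indicator
            vid=tci & 0x0FFF,       # low 12 bits: VLAN ID
            ethertype=inner_ethertype,
        )
    return info


frame = bytes.fromhex("".join(GUEST_FRAME_HEX.split()))
info = parse_ethernet_header(frame)
for key, value in info.items():
    print(f"{key}: {value}")
```

A VLAN ID of 0 in an 802.1Q tag marks a "priority-tagged" frame: the tag carries only priority bits and does not assign the frame to any VLAN, which matches the "vlan 0, p 0" that tcpdump prints.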
If I load the 8021q module, the guest will respond to the ARP request properly, although it won't respond properly to DHCP, and the resulting UDP packets are tagged vlan 0.
If I do a similar tcpdump on the Ubuntu 12.04 compute host on the vnet1 interface that corresponds to the virtual machine, I don't see the vlan 0 tags:
# tcpdump -i vnet1 -XX -vv -e
tcpdump: WARNING: vnet1: no IPv4 address assigned
tcpdump: listening on vnet1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:59:34.023145 54:78:1a:86:50:c9 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.5 (Broadcast) tell 10.40.0.1, length 46
0x0000: ffff ffff ffff 5478 1a86 50c9 0806 0001 ......Tx..P.....
0x0010: 0800 0604 0001 5478 1a86 50c9 0a28 0001 ......Tx..P..(..
0x0020: ffff ffff ffff 0a28 0005 0000 0000 0000 .......(........
0x0030: 0000 0000 0000 0000 dac7 07ed ............
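As a sanity check on the two captures, the following Python sketch strips the 4-byte 802.1Q tag from the guest-side frame and compares the result with the host-side frame seen on vnet1. The helper names `dump_to_bytes` and `strip_dot1q` are made up for illustration; the byte strings are copied from the two dumps above.

```python
# Guest-side capture (eth0 inside the CentOS guest), 64 bytes.
GUEST_FRAME_HEX = """
ffff ffff ffff 5478 1a86 50c9 8100 0000
0806 0001 0800 0604 0001 5478 1a86 50c9
0a28 0001 ffff ffff ffff 0a28 0005 0000
0000 0000 0000 0000 0000 0000 dac7 07ed
"""

# Host-side capture (vnet1 on the Ubuntu compute host), 60 bytes.
HOST_FRAME_HEX = """
ffff ffff ffff 5478 1a86 50c9 0806 0001
0800 0604 0001 5478 1a86 50c9 0a28 0001
ffff ffff ffff 0a28 0005 0000 0000 0000
0000 0000 0000 0000 dac7 07ed
"""


def dump_to_bytes(dump: str) -> bytes:
    """Convert whitespace-separated hex groups into raw bytes."""
    return bytes.fromhex("".join(dump.split()))


def strip_dot1q(frame: bytes) -> bytes:
    """Remove a single 802.1Q tag (TPID 0x8100 + TCI) if one is present."""
    if frame[12:14] == b"\x81\x00":
        return frame[:12] + frame[16:]
    return frame


guest_frame = dump_to_bytes(GUEST_FRAME_HEX)
host_frame = dump_to_bytes(HOST_FRAME_HEX)

print("guest length:", len(guest_frame))
print("host length:", len(host_frame))
print("identical after stripping tag:", strip_dot1q(guest_frame) == host_frame)
```

With the tag removed, the guest-side bytes are the same as the host-side bytes, so the two captures differ only by the four bytes `8100 0000`. This does not by itself say which component inserted the tag, only that nothing else in the frame changed.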
In between the two physical machines is a Cisco Nexus 3000 switch.
Edit: The switch is configured with only one vlan (vlan 1), which is the native VLAN. All ports on the switch are in access mode. Here's what a typical port looks like:
# show interface switchport
Name: Ethernet1/1
Switchport: Enabled
Switchport Monitor: Not enabled
Operational Mode: access
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 1 (default)
Trunking VLANs Enabled: 1
Administrative private-vlan primary host-association: none
Administrative private-vlan secondary host-association: none
Administrative private-vlan primary mapping: none
Administrative private-vlan secondary mapping: none
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk private VLANs: none
Operational private-vlan: none
Unknown unicast blocked: disabled
Unknown multicast blocked: disabled
Why would these vlan 0 tags get added to the frames like this? Could it be that the switch is adding these tags, but Ubuntu somehow doesn't see them when it passes the frames on to the CentOS guest? Or could it be the CentOS kernel adding the tags to incoming frames? If so, why would that happen?
Best Answer
I faced a similar situation today on CentOS 6; upgrading to the latest kernel fixed it.