Ghost “vlan 0” tags added to Ethernet frames in CentOS guest under OpenStack

Tags: kvm-virtualization, networking, openstack, virtualization, vlan

I'm trying to debug an issue with a KVM virtual machine instance that doesn't respond to network requests coming from another physical machine on the same subnet. The instance is connected to the network via Linux bridging.

(This is happening in an OpenStack Folsom deployment on Ubuntu 12.04, using nova-network in FlatDHCP mode without multi-host. The problem only occurs with CentOS guests, not Ubuntu guests.)

When I ran tcpdump inside the CentOS guest, I discovered that inbound packets are tagged with "vlan 0". For example, if I manually configure the IP address 10.40.0.5/16 inside the guest and then run "arping -i eth1 10.40.0.5" from another machine, tcpdump shows the request arriving with a "vlan 0" tag:

# tcpdump -i eth0 -XX -vv -e
14:29:29.907212 54:78:1a:86:50:c9 (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 64: vlan 0, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.5 (Broadcast) tell 10.40.0.1, length 46
    0x0000:  ffff ffff ffff 5478 1a86 50c9 8100 0000  ......Tx..P.....
    0x0010:  0806 0001 0800 0604 0001 5478 1a86 50c9  ..........Tx..P.
    0x0020:  0a28 0001 ffff ffff ffff 0a28 0005 0000  .(.........(....
    0x0030:  0000 0000 0000 0000 0000 0000 dac7 07ed  ................
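The four bytes `8100 0000` at offset 12 are an IEEE 802.1Q tag: TPID 0x8100 followed by a 16-bit TCI that is all zeros, i.e. priority 0 and VLAN ID 0. In 802.1Q, a VID of 0 marks a "priority-tagged" frame that carries only 802.1p priority information and no VLAN membership. Below is a minimal Python sketch, assuming the `tcpdump -XX` layout shown above, that rebuilds the frame bytes from the dump and decodes the tag; `parse_hexdump` and `decode_vlan_tag` are illustrative helpers, not part of any library.

```python
import struct

def parse_hexdump(lines):
    """Rebuild frame bytes from `tcpdump -XX` hex-dump lines.

    Assumes each line looks like "0x0000:  hhhh hhhh ...  ASCII": the hex
    groups sit between the offset and the two-space gap before the ASCII column.
    """
    data = bytearray()
    for line in lines:
        hex_part = line.split(":", 1)[1].strip().split("  ")[0]
        data += bytes.fromhex(hex_part)  # spaces between byte pairs are ignored
    return bytes(data)

def decode_vlan_tag(frame):
    """Return (pcp, dei, vid, inner_ethertype) for an 802.1Q-tagged frame, else None."""
    (tpid,) = struct.unpack("!H", frame[12:14])  # bytes 0-11 are dst/src MACs
    if tpid != 0x8100:
        return None
    tci, inner = struct.unpack("!HH", frame[14:18])
    pcp = tci >> 13          # 3-bit priority (802.1p); tcpdump prints it as "p"
    dei = (tci >> 12) & 0x1  # drop eligible indicator (formerly CFI)
    vid = tci & 0x0FFF       # 12-bit VLAN ID; 0 means priority tag only
    return pcp, dei, vid, inner

# Hex dump captured inside the CentOS guest (copied from the output above).
guest = parse_hexdump([
    "0x0000:  ffff ffff ffff 5478 1a86 50c9 8100 0000  ......Tx..P.....",
    "0x0010:  0806 0001 0800 0604 0001 5478 1a86 50c9  ..........Tx..P.",
    "0x0020:  0a28 0001 ffff ffff ffff 0a28 0005 0000  .(.........(....",
    "0x0030:  0000 0000 0000 0000 0000 0000 dac7 07ed  ................",
])

pcp, dei, vid, inner = decode_vlan_tag(guest)
print(f"pcp={pcp} dei={dei} vid={vid} inner=0x{inner:04x}")
# prints: pcp=0 dei=0 vid=0 inner=0x0806
```

The inner EtherType 0x0806 is ARP, which matches tcpdump's "vlan 0, p 0, ethertype ARP" summary.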

If I load the 8021q module, the guest responds to the ARP request properly. DHCP still doesn't work, however, and the resulting UDP packets are tagged vlan 0 as well.

If I run a similar tcpdump on the Ubuntu 12.04 compute host, on the vnet1 interface that corresponds to the virtual machine, I don't see the vlan 0 tags:

# tcpdump -i vnet1 -XX -vv -e
tcpdump: WARNING: vnet1: no IPv4 address assigned
tcpdump: listening on vnet1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:59:34.023145 54:78:1a:86:50:c9 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.40.0.5 (Broadcast) tell 10.40.0.1, length 46
    0x0000:  ffff ffff ffff 5478 1a86 50c9 0806 0001  ......Tx..P.....
    0x0010:  0800 0604 0001 5478 1a86 50c9 0a28 0001  ......Tx..P..(..
    0x0020:  ffff ffff ffff 0a28 0005 0000 0000 0000  .......(........
    0x0030:  0000 0000 0000 0000 dac7 07ed            ............
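tcpdump reports length 64 inside the guest and 60 on vnet1, a difference of exactly one 4-byte 802.1Q header. The minimal Python sketch below (helper names are illustrative) removes the tag from the guest capture and compares the result with the vnet1 capture. Both dumps show the same ARP request, and after stripping the tag the frames come out byte-for-byte identical, so the inserted priority tag is the only difference between them.

```python
import struct

def parse_hexdump(lines):
    """Rebuild frame bytes from `tcpdump -XX` hex-dump lines (ASCII column dropped)."""
    data = bytearray()
    for line in lines:
        hex_part = line.split(":", 1)[1].strip().split("  ")[0]
        data += bytes.fromhex(hex_part)
    return bytes(data)

def strip_vlan_tag(frame):
    """Remove one 802.1Q tag (the 4 bytes after the MAC addresses), if present."""
    if struct.unpack("!H", frame[12:14])[0] == 0x8100:
        return frame[:12] + frame[16:]
    return frame

# Capture from inside the CentOS guest (tagged, 64 bytes).
guest = parse_hexdump([
    "0x0000:  ffff ffff ffff 5478 1a86 50c9 8100 0000  ......Tx..P.....",
    "0x0010:  0806 0001 0800 0604 0001 5478 1a86 50c9  ..........Tx..P.",
    "0x0020:  0a28 0001 ffff ffff ffff 0a28 0005 0000  .(.........(....",
    "0x0030:  0000 0000 0000 0000 0000 0000 dac7 07ed  ................",
])

# Capture from vnet1 on the compute host (untagged, 60 bytes).
host = parse_hexdump([
    "0x0000:  ffff ffff ffff 5478 1a86 50c9 0806 0001  ......Tx..P.....",
    "0x0010:  0800 0604 0001 5478 1a86 50c9 0a28 0001  ......Tx..P..(..",
    "0x0020:  ffff ffff ffff 0a28 0005 0000 0000 0000  .......(........",
    "0x0030:  0000 0000 0000 0000 dac7 07ed            ............",
])

print(len(guest), len(host), strip_vlan_tag(guest) == host)
# prints: 64 60 True
```

This only shows what differs between the two captures; it does not by itself tell you which component inserted the tag.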

Between the two physical machines is a Cisco Nexus 3000 switch.

Edit: The switch is configured with only one VLAN (VLAN 1), which is the native VLAN. All ports on the switch are in access mode. Here's what a typical port looks like:

# show interface switchport
Name: Ethernet1/1
  Switchport: Enabled
  Switchport Monitor: Not enabled
  Operational Mode: access
  Access Mode VLAN: 1 (default)
  Trunking Native Mode VLAN: 1 (default)
  Trunking VLANs Enabled: 1
  Administrative private-vlan primary host-association: none
  Administrative private-vlan secondary host-association: none
  Administrative private-vlan primary mapping: none
  Administrative private-vlan secondary mapping: none
  Administrative private-vlan trunk native VLAN: none
  Administrative private-vlan trunk encapsulation: dot1q
  Administrative private-vlan trunk normal VLANs: none
  Administrative private-vlan trunk private VLANs: none
  Operational private-vlan: none
  Unknown unicast blocked: disabled
  Unknown multicast blocked: disabled

Why would these vlan 0 tags get added to the frames like this? Could it be that the switch is adding these tags, but Ubuntu somehow doesn't see them when it passes the frames on to the CentOS guest? Or could it be the CentOS kernel adding the tags to incoming frames? If so, why would that happen?

Best Answer

I faced a similar situation today on CentOS 6; upgrading to the latest kernel fixed it.
