Linux bonding (balance-tlb), KVM guests and L2 switches = unicast flooding

Tags: bonding, flooding, kvm-virtualization, linux, switch

I have a unicast flooding problem on my network that started when I moved some software to virtualized guests.
It seems very similar to what is reported here: Switch flooding when bonding interfaces in Linux. That question dates back to 2012, so maybe there is a better solution by now, perhaps on the Linux/KVM side.

In the following I'll explain the architecture and the troubleshooting steps I carried out.
I hope somebody can give me some hints and maybe a solution!
Thanks in advance!

ARCHITECTURE

Server

Linux host with PROXMOX 4.1 and several Windows virtual machines.

The host has four Gigabit Ethernet interfaces (with MAC addresses A, B, C and D), bonded in balance-tlb mode.

The bond is then bridged to the virtual machines, and each VM has its own MAC address (X, Y, Z, …).

The software hosted on the virtual machines interacts with many devices in the field.

Network

The server is connected to a Juniper switch, which then connects to a wide Cisco network. Everything is layer 2.

PROBLEM

On the Cisco network I see unicast storms from time to time. They seem to start every 5 minutes, or at multiples of that interval. I analyzed the traffic and saw that the traffic FROM some devices to a certain virtual machine (and not vice versa) is suddenly replicated on all the physical ports of the switches (in the same VLAN). The problem resolves by itself after a few seconds.

IDEA

Reading the Cisco documentation (on unicast flooding and MAC "aging time") and the question linked above, I concluded that the problem may be due to the fact that the MAC addresses of the virtual machines rarely appear as a source address on the network, so that after the "aging time" expires the switches forward traffic for them to all ports until they relearn where the host is.

TROUBLESHOOTING

I connected a laptop to the network and started to ping it from one virtual machine.
I sniffed the packets on the laptop.

From this I could see:

  • ARP request from the virtual machine, using as MAC source its own MAC address (let's say X)

  • ARP reply from the laptop, using as MAC source its own MAC address (L) and destination the VM MAC address (X)

  • ping requests from the virtual machine, using as MAC source one of the MAC addresses of the bonded physical ethernet ports (A, B, C or D; it switched from time to time between three of them) and as MAC destination L

  • ping replies from the laptop, using as MAC source L and as MAC destination the virtual machine MAC address (X)

Basically it seems that, except for the first ARP request, the virtual machine never appears to the laptop with its own MAC address (X) but always with A, B, C or D (varying over time). However, the laptop always replies to X.

SOLUTION?

I read that in balance-tlb mode it is normal for traffic to leave through different interfaces depending on load. However, I think this behaviour, combined with the fact that the virtual machines appear on the network with the source MAC address of the physical interface in use, may cause the problem I reported.
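To make the suspected mechanism concrete, here is a minimal toy model of a learning switch with MAC aging. It is only a sketch under assumptions: the `LearningSwitch` class, the port numbers and the timestamps are invented for illustration, and the 300-second aging time is assumed because it is a common default on Cisco switches and would match the 5-minute pattern. Real switches age entries with timers rather than on lookup, but the effect on forwarding is the same.

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs per port and ages entries out."""

    def __init__(self, ports, aging_time=300.0):
        self.ports = list(ports)
        self.aging_time = aging_time  # seconds; assumed value
        self.table = {}  # MAC -> (port, time last seen as a source)

    def receive(self, now, in_port, src, dst):
        """Process one frame and return the list of ports it is sent out of."""
        # Learn (or refresh) the source address on the ingress port.
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] <= self.aging_time:
            return [entry[0]]
        # Unknown or expired destination: flood to every other port.
        self.table.pop(dst, None)
        return [p for p in self.ports if p != in_port]


sw = LearningSwitch(ports=[1, 2, 3, 4])

# t=0: the VM sends an ARP request with its own MAC "X" from port 1.
sw.receive(0, 1, src="X", dst="ff:ff:ff:ff:ff:ff")
# t=10: the laptop "L" on port 2 sends to X; the switch knows where X is.
early = sw.receive(10, 2, src="L", dst="X")
# t=200: VM traffic leaves with the bond slave MAC "A", so X is not refreshed.
sw.receive(200, 1, src="A", dst="L")
# t=310: the entry for X has aged out, so traffic to X is flooded.
late = sw.receive(310, 2, src="L", dst="X")
# t=311: the VM sends a new ARP request with source X; the switch relearns X.
sw.receive(311, 1, src="X", dst="ff:ff:ff:ff:ff:ff")
after_arp = sw.receive(312, 2, src="L", dst="X")

print(early, late, after_arp)
```

In this model, frames addressed to X go out of a single port while the entry is fresh, are flooded once X has not been seen as a source for longer than the aging time, and return to a single port as soon as the VM sends any frame with X as its source.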

If this is correct, does anybody know whether there is a way to force the VM's own MAC address to be used for every frame (as already happens for ARP requests)?
Or maybe the solution is somewhere else?

I thought I could set up the Windows VMs to reset their ARP table every 3 minutes, but that seems a bit too brute-force to me… 🙂

Thanks again for any help!

EDIT: I can confirm that if, during a flooding event, I quickly log into the affected VM and reset its ARP table, I see new ARP requests from the VM (announcing its own MAC address to the network) and the storm stops immediately.

Best Answer

Balance-tlb (mode 5) and balance-alb (mode 6) do not work with virtual bridges. They can cause broadcast loops, they rewrite source MAC in packets under some conditions, and mode 6 intercepts ARP by design.

You need to use active-backup (mode 1), which needs no switch config, or balance-xor (mode 2) or 802.3ad (mode 4), which need matching switch config.

You could also use round-robin (mode 0) or broadcast (mode 3) with switch config, but these are not good for TCP stream performance.
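As a quick way to check which mode a host is actually running, the Linux bonding driver exposes the current mode in sysfs as a name followed by its number (for example `balance-tlb 5`). The sketch below parses that text and flags the two modes named above as unsuitable under a bridge. The bond name `bond0` and the helper names are assumptions for illustration, and the classification simply follows this answer.

```python
from pathlib import Path

# Modes that, according to the answer above, do not work with virtual bridges.
BRIDGE_UNSAFE_MODES = {"balance-tlb", "balance-alb"}


def parse_bonding_mode(text):
    """Split sysfs text such as 'balance-tlb 5' into (name, number)."""
    name, number = text.split()
    return name, int(number)


def is_bridge_safe(mode_name):
    """Return False for the modes listed as unsuitable under a bridge."""
    return mode_name not in BRIDGE_UNSAFE_MODES


def check_bond(bond="bond0"):
    """Read a bond's mode from sysfs (Linux only; the bond name is assumed)."""
    path = Path("/sys/class/net") / bond / "bonding" / "mode"
    name, number = parse_bonding_mode(path.read_text())
    return name, number, is_bridge_safe(name)


# Example with a literal string, so it also runs on a machine without a bond.
sample = parse_bonding_mode("balance-tlb 5\n")
print(sample, is_bridge_safe(sample[0]))
```

On the Proxmox host itself, `check_bond("bond0")` would read the live value, provided the bond really is called `bond0`.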
