KVM QEMU VMs – Fixing Static IP Network Connectivity Loss

kvm-virtualization linux-networking networking ubuntu-18.04

I have a KVM/QEMU setup with both host and guest (VM) running Ubuntu 18.04 LTS with bridged networking. VMs configured with a static IP lose network connectivity randomly (there is no pattern). VMs configured with DHCP work fine.

Here is my host network config,

network:
    version: 2
    ethernets:
        eno1:
            dhcp4: no
            dhcp6: no
        eno4:
            dhcp4: true
        eno5np0:
            dhcp4: true
        eno6np1:
            dhcp4: true
        ens2f0np0:
            dhcp4: true
        ens2f1np1:
            dhcp4: true
    bridges:
        br0:
            interfaces: [eno1]
            dhcp4: no
            addresses:
            - 10.2.0.92/24
            gateway4: 10.2.1.252
            nameservers:
                addresses:
                - 8.8.8.8

Here is my vm (guest) network config with static IP,

network:
    version: 2
    ethernets:
            ens3:
                    dhcp4: no
                    addresses:
                    - 10.2.0.210/23
                    gateway4: 10.2.1.252
                    nameservers:
                        addresses:
                        - 8.8.8.8

Here is my vm (guest) network config with DHCP,

network:
    version: 2
    ethernets:
            ens3:
                    dhcp4: true

VMs with a static IP go into a kind of idle state. Whenever I try to SSH into one or access its services, the first attempt times out and only a later attempt connects,

$ nc -z -v -w5 10.2.0.210 22
nc: connect to 10.2.0.210 port 22 (tcp) timed out: Operation now in progress

Trying again works, apparently because the first attempt moved the VM from the idle state to a working state,

$ nc -z -v -w5 10.2.0.210 22
Connection to 10.2.0.210 22 port [tcp/ssh] succeeded!

There is no issue with VMs that use DHCP. They connect just fine at any time,

$ nc -z -v -w5 10.2.0.184 22
Connection to 10.2.0.184 22 port [tcp/ssh] succeeded!

I have checked the following links,

but they didn't help.

Is there an issue in the KVM configuration? Not only SSH, but every service exposed by the VMs is affected. I have verified with virsh that the VMs are in the running state.

Best Answer

One rather basic issue I see is that the gateway on br0 is not within the bridge's address range. You defined a /24 where the guests use a /23.

  br0:
            interfaces: [eno1]
            dhcp4: no
            addresses:
            - 10.2.0.92/24
            gateway4: 10.2.1.252
            nameservers:
                addresses:
                - 8.8.8.8

You just need to change /24 into /23.

10.2.0.92/23 would encompass 10.2.0.0 to 10.2.1.255, the same range your VMs use. The issue isn't overlapping address spaces, but that with the /24 the host's gateway (10.2.1.252) is not reachable by the host.
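The range arithmetic above can be double-checked with a small sketch using Python's standard `ipaddress` module. The helper name `describe` is made up for illustration; the addresses are the ones from the question.

```python
import ipaddress

def describe(cidr, gateway):
    """Return (first address, last address, gateway-is-on-link) for an interface."""
    net = ipaddress.ip_interface(cidr).network
    return str(net[0]), str(net[-1]), ipaddress.ip_address(gateway) in net

# br0 as configured in the question (/24) and as suggested here (/23).
for cidr in ("10.2.0.92/24", "10.2.0.92/23"):
    first, last, ok = describe(cidr, "10.2.1.252")
    print(f"{cidr}: {first} to {last}, gateway on-link: {ok}")
```

With the /24 the range stops at 10.2.0.255 and the gateway falls outside it; with the /23 the range extends to 10.2.1.255 and the gateway is on-link.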

If you were to ping from a guest with an x.x.1.x address to the host: the packet would leave the guest and be delivered directly (after an ARP broadcast), because the host is within the netmask of the guest's current network. The host would receive the packet and try to send a reply. Because the destination x.x.1.x is not on the host's current network of x.x.0.x, the reply would normally be routed to the gateway. Oh wait, the gateway isn't on x.x.0.x either. The packet would go nowhere.

Remember, packets are not smart. They only go exactly where you tell them.
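As a rough illustration of that reply path, here is a simplified model of the decision a sender makes for each packet. It is only a sketch: `next_hop` is a hypothetical helper, not how the kernel's routing table is actually implemented, and 10.2.1.50 is an assumed guest address in the upper half of the /23 (it does not appear in the question).

```python
import ipaddress

def next_hop(iface_cidr, gateway, dst):
    """Simplified sender-side decision: deliver directly, use the gateway, or fail."""
    net = ipaddress.ip_interface(iface_cidr).network
    if ipaddress.ip_address(dst) in net:
        return "direct"        # destination is on-link: ARP for it and send
    if ipaddress.ip_address(gateway) in net:
        return "via gateway"   # off-link destination, but the gateway is on-link
    return "unreachable"       # off-link destination and off-link gateway

GATEWAY = "10.2.1.252"

# Guest (assumed 10.2.1.50/23) pinging the host: the host is on-link for the guest.
print(next_hop("10.2.1.50/23", GATEWAY, "10.2.0.92"))

# Host replying with the /24 from the question: guest and gateway are both off-link.
print(next_hop("10.2.0.92/24", GATEWAY, "10.2.1.50"))

# Host replying after the change to /23: the guest is on-link.
print(next_hop("10.2.0.92/23", GATEWAY, "10.2.1.50"))
```

Note that in this model the host with a /24 can still reach a guest such as 10.2.0.210 directly, since that address sits in the lower half of the range; only destinations outside 10.2.0.0/24, including the gateway itself, are affected.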

A part of this that I didn't cover above is ARP; @Gerrit's comment above addresses that as well. When packets are sent within the same broadcast domain, they travel by MAC address, not IP. When 10.2.0.210/23 sends a packet to 10.2.0.92, it first sends a broadcast asking who has 10.2.0.92, and then sends the outbound packets directly to 10.2.0.92. I'm not sure whether the guest will get the reply or not. It may, since the ARP reply has the requester's MAC address in it. The guest will then add that information to its own ARP table.

The host, though, won't have the guest's MAC address when it replies, because for the host a guest at x.x.1.x lies outside its /24 network. The reply would normally go to the gateway to be routed, but it can't do that either, because the host's gateway is also not in the local network. Gateways, which are routers, generally can't route packets back down the wire they came in on; the packet would need to traverse the device. It would probably get dropped even if it could have made it to the gateway.

What I find more interesting is that it works sometimes. Netmasks, broadcast addresses, and broadcast domains only affect the sender of packets, not the receivers. Possibly, because the guest is a virtual machine on the host, some packets are going through the virtual switch and are being seen.