Intermittent connection issues inside LXC

bridge lxc networking

I am experiencing connection issues inside an LXC container that are driving me mad. They are intermittent: they appear for some time, and then they suddenly disappear.

Scenario

An LXC container inside a host. Both are running Debian GNU/Linux 8.3.
The container runs Piwik (open source PHP software for statistics, with Apache and MySQL) and an SSH server. The container's Apache is reachable through an nginx proxy on the host.

The lxc config:

lxc.tty = 6
lxc.pts = 1024
lxc.rootfs = /var/lib/lxc/hammond/rootfs
lxc.cgroup.devices.deny = a
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# rtc
lxc.cgroup.devices.allow = c 254:0 rwm

# mounts point
lxc.mount.entry=proc /var/lib/lxc/hammond/rootfs/proc proc nodev,noexec,nosuid 0 0
lxc.mount.entry=devpts /var/lib/lxc/hammond/rootfs/dev/pts devpts defaults 0 0
lxc.mount.entry=sysfs /var/lib/lxc/hammond/rootfs/sys sysfs defaults  0 0

# networking
lxc.utsname = hammond
lxc.network.type = veth
#lxc.network.macvlan.mode = private
lxc.network.flags = up
lxc.network.link = br-hammond
lxc.network.ipv4 = 192.168.100.2/24
lxc.network.ipv4.gateway = 192.168.100.1
lxc.network.hwaddr = 00:1E:10:C1:6B:C9

lxc.start.auto = 1

# http://serverfault.com/questions/658052/systemd-journal-in-debian-jessie-lxc-container-eats-100-cpu
lxc.autodev = 1
lxc.kmsg = 0

Issues:

1. Cannot connect to local database

Suddenly, Piwik reports:

SQLSTATE[HY000] [2003] Can't connect to MySQL server on '127.0.0.1' (111)

The database is running, of course.

If I telnet to the database from inside the lxc (127.0.0.1:3306), I can connect.
If I telnet to Apache from inside the lxc (127.0.0.1:80), Piwik works fine. It connects to the database, renders the page as usual and doesn't report any error.
If I telnet to Apache from the host (192.168.100.2:80), Piwik reports the database error.

2. SSH freezes

I am tunneling the SSH connection to the lxc through the host using ProxyCommand:

ProxyCommand ssh -q host nc -q0 192.168.100.2 22

After the SSH negotiation phase, the connection freezes. If I type keys, they don't show up in the console. Finally, the connection times out with

packet_write_wait: Connection to UNKNOWN: Broken pipe

I have sniffed the packets with tcpdump and the SSH key exchange goes fine. Then the traffic stops after 0.5 seconds.

I think this is a bug in the latest Debian kernel updates. It used to work fine, but I have been experiencing these problems for a few weeks now. As I mentioned, they are intermittent. Suddenly, everything works fine again.

Suggestions on how to investigate further are welcome.

Best Answer

I've had a problem with the same symptoms. In my case, there was another host with the same IP on the VLAN I used in the bridge. Sometimes the other host would be faster to answer the ARP request (despite being another physical machine), at which point the lxc guest would save the wrong MAC address in its ARP table and continue sending Ethernet frames to the wrong address until another ARP request "resolved" the problem.

I diagnosed this with a timestamped ping from host to guest:

# ping -n 10.70.70.10 | perl -nle 'BEGIN {$|++} print scalar(localtime), " ", $_' |tee -a ping10707010.log
[...]
Sun Jul 31 09:18:53 2016 64 bytes from 10.70.70.10: icmp_seq=3389 ttl=64 time=0.035 ms
Sun Jul 31 09:18:54 2016 64 bytes from 10.70.70.10: icmp_seq=3390 ttl=64 time=0.035 ms
Sun Jul 31 09:18:55 2016 64 bytes from 10.70.70.10: icmp_seq=3391 ttl=64 time=0.027 ms
Sun Jul 31 09:19:45 2016 64 bytes from 10.70.70.10: icmp_seq=3441 ttl=64 time=0.064 ms
Sun Jul 31 09:19:46 2016 64 bytes from 10.70.70.10: icmp_seq=3442 ttl=64 time=0.038 ms
Sun Jul 31 09:19:47 2016 64 bytes from 10.70.70.10: icmp_seq=3443 ttl=64 time=0.036 ms

as well as a tcpdump on both host and guest:

# tcpdump -n -i brv3001 # on the host
[...]
09:18:55.724751 IP 10.70.0.1 > 10.70.70.10: ICMP echo request, id 26519, seq 3391, length 64
09:18:55.724768 IP 10.70.70.10 > 10.70.0.1: ICMP echo reply, id 26519, seq 3391, length 64
09:18:56.336109 ARP, Request who-has 10.70.70.10 tell 10.70.0.1, length 42
09:18:56.336147 ARP, Reply 10.70.70.10 is-at 00:16:3e:46:46:0a, length 28
[...]
09:19:44.728738 ARP, Request who-has 10.70.70.10 tell 10.70.0.1, length 28
09:19:44.728769 ARP, Reply 10.70.70.10 is-at 00:16:3e:46:46:0a, length 28
# tcpdump -n -i infra0 # on the guest
[...]
09:18:55.724757 IP 10.70.0.1 > 10.70.70.10: ICMP echo request, id 26519, seq 3391, length 64
09:18:55.724767 IP 10.70.70.10 > 10.70.0.1: ICMP echo reply, id 26519, seq 3391, length 64
09:18:56.336123 ARP, Request who-has 10.70.70.10 tell 10.70.0.1, length 42
09:18:56.336144 ARP, Reply 10.70.70.10 is-at 00:16:3e:46:46:0a, length 28
[...]
09:19:44.728745 ARP, Request who-has 10.70.70.10 tell 10.70.0.1, length 28
09:19:44.728766 ARP, Reply 10.70.70.10 is-at 00:16:3e:46:46:0a, length 28

which allowed me to see that around the point when the network would drop out and when it would reactivate, ARP requests were being issued and answered. The ARP requests seemed to be in order (using the correct MACs), but I decided to check the facts as seen by the OS anyway, so I logged ARP tables on host and guest with timestamps:

# while true; do date; arp -n; sleep 1; done > arp.log 2>&1 # on the host
[...]
Sun Jul 31 09:18:55 CEST 2016
Address                  HWtype  HWaddress           Flags Mask            Iface
10.70.70.10              ether   00:16:3e:46:46:0a   C                     brv3001
Sun Jul 31 09:18:56 CEST 2016
Address                  HWtype  HWaddress           Flags Mask            Iface
10.70.70.10              ether   00:16:3e:46:46:0a   C                     brv3001
# while true; do date; arp -n; sleep 1; done > arp.log 2>&1 # on the guest
Sun Jul 31 09:18:55 CEST 2016
Address                  HWtype  HWaddress           Flags Mask            Iface
10.70.0.1                ether   00:1e:68:4a:03:b0   C                     infra0
Sun Jul 31 09:18:56 CEST 2016
Address                  HWtype  HWaddress           Flags Mask            Iface
10.70.0.1                ether   c4:34:6b:22:b6:7c   C                     infra0

which allowed me to understand that the host did not have a faulty MAC for the guest, but the guest had somehow arrived at a faulty MAC for the host. Irritatingly, that was not reflected in the tcpdump information. (NB: there may be a race condition somewhere in libpcap or the IP stack that would benefit from investigating.)
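Scanning a long ARP log by eye gets tedious. Here is a minimal sketch, assuming the log has exactly the layout produced by the loop above (a `date` line followed by `arp -n` output). The function name `arp_mac_changes` is made up for illustration and is not part of any tool. It prints a line whenever an IP address shows up with a different MAC than it had before:

```shell
# Hypothetical helper: report IP addresses whose MAC changes within an ARP log.
# Reads the files given as arguments, or stdin when called without arguments.
arp_mac_changes() {
    awk '
        # Column header printed by arp -n: skip it.
        $1 == "Address" { next }
        # ARP entry: field 1 is the IPv4 address, field 3 the MAC.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            if ($2 != "ether") next      # e.g. "(incomplete)" entries
            if (($1 in seen) && seen[$1] != $3)
                print stamp ": " $1 " changed from " seen[$1] " to " $3
            seen[$1] = $3
            next
        }
        # Anything else is assumed to be the timestamp line written by date.
        { stamp = $0 }
    ' "$@"
}
```

Running `arp_mac_changes arp.log` on the guest log above should report the switch of 10.70.0.1 from 00:1e:68:4a:03:b0 to c4:34:6b:22:b6:7c at 09:18:56. A different `arp` output format (or a non-Ethernet link) would need the field numbers adjusted.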

After finding the erroneous MAC, I looked up which vendor the erroneous MAC address belonged to, and thus was able to find the offending machine. If that information had been more ambiguous, I'm sure my switch would've had functionality to help me find the right switch port.
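The vendor is identified by the first three octets of the MAC (the OUI). As a small sketch, the hypothetical helper below (the name `mac_oui` is an assumption for illustration) normalizes a MAC address to that prefix, so it can be searched in whatever vendor list you have at hand:

```shell
# Hypothetical helper: print the OUI (first three octets) of a MAC address
# as six upper-case hex digits without separators.
mac_oui() {
    printf '%s\n' "$1" | tr -d ':.-' | cut -c1-6 | tr 'a-f' 'A-F'
}
```

For example, `mac_oui c4:34:6b:22:b6:7c` prints `C4346B`. That string can then be looked up in a local copy of the IEEE OUI registry or in an online lookup service; which one to use is left open here.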

I suppose that up/downgrading kernels and certain userland tools would change and maybe even remove all or some of the symptoms through changed timings, slightly different behavior, other network services being active etc. For example, a ping from guest to host would reliably "fix" the problem in my case.

Also, do not forget that the IP addresses you can see with ifconfig are not necessarily all of the IP addresses used by the system. ip addr ls will be more comprehensive on Linux, and some more advanced iptables configurations could play a role too. If you are out of luck, the host responding to the ARP requests may even have a broken IP stack. You may even get ARP replies from other customers of your ISP if your network isn't properly isolated.
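To check every machine for the conflicting address, it helps to reduce the `ip` output to one "interface address" pair per line. This is a rough sketch that assumes the one-line format of `ip -o -4 addr show` (interface name in the second field, `inet` in the third, address/prefix in the fourth); the function name `list_ipv4` is hypothetical:

```shell
# Hypothetical helper: turn "ip -o -4 addr show" output into
# "interface address" pairs, including secondary addresses.
# Reads the files given as arguments, or stdin when called without arguments.
list_ipv4() {
    awk '$3 == "inet" { split($4, parts, "/"); print $2, parts[1] }' "$@"
}
```

Typical use would be `ip -o -4 addr show | list_ipv4 | grep -w 10.70.0.1` on each suspect host, where the address is the one that showed up with two different MACs.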

I realize that this might not be the exact solution to your problem, but I thought I'd leave some pointers for debugging for the next person who searches for this issue and finds it on Server Fault.
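One more pointer: in the timestamped ping log near the top of this answer, the outage is visible as a jump in icmp_seq (3391 to 3441). A minimal sketch for finding such jumps automatically follows. It assumes the log lines contain an `icmp_seq=N` field as in the output shown there, and the function name `ping_gaps` is made up for illustration:

```shell
# Hypothetical helper: report jumps in icmp_seq within a ping log.
# Reads the files given as arguments, or stdin when called without arguments.
ping_gaps() {
    awk '
        {
            # Find the icmp_seq=N field; skip lines that have none.
            found = 0
            for (i = 1; i <= NF; i++)
                if ($i ~ /^icmp_seq=/) { seq = substr($i, 10) + 0; found = 1 }
            if (!found) next
            # Report how many replies are missing before this line.
            if (have_prev && seq > prev + 1) {
                missing = seq - prev - 1
                print missing " replies missing before: " $0
            }
            prev = seq
            have_prev = 1
        }
    ' "$@"
}
```

With `ping_gaps ping10707010.log`, each printed line carries the timestamp at which replies resumed, which can then be matched against the tcpdump and ARP logs.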

Related Solutions

Bridged virtual interface is not available or visible to ifconfig

D'oh! Right after I post this, I find the answer. Apparently vmnet-netifup wasn't running for vmnet0. Once I ran:

`/usr/bin/vmnet-netifup -d /var/run/vmnet-netifup-vmnet0.pid /dev/vmnet0 vmnet0`

it worked fine. Now why didn't it automatically start when the other two did? That's an open question still.

Is it possible to start LXC container inside LXC container

I'm going to dispel a few myths here.

This is just a bad idea. I'm sorry. – Jacob Mar 5 at 20:30

I don't see how this is a bad idea. It's really just a chroot inside a chroot. On one hand, it could possibly decrease performance in some negligible manner (nothing compared to running a VM inside a VM). On the other hand, it's likely to be more secure (e.g. more isolated from the root host system and its constituents).

Do you actually have a real reason to do this? Please remember that questions here should be about actual problems that you face. – Zoredache Mar 5 at 21:52

I agree 100% with the poster's following comment. Furthermore, I think it's safe to assume that everybody who posts a question on here likely thinks that they have a real reason to do it.

I think, that lxc should be able to simplify VM migration(and backup+recovery too). But I'm not sure about cases, when there is no access to host OS(cheap vps for example). – Mikhail Mar 6 at 11:17

I actually came across this question back in June when I was first diving into LXC for PaaS/IaaS projects, and I was particularly interested in the ability to allow users to emulate cloud environments for development purposes.

LXCeption. We're too deep. – Tom O'Connor Mar 6 at 22:46

I laughed a little bit when I read this one, but that's not, at all, the case :)

Anyway, I eventually set up a VirtualBox environment with a stock install of Ubuntu 12.04 LTS Server Edition after reading all this, thinking that this was 100% possible. After installing LXC, I created a new container, and installed LXC inside the container with apt-get. Most of the installation progressed well, but it eventually failed with an error due to a problem with the cgroup-lite package, whose upstart job failed to start after the package had been installed.

After a bit of searching, I came across this fine article at stgraber.org (the goodies are hiding under the "Container Nesting" section):

sudo apt-get install lxc
sudo lxc-create -t ubuntu -n my-host-container
sudo wget https://www.stgraber.org/download/lxc-with-nesting -O /etc/apparmor.d/lxc/lxc-with-nesting
sudo /etc/init.d/apparmor reload
sudo sed -i "s/#lxc.aa_profile = unconfined/lxc.aa_profile = lxc-container-with-nesting/" /var/lib/lxc/my-host-container/config
sudo lxc-start -n my-host-container
(in my-host-container) sudo apt-get install lxc
(in my-host-container) sudo stop lxc
(in my-host-container) sudo sed -i "s/10.0.3/10.0.4/g" /etc/default/lxc
(in my-host-container) sudo start lxc
(in my-host-container) sudo lxc-create -n my-sub-container -t ubuntu
(in my-host-container) sudo lxc-start -n my-sub-container

Installing that AppArmor policy and restarting the daemon did the trick (don't forget to change the network ranges, though!). In fact, I thought that particular snippet was so important that I mirrored it @ http://pastebin.com/JDFp6cTB just in case the article ever goes offline.

After that, sudo /etc/init.d/cgroup-lite start succeeded and it was smooth sailing.

So, yes, it is possible to start an LXC container inside of another LXC container :)