libvirt_lxc populates the guest's /dev tree on startup according to the guest's configuration. The documentation says you have to put the configuration in the guest's XML configuration file: use a hostdev with the "misc" type and with its source pointing to a char device at /dev/net/tun.
The snippet should look like this:
...
<devices>
  ...
  <hostdev mode='capabilities' type='misc'>
    <source>
      <char>/dev/net/tun</char>
    </source>
  </hostdev>
</devices>
...
To edit the guest's XML file, use virsh. For a local instance, run:
virsh -c lxc:/// edit GUESTNAME
I can confirm this works with libvirt-1.2.1.
Edit: I'll keep my original answer below, but I'll try to explain what's happening here and provide a general solution for you.
Edit 2: Provided another option.
The problem that you're hitting here has to do with how the kernel manages I/O. When you make a write to your filesystem, that write isn't immediately committed to disk; that would be incredibly inefficient. Instead, writes are cached in an area of memory referred to as the page cache, and periodically written in chunks out to disk. The "dirty" section of your log describes the size of this page cache that hasn't been written out to disk yet:
dirty:123816kB
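You can watch these figures yourself; the kernel exposes them in /proc/meminfo (Linux-specific, and the values will of course differ on your system):

```shell
# Show how much data is currently dirty in the page cache,
# and how much is being written back right now
grep -E '^(Dirty|Writeback):' /proc/meminfo
```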
So what empties this dirty cache? Why isn't it doing its job?
'Flush' on Linux is responsible for writing dirty pages out to disk. It's a daemon that wakes up periodically to determine if writes to disk are required, and, if so, performs them. If you are a C type of guy, start here. Flush is incredibly efficient; it does a great job of flushing stuff to disk when needed. And it's working exactly how it is supposed to.
Flush runs outside of your LXC container, since your LXC container doesn't have its own kernel. LXC containers exist as a construct around cgroups, which is a feature of the Linux kernel that allows better limitations and isolation of process groups, but not its own kernel or flush daemon.
Since your LXC has a memory limit lower than the memory available to the kernel, weird things happen. Flush assumes it has the full memory of the host to cache writes in. A program in your LXC starts to write a big file; it buffers...buffers...and eventually hits its hard limit, and starts calling the OOM manager. This isn't a failure of any particular component; it's expected behavior. Kind of. This sort of thing should be handled by cgroups, but it doesn't seem like it is.
This completely explains the behavior you see between instance sizes. You'll start flushing to disk much sooner on the micro instance (with 512MB RAM) than on a large instance.
Ok, that makes sense. But it's useless. I still need to write me a big-ass file.
Well, flush isn't aware of your LXC limit. So instead of patching the kernel, there are a few options here for things you can try to tweak:
/proc/sys/vm/dirty_expire_centisecs
This controls how long a page can sit in the dirty cache before it must be written to disk. The value is in hundredths of a second; the default of 3000 means 30 seconds. Try setting it lower to start pushing writes out faster.
/proc/sys/vm/dirty_background_ratio
This controls what percentage of active memory flush is allowed to fill up before it starts forcing writes. There is a bit of fiddling that goes into sorting out the exact total here, but the easiest explanation is to just look at your total memory. By default it's 10% (on some distros it's 5%). Set this lower; it'll force writes out to disk sooner and may keep your LXC from hitting its limit.
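A sketch of tuning both knobs via sysctl (the values here are examples, not recommendations; tune for your workload, and note both require root):

```shell
# Expire dirty pages after 5 seconds instead of the 30-second default
sysctl -w vm.dirty_expire_centisecs=500
# Start background writeback at 5% of memory instead of 10%
sysctl -w vm.dirty_background_ratio=5
```

To make the change survive a reboot, put the same `vm.*` settings in /etc/sysctl.conf.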
Can't I just screw with the filesystem a bit?
Well, yeah. But make sure you test this out; it can hurt performance. In /etc/fstab, add the 'sync' mount option to the mounts you'll be writing to.
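For example (the device and mount point below are placeholders; 'sync' makes writes synchronous, which avoids the dirty-cache buildup at a real throughput cost):

```
# /etc/fstab -- example entry with the 'sync' option added
/dev/xvdb  /data  ext4  defaults,sync  0  2
```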
Original answer:
Try reducing the blocksize used by DD:
dd if=/dev/zero of=test2 bs=512 count=1024000
You can only write one sector at a time (512 bytes on older HDDs, 4096 on newer). If dd is pushing writes to disk faster than the disk can accept them, it will start caching the writes in memory. That's why your file cache is growing.
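If the goal is to keep dd's writes from piling up in the page cache at all, you can also ask dd to flush as part of the run. A small sketch (sizes scaled down here; conv=fdatasync forces the data to disk before dd exits, and oflag=direct would bypass the page cache entirely):

```shell
# Write 16MB and force it to disk before dd reports completion
dd if=/dev/zero of=test2 bs=1M count=16 conv=fdatasync
```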
Best Answer
There is no Linux running inside the container, as Linux is the kernel and you share it with the host. Your apps (like free and top) read system info from /proc and thus get details about the host, as LXC does not fake the limited resources by default (in contrast to, e.g., OpenVZ). This is OK unless your app behaves differently depending on the RAM/swap/CPU numbers. If your app tries to allocate more memory than is available to the container, it will be OOM-killed like any other app in a non-container environment.
If you want to see the limited resources inside the container, do the following on the host:
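The concrete commands are missing from the answer as scraped; given the next paragraph, it presumably means installing and starting lxcfs. On a Debian/Ubuntu host that would look something like this (package and service names assumed; see the lxcfs homepage for your distro):

```shell
# Install lxcfs so containers see cgroup-limited values in /proc (assumed package name)
apt-get install -y lxcfs
systemctl start lxcfs
```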
Afterwards restart the container. LXCFS will emulate a few files in /proc of the container and apps will see the limited resources (cpu, ram, swap) properly.
More info on the lxcfs homepage: https://linuxcontainers.org/lxcfs/