Ubuntu – Will Linux/Ubuntu running in an LXC container understand cgroup memory limits

cgrouplxcmemory usageUbuntu

I'm planning to rent a physical server and run MySQL inside an LXC container on that server. I'd like to control maximum memory usage for the MySQL instance using cgroup limits :

lxc.cgroup.memory.limit_in_bytes = 8192M

This will effectively control how much memory the container may use, but top or free inside the container will still report overall memory for the LXC host (physical server). I'm not an expert in how Linux manages memory on general, but I assume that – on a physical machine – Linux will begin to swap if it sees that it is about to hit a physical memory limit. Does Linux handle the "effective" memory limit (be it cgroup limit or physical limit) the same way regardless of wether it runs inside a container or on a physical host?

Best Answer

There is no Linux running inside the container, as Linux is the kernel and you share it with the host. Your apps (like free and top) read system info from /proc and thus get details about the host as lxc does not fake the limited resources by default (in contrast to eg OpenVZ). This is ok unless your app behaves differently depending on the ram/swap/cpu numbers. If your app tries to allocate more memory than available to the container it will be OOM-killed as any other app in a non-container environment.

If you want to see the limited resources inside the container, do the following on the host:

apt-get install lxcfs

Afterwards restart the container. LXCFS will emulate a few files in /proc of the container and apps will see the limited resources (cpu, ram, swap) properly.

More info on the lxcfs homepage: https://linuxcontainers.org/lxcfs/

Related Solutions

Allowed cgroup devices for libvirt/lxc container

libvirt_lxc populates the guest's /dev tree on startup according to the guest's configuration. The documentation says you have to put the configuration in the guest's XML configuration file. Use a hostdev with the "misc" type and with its source pointing to a char device at /dev/net/tun.

The snippet should look like this:

...
<devices>
    ...
    <hostdev mode='capabilities' type='misc'>
        <source>
            <char>/dev/net/tun</char>
        </source>
    </hostdev>
</devices>
...

To edit the guest's XML file use virsh. For a local instance use this command:

virsh -c lxc:/// edit GUESTNAME

I can confirm this working with libvirt-1.2.1.

Linux – Why are applications in a memory-limited LXC container writing large files to disk being killed by the OOM

Edit: I'll keep my original answer below, but I'll try to explain what's happening here and provide a general solution for you.

Edit 2: Provided another option.

The problem that you're hitting here has to do with how the kernel manages I/O. When you make a write to your filesystem, that write isn't immediately committed to disk; that would be incredibly inefficient. Instead, writes are cached in an area of memory referred to as the page cache, and periodically written in chunks out to disk. The "dirty" section of your log describes the size of this page cache that hasn't been written out to disk yet:

dirty:123816kB

So what empties this dirty cache? Why isn't it doing it's job?

'Flush' on Linux is responsible for writing dirty pages out to disk. It's a daemon that wakes up periodically to determine if writes to disk are required, and, if so, performs them. If you are a C type of guy, start here. Flush is incredibly efficient; it does a great job of flushing stuff to disk when needed. And it's working exactly how it is supposed to.

Flush runs outside of your LXC container, since your LXC container doesn't have its own kernel. LXC containers exist as a construct around cgroups, which is a feature of the Linux kernel that allows better limitations and isolation of process groups, but not its own kernel or flush daemon.

Since your LXC has a memory limit lower than the memory the kernel has available, weird things happen. Flush assumes it has the full memory of the host to cache writes in. A program in your LXC starts to write a big file, it buffers...buffers...and eventually hits it's hard limit, and starts calling the OOM manager. This isn't a failure of any particular component; it's expected behavior. Kind of. This sort of thing should be handled by cgroups, but it doesn't seem like it is.

This completely explains the behavior you see between instance sizes. You'll start flushing to disk much sooner on the micro instance (with 512MB RAM) vs on a large instance

Ok, that makes sense. But it's useless. I still need to write me a big-ass file.

Well, flush isn't aware of your LXC limit. So instead of patching the kernel, there are a few options here for things you can try to tweak:

/proc/sys/vm/dirty_expire_centiseconds

This controls how long a page can be held in the dirty cache and written to disk. By default it's 30 seconds; try setting it lower to start pushing it out faster.

/proc/sys/vm/dirty_background_ratio

This controls what percentage of active memory flush is allowed to fill up before it starts forcing writes. There is a bit of fiddling that goes into sorting out the exact total here, but the easiest explanation is to just look at your total memory. By default it's 10% (on some distros it's 5%). Set this lower; it'll force writes out to disk sooner and may keep your LXC from running out of it's limits.

Can't I just screw with the filesystem a bit?

Well, yeah. But make sure you test this out.. you could affect performance. On your mounts in /etc/fstab where you'll be writing this to, add the 'sync' mount option.

Original answer:

Try reducing the blocksize used by DD:
dd if=/dev/zero of=test2 bs=512 count=1024000
You can only write one sector at a time (512 bytes on older HDDs, 4096 on newer). If DD is pushing writes to disk faster than the disk can accept them, it will start caching the writes in memory. That's why your file cache is growing.

Best Answer

Related Solutions

Allowed cgroup devices for libvirt/lxc container

Linux – Why are applications in a memory-limited LXC container writing large files to disk being killed by the OOM

Related Topic