Ubuntu – Setting up tmpfs `/run/lock` for hundreds of thousands of 0 byte lock files, and dealing with the inode limit

Tags: filesystems, inode, mount, tmpfs, Ubuntu

I have a situation where I need to create hundreds of thousands of 0-byte lock files for concurrency control.

I've tested creating them by using:

for i in $(seq 1 50000); do touch "/run/lock/${i}.lock"; done
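
(Aside: spawning one touch process per file is slow at this scale; assuming GNU seq, a batched variant that creates the same files would be:)

seq -f "/run/lock/%g.lock" 1 50000 | xargs touch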

Since the files are 0 bytes, they don't take up any space in the partition. Looking at df -h:

Filesystem      Size  Used Avail Use% Mounted on
tmpfs            50M  344K   49M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            246M     0  246M   0% /run/shm
none            100M     0  100M   0% /run/user

The 0% figure doesn't change at all in the /run/lock row.

However, memory usage does increase, by roughly 1 KB per lock file on average. I discovered this by comparing free -h before and after creating 70,000 lock files inside /run/lock. The increase was reflected in actual memory usage (used memory minus buffers/cache).
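
For reference, a minimal way to reproduce that measurement, assuming the older procps free that still prints a "-/+ buffers/cache:" row:

# Capture cache-adjusted used memory in bytes, before and after
before=$(free -b | awk '/buffers\/cache/ {print $3}')
for i in $(seq 1 70000); do touch "/run/lock/${i}.lock"; done
after=$(free -b | awk '/buffers\/cache/ {print $3}')
echo "approx. bytes per lock file: $(( (after - before) / 70000 ))"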

Later I figured out that this ~1 KB increase is most likely due to the inodes, so I checked inode usage with df -i:

Filesystem      Inodes  IUsed   IFree IUse% Mounted on
tmpfs            62729    322   62407    1% /run
none             62729  50001   12728   80% /run/lock
none             62729      1   62728    1% /run/shm
none             62729      2   62727    1% /run/user

As you can see, the lock files consume inodes in the /run/lock partition.

I'm currently on Ubuntu, and the /run mounts are not reflected in /etc/fstab. Running mount gives me:

tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)

I have a few questions about this (the first one is the most important):

  1. How do I increase the inode limit for /run/lock permanently, so that it survives reboots?
  2. Would I be better off creating my own directory and mounting tmpfs on it for this, instead of using /run/lock?
  3. Is each partition's size limit completely independent of the others? That is, storing files in /run doesn't seem to affect /run/lock and vice versa.
  4. Is the 1 KB derived from the inode? I noticed that when creating non-empty files, the basic block is 4 KB per file.
  5. Why does /run get the filesystem type tmpfs, while /run/lock, /run/shm and /run/user show "none", even though all of them are backed by tmpfs? Why aren't they all listed as tmpfs in the Filesystem column?
  6. If all of these directories are independently constrained, how does the OOM killer handle a situation where there are multiple full tmpfs partitions, each sized to 50% of RAM, with processes contending for RAM at the same time? Obviously one cannot use more than 100% of RAM. According to https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt, the system will deadlock. How does that work?

Best Answer

Responding to some of your questions, in order:

  1. You can use mount -o remount,nr_inodes=NUM /run/lock in your application's startup script (provided it runs with uid=0). It should also be safe to add the corresponding line to /etc/fstab, but I haven't tested that; see the sketch after this list.
  2. Separation makes some sense here: if the lock files use up all the inodes in their own mount, that won't interfere with the rest of the system (also sketched below).
  3. Yes, completely independent.
  4. [...]
  5. With virtual (non-block-device-backed) filesystems, you can put whatever you like as the device in the mount command; only the type matters.
  6. [...]
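
To make points 1 and 2 concrete, here is a sketch of what the remount, the /etc/fstab entry, and a dedicated mount could look like. Treat nr_inodes=200000, size=16m and the /var/lock/myapp path as placeholders, not tested values:

# One-off, effective immediately but lost on reboot:
mount -o remount,nr_inodes=200000 /run/lock

# /etc/fstab entry mirroring the current /run/lock options (untested, per point 1):
none  /run/lock  tmpfs  rw,noexec,nosuid,nodev,size=5242880,nr_inodes=200000  0  0

# Dedicated tmpfs just for the lock files (per point 2; hypothetical path):
mkdir -p /var/lock/myapp
mount -t tmpfs -o size=16m,nr_inodes=200000,mode=0755 tmpfs /var/lock/myapp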

I'm not sure whether your application creates the empty files by opening them (and for how long it keeps them open), but you may also want to consider raising the open files limit (check ulimit -n) to avoid running out of file descriptors.
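
Checking and raising that limit for the current shell is straightforward; the 200000 figure below is arbitrary, and anything persistent belongs in /etc/security/limits.conf:

ulimit -n           # show the current soft limit on open file descriptors
ulimit -n 200000    # raise it for this shell, up to the hard limit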