Linux – How to determine POSIX advisory file locks are working in simfs in the VM I’m using

concurrency filesystems linux

I'm looking for a command-line utility or some other way to test effectiveness of file locks, specifically POSIX advisory locks (which aren't only for POSIX, btw) in a Linux filesystem.

Specifically, I want to ensure POSIX advisory locking (file locking) is working correctly in simfs in a Linux/Ubuntu VM used for continuous integration testing. We've had file corruption that only occurs to a SQLite DB file when there are concurrent writes by 30 processes. This is only being used in testing by one project, but we'd like to help track down the problem so others won't run into it.

According to the SQLite team and documentation, concurrent writes are only supported when POSIX advisory locks are working in the filesystem/OS. The test I have that uses SQLite works in v3.7.7 of SQLite in OS X, but the same test corrupts the DB file in v3.7.9 of SQLite in the Ubuntu VM provided by TravisCI (and hosted by Blue Box). The SQLite team did not indicate that there were any concurrency issues fixed between those two versions, since concurrency is dependent on the OS/filesystem's POSIX advisory locks working.
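For reference, the failure scenario described above can be approximated with a short script. This is a rough sketch and not the actual test from the project: the table name, row counts, and function names are invented for illustration. It uses Python's standard sqlite3 module, which may be linked against a different SQLite version than the sqlite3 command-line tool.

```python
import os
import sqlite3
import tempfile


def write_rows(db_path, rows):
    """Insert `rows` rows, one transaction each, from the calling process."""
    conn = sqlite3.connect(db_path, timeout=60)  # wait up to 60 s on a busy DB
    try:
        for n in range(rows):
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO t (pid, n) VALUES (?, ?)",
                             (os.getpid(), n))
    finally:
        conn.close()


def run_concurrent_writers(db_path, workers=30, rows=20):
    """Run `workers` writer processes, then return (integrity_check, row_count)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (pid INTEGER, n INTEGER)")
    conn.commit()
    conn.close()  # do not carry an open connection across fork()

    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                write_rows(db_path, rows)
                status = 0
            finally:
                os._exit(status)  # never fall through into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)

    conn = sqlite3.connect(db_path)
    try:
        check = conn.execute("PRAGMA integrity_check").fetchone()[0]
        count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()
    return check, count


if __name__ == "__main__":
    # Hypothetical location; put the DB on the filesystem under test.
    path = os.path.join(tempfile.mkdtemp(), "stress.db")
    print(run_concurrent_writers(path))
```

On a filesystem where locking works, the integrity check should come back as ok and the row count should equal workers times rows. A different check result, missing rows, or a "database disk image is malformed" error would point toward the kind of corruption described here.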

Additional information about the environment that I'm trying to investigate:

$ sqlite3 -version
3.7.9 2011-11-01 00:52:41 c7c6050ef060877ebe77b41d959e9df13f8c9b5e

$ uname -r
2.6.32-042stab061.2

$ cat /proc/version
Linux version 2.6.32-042stab061.2 (root@rh6-build-x64) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Aug 24 09:07:21 MSK 2012

$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise

(The home directory that is exhibiting the problem is within the / mount.)

$ cat /proc/mounts
/dev/simfs / simfs rw,relatime 0 0
...

$ mount
/vz/private/6062841 on / type simfs (rw)
...

I have a ticket open with the VM provider here, and they stated that they are not using network filesystems, which are commonly associated with POSIX lock issues because of the complexity of implementing POSIX locks in such environments. In addition to the info above: although this press release would seem to indicate that OpenStack is being used, the path above shows 'vz' in the mount, which suggests that OpenVZ is being used.

As for tools to help diagnose POSIX lock failures, the only one that I've heard about is a ping-pong test that is part of a tool called smbtorture, which tests POSIX locking with Samba. I'm not using Samba in this case, so I'm not sure that would help.

If there is no command-line test available, how would I go about testing that locking works if all I have is limited access to the VM? (sudo doesn't require a password for my user, but commands that should output something under sudo don't work, so I think it is overridden.) Are there commands that I could have the VM administrator run to gather more info to help resolve this problem?

Best Answer

First off: file locks and pthread mutexes are entirely different beasts. File locks are used to advise the current process or other processes that a file is currently not to be used. Pthread mutexes are, by default, used to coordinate critical sections between threads in the current process only.

File locking is done with flock(2) and friends, and conveniently there's a command-line wrapper for it, flock(1). To test whether file locks work, open two terminals and run this:

In terminal one:

flock /tmp/foo.lock sleep 120

And in the other terminal while the first one is holding the lock:

if ! flock -n /tmp/foo.lock true ; then echo "flock works"; else echo "flock fails"; fi

That should tell you whether file locks work.

And if you have to run it in one script, try this:

flock /tmp/foo.lock sleep 120 &
sleep 1  # give the background flock time to acquire the lock
if ! flock -n /tmp/foo.lock true ; then echo "flock works"; else echo "flock fails"; fi
kill $!
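The same check can be sketched in Python, which removes the timing dependence of the backgrounded command: the parent takes the lock before forking, so the child's attempt always comes second. This is only a sketch. The function name flock_is_exclusive and the lock file path are made up for illustration, and the lock file should live on the filesystem you want to test.

```python
import fcntl
import os
import tempfile


def flock_is_exclusive(path):
    """Return True if a second process is refused an flock() the parent holds."""
    holder = open(path, "w")
    fcntl.flock(holder, fcntl.LOCK_EX)  # parent locks first, so there is no race
    pid = os.fork()
    if pid == 0:
        status = 2  # stays 2 if something unexpected goes wrong
        try:
            # Open the file again: flock() locks belong to the open file
            # description, so the inherited descriptor would not conflict.
            with open(path, "w") as contender:
                try:
                    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    status = 1  # lock granted twice: locking is not effective
                except BlockingIOError:
                    status = 0  # refused, as expected when locking works
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    holder.close()
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    # Hypothetical path; choose one on the filesystem under test.
    lock_path = os.path.join(tempfile.gettempdir(), "foo.lock")
    print("flock works" if flock_is_exclusive(lock_path) else "flock fails")
```

The child reports its result through its exit status, so the parent can print a single verdict instead of relying on a traceback.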

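Another way of locking files is the fcntl system call. These are the POSIX advisory locks that SQLite relies on, and on Linux they are independent of flock locks, so the flock test alone doesn't exercise them. It's rather annoying to test with Ruby, but this Python code should do the trick: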

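import fcntl, os, time

# Open before forking so both processes lock the same file.
fd = open('/tmp/test.lock', 'w')
if os.fork():
    # Parent: take an exclusive lock and hold it until the child exits.
    fcntl.lockf(fd, fcntl.LOCK_EX)
    os.wait()
else:
    # Child: give the parent time to lock, then try without blocking.
    time.sleep(0.1)
    fcntl.lockf(fd, fcntl.LOCK_EX|fcntl.LOCK_NB)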

It tries to lock the same file in two different processes. The second lock is non-blocking, so it should immediately raise an error. The expected output, if fcntl locks are working properly, is the following (Python 3 reports BlockingIOError instead of IOError, with the same errno):

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    fcntl.lockf(fd, fcntl.LOCK_EX|fcntl.LOCK_NB)
IOError: [Errno 11] Resource temporarily unavailable
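A single refused lock shows that the call is honoured once. To get closer to the 30-writer scenario in the question, a stress test can have many processes increment a shared counter under an fcntl lock and then compare the result with the expected total. The following is a minimal sketch under those assumptions; the function names, counter file, and iteration counts are hypothetical.

```python
import fcntl
import os
import tempfile


def locked_increment(path, times):
    """Add 1 to the integer stored in `path`, `times` times, under an fcntl lock."""
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(times):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
            try:
                value = int(os.pread(fd, 32, 0) or b"0")
                os.pwrite(fd, str(value + 1).encode(), 0)
            finally:
                fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def run_stress(path, workers=30, times=200):
    """Fork `workers` processes that all increment one counter; return its final value."""
    with open(path, "w") as f:
        f.write("0")
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                locked_increment(path, times)
                status = 0
            finally:
                os._exit(status)  # never fall through into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    with open(path) as f:
        return int(f.read())


if __name__ == "__main__":
    workers, times = 30, 200
    # Hypothetical path; put the counter file on the filesystem under test.
    counter = os.path.join(tempfile.gettempdir(), "lock-stress.counter")
    print("final:", run_stress(counter, workers, times), "expected:", workers * times)
```

If the final value is lower than the expected total, some read-modify-write cycles overlapped, which suggests the locks are not excluding other processes. A matching total does not prove correctness, but repeated runs that always match make a locking fault less likely.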