SQLite on Google Cloud Persistent Disk

google-compute-engine, sqlite

Does Google's Persistent Disk provide the proper reader/writer locking that SQLite requires for concurrent access (many virtual machines accessing data on a single Persistent Disk)?

According to the SQLite FAQ:

SQLite uses reader/writer locks to control access to the database. […] But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time.

I would really like to know whether Google's Persistent Disk locks properly for writes. I researched AWS's EFS, and it does not have a locking system adequate to support SQLite.
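For anyone wanting to test this themselves, a quick way to probe fcntl() locking on a given mount is something like the sketch below (the file path is a placeholder). Run it simultaneously from two VMs against the same file on the shared mount; the second run should fail to get the lock while the first one holds it.

```python
# lock_probe.py: quick fcntl() lock sanity check (placeholder path below).
# Run this at the same time from two VMs against the same file on the
# shared mount; the second run should be refused while the first holds it.
import fcntl
import sys
import time

LOCK_FILE = "/mnt/shared/sqlite-lock-probe"  # assumed mount point, adjust

with open(LOCK_FILE, "w") as f:
    try:
        # Non-blocking exclusive lock, the same primitive SQLite relies on.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        print("lock refused: another holder exists (expected on the 2nd VM)")
        sys.exit(1)
    print("lock acquired: holding for 60 seconds")
    time.sleep(60)
    fcntl.lockf(f, fcntl.LOCK_UN)
```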

Best Answer

Update (30 Aug 2019)

Dqlite by Canonical provides a distributed, high-availability SQLite which may suit this use case. It's open-source under Apache 2.0, and written in C, so it could be a drop-in replacement for SQLite. See also discussion on HN for more context.


Earlier answer

It sounds, from your comments, like MySQL and Google Cloud SQL are not options, because your architecture requires the use of a single SQLite file.

Also, per SQLite docs, NFS is not an option due to the locking issues.

Here are some other options to consider.

Alternative distributed filesystem

Beyond NFS, there are a number of other distributed filesystems you may want to evaluate, such as Ceph, GlusterFS, and OrangeFS. In addition to your own research, consider reaching out to SQLite users or developers for guidance and past experience.

Use NFS, but enforce single writer-at-a-time

The NFS issue appears to be about locking, which is only needed for writes: as long as you can guarantee that only one process at a time has the database open for writing, several other processes can open it for reads, so this should be OK (please confirm that this is actually the case for your workload).

Thus, as long as there's an external method of ensuring a single writer, you may be able to use NFS. Consider a distributed lock service such as Apache ZooKeeper, HashiCorp Consul, or CoreOS etcd: have each would-be writer acquire the lock before touching the database, and you can store your SQLite file on NFS (a sketch of this pattern follows below).

This, of course, relies on each process with direct access to the SQLite database releasing the lock and closing the database as soon as it no longer needs to write, so correctness is hard to enforce: it depends on every piece of software being correct and cooperative.
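As a sketch of the distributed-lock approach, using the kazoo client for ZooKeeper (the ensemble address, lock path, database path and table are placeholders, not anything from your setup): each writer takes a cluster-wide lock before opening the database for writing and releases it as soon as the transaction finishes.

```python
# Sketch: serialize SQLite writes across machines with a ZooKeeper lock.
# Assumes kazoo (pip install kazoo) and a reachable ZooKeeper ensemble;
# the host, lock path, database path and table below are placeholders.
import sqlite3
from kazoo.client import KazooClient

ZK_HOSTS = "zk1.example.internal:2181"   # placeholder ensemble address
DB_PATH = "/mnt/nfs/app.db"              # SQLite file on the NFS mount

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()

write_lock = zk.Lock("/locks/sqlite-writer", "worker-1")

def insert_event(payload):
    # Only one process cluster-wide holds this lock at a time, so SQLite's
    # own (possibly broken-on-NFS) write locking is never actually contended.
    with write_lock:
        conn = sqlite3.connect(DB_PATH)
        try:
            with conn:  # commits on success, rolls back on exception
                conn.execute(
                    "INSERT INTO events (payload) VALUES (?)", (payload,)
                )
        finally:
            conn.close()

insert_event("hello")
zk.stop()
```

Readers can keep opening the file directly; only the writes need to funnel through the lock.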

Lightweight RPC server

You mentioned that your architecture (which cannot be changed at this time) relies on SQLite. However, if the client processes can call an RPC service instead of opening the file directly, you can make that service the only process that ever opens the SQLite database, which avoids the locking problem of multiple concurrent users. The trade-off is that you would have to change all the client code to call the RPC service rather than open the SQLite database directly, which is a non-trivial amount of work.
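As a rough illustration (the table, methods, path and port are invented for the example), Python's standard library is enough to stand up such a server: one process owns the SQLite file and exposes a couple of calls over XML-RPC, and every client calls those instead of opening the database itself.

```python
# Sketch of a single-owner RPC front end for the SQLite file.
# This is the only process that ever opens the database.
import sqlite3
import threading
from xmlrpc.server import SimpleXMLRPCServer

DB_PATH = "/var/lib/app/app.db"   # placeholder path
db_lock = threading.Lock()        # serialize access within this process

def get_user(user_id):
    # Example read call; the table/columns are illustrative only.
    with db_lock:
        conn = sqlite3.connect(DB_PATH)
        try:
            row = conn.execute(
                "SELECT id, name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
        finally:
            conn.close()
    return {"id": row[0], "name": row[1]} if row else None

def add_user(name):
    # Example write call; all writes funnel through this one server.
    with db_lock:
        conn = sqlite3.connect(DB_PATH)
        try:
            with conn:
                cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
                return cur.lastrowid
        finally:
            conn.close()

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(get_user)
server.register_function(add_user)
server.serve_forever()
```

A client would then use xmlrpc.client.ServerProxy("http://server:8000/") in place of sqlite3.connect().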

Conclusion

None of these options is trivial, and all of them will require work. The reason is that:

In contrast to many other database management systems, SQLite is not a client–server database engine. Rather, it is embedded into the end program.

As such, it is not the right solution for multiple concurrent accessors, and hence a number of workarounds are required.

Longer term, if you find yourself having to make significant changes just to keep this system maintainable, you may want to consider migrating to MySQL or Google Cloud SQL instead of investing in workarounds to keep using SQLite.