How would one build a relational database on a key-value store, à la Berkeley DB's SQL interface

nosql · relational-database · sql

I've been checking out Berkeley DB and was impressed to find that it supported a SQL interface that is "nearly identical" to SQLite.

http://docs.oracle.com/cd/E17076_02/html/bdb-sql/dbsqlbasics.html#identicalusage

I'm very curious, at a high-level, how this kind of interface might have been architected. For instance:

  • since values are "transparent", how do you efficiently query and sort by value
  • how are limits and offsets performed efficiently on large result sets
  • how would the keys be structured and serialized for good average-case performance
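For context on the last point, one common technique (an assumption here, not necessarily what Berkeley DB's SQL layer actually does) is to serialize composite keys so that bytewise comparison of the encoded keys matches the logical sort order of the columns:

```python
import struct

# Sketch only: encode a composite key (table_id, user_id, created_at) so that
# the KV store's lexicographic byte order equals the logical column order.
# Big-endian unsigned integers compare bytewise in the same order as their
# numeric values, so a plain memcmp sorts rows correctly.
def encode_key(table_id: int, user_id: int, created_at: int) -> bytes:
    # ">IQQ" = big-endian: 4-byte unsigned int, then two 8-byte unsigned ints.
    return struct.pack(">IQQ", table_id, user_id, created_at)

# Byte order tracks logical order across field boundaries:
assert encode_key(1, 5, 100) < encode_key(1, 5, 200) < encode_key(1, 6, 0)
```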

Best Answer

Berkeley DBs are not that distant from SQL databases. For instance, ISAM systems (which are similar to Berkeley DB) have been used to build relational databases (for instance, MySQL's MyISAM storage backend, warts and all).

Basically, a Berkeley DB can store table rows. You can use Berkeley DB indexing to implement relational indexes on the stored rows; limits and offsets are implemented easily on top of that. Serialization is not a great concern (its cost is insignificant compared to I/O).
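A minimal sketch of that idea, with plain Python structures standing in for Berkeley DB b-trees (the names here are illustrative, not a real API): rows live in a primary map keyed by row id, a secondary index keeps (column value, row id) pairs in sorted order, and LIMIT/OFFSET is just a bounded scan of the index.

```python
import bisect

rows = {}        # primary "table": row_id -> row dict
age_index = []   # secondary index: sorted list of (age, row_id) pairs

def insert(row_id, name, age):
    rows[row_id] = {"name": name, "age": age}
    # Keep the index sorted on insert, like a b-tree would.
    bisect.insort(age_index, (age, row_id))

def select_by_age(limit, offset=0):
    # Walk the index in key order, skip `offset` entries, return `limit` rows.
    return [rows[rid] for _, rid in age_index[offset:offset + limit]]

insert(1, "ann", 42)
insert(2, "bob", 17)
insert(3, "eve", 30)
assert select_by_age(limit=2) == [{"name": "bob", "age": 17},
                                  {"name": "eve", "age": 30}]
```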

The big difficulty is implementing joins and a decent query planner. The query planner is a crucial part of a relational database: it analyzes a query and decides how it should be executed, that is, which tables need to be queried, in which order, and using which indexes; then the query needs to be executed in an efficient fashion. A naive approach will probably choke on the first join of two large tables, since the naive approach is to materialize the full cross product, which can have an intractable number of rows. It doesn't sound very difficult, but there are lots of combinations, finding the best one is hard, and the difference between the best combination and a merely "decent" one can be enormous.
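To illustrate why the join strategy matters, here is a toy sketch (illustrative names, plain Python structures in place of BDB cursors) of an index nested-loop join: instead of filtering the materialized cross product, it probes an index on the join column once per outer row.

```python
users = {1: "ann", 2: "bob"}            # user_id -> name
orders = [(10, 1), (11, 1), (12, 2)]    # (order_id, user_id)

# Build an index on the join column once: user_id -> list of order_ids.
orders_by_user = {}
for order_id, user_id in orders:
    orders_by_user.setdefault(user_id, []).append(order_id)

def join_users_orders():
    # For each outer row, find matches via the index: roughly
    # O(|users| + |orders|) work, versus the O(|users| * |orders|)
    # cross product the naive approach would materialize and filter.
    return [(name, oid)
            for uid, name in users.items()
            for oid in orders_by_user.get(uid, [])]

assert join_users_orders() == [("ann", 10), ("ann", 11), ("bob", 12)]
```

A real planner's job is to pick between strategies like this one, a hash join, or a sort-merge join, based on table sizes and available indexes.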
