Database – Will B-Trees and Other Data Structures Become Obsolete With The Advent of Solid State Drives

data structuresdatabase

Many (perhaps most?) database applications today use B-Trees and variations to store data, because this data structure optimizes the read, write and seek operations on a hard disk (and these operations in turn play an important role in the overall efficiency of the databases).

Should Solid State Drives (SSDs) completely outplace traditional hard disks (HDDs), though, could we say that B-Trees and variations will become obsolete, giving room for data structures that are more efficient operating on direct access memory? If so, what will those structures be? (e.g., hash tables, AVL trees)

Best Answer

B-Trees are most often used for database indexes on hard disk, but they have advantages even as an in-memory data structure, given the modern memory heirarchy with multiple layers of cache and with virtual memory. Even if virtual memory is on an SSD, that won't change.

I use an in-memory B+-style multiway tree library that I wrote quite a lot in C++. It can have performance advantages - the reason it was originally written was to try to use cache better - but I have to admit it often doesn't work that way. The problem is the trade-off which means items have to move around within nodes on inserts and deletes, which doesn't happen for binary trees. Also, some of the low-level coding hacks I used to optimise it - well, they probably confuse and defeat the optimiser, truth told.

Anyway, even if your databases are stored on an SSD, that's still a block-oriented storage device, and there's still an advantage to using B-Trees and other multiway trees.

BUT about ten years ago, cache-oblivious algorithms and data structures were invented. These are oblivious to the size and structure of caches etc - they make (asymptotically) the best possible use of any memory heirarchy. B-Trees need to be "tuned" to a particular memory heirarchy to make the best use (though they work fairly well for quite a wide range of variation).

Cache oblivious data structures aren't often seen in the wild yet, if at all, but it time they may well make the usual in-memory binary trees obsolete. And they may also prove worthwhile for hard disks and SSDs as well, since they don't care what the cluster-size or hard-disk cache page size is.

Van Emde Boas layout is very important in cache-oblivious data structures.

The MIT OpenCourseware algorithms course includes some coverage of cache oblivious data structures.

Related Topic