File Systems – Designing an Effective Directory Structure

directory-structurefile-systemshashing

I was looking at how file systems are designed and noticed that most places say that the directory hierarchy can be implemented using a hash table.

Could someone please explain me how using a hash table to store a directory structure works?

For example, what would happen if I add a file/ directory or move a directory, how does that affect the hash table?

Also, how are paths involved?

Best Answer

The easiest is to have a hash table per directory. To follow a pathname, just get the root hash table, query it for the first directory in the path. Then, if it's a directory, get the next hash table and query it with the next part, and so on until the last part.

Since hash tables are unordered structures, you would typically sort them in memory to list a whole directory. Also, the hash wouldn't help to match wildcards; you have to do a whole directory scan to see which names match a given pattern. Of course, an ordered structure (like a sorted list or a B*tree) only help if there's a constant prefix.

A different way (used by Mac's HFS system) is to use an ordered structure (a B*tree in HFS case) and index by directory/name. In HFS, there was a dirID/filename structure that served as the main key for a single B*tree. Once you had this file handle, a single query returned the directory entry, without having to traverse the whole pathname. To get a directory list, just read the range [dirID, dirID+1), the resulting interval comprised all the filenames stored in that directory, in binary lexicographical order.

Related Topic