File Systems – Designing an Effective Directory Structure

Tags: directory-structure, file-systems, hashing

I was looking at how file systems are designed and noticed that most places say that the directory hierarchy can be implemented using a hash table.

Could someone please explain to me how using a hash table to store a directory structure works?

For example, what would happen if I add a file or directory, or move a directory? How does that affect the hash table?

Also, how are paths involved?

Best Answer

The easiest approach is to have one hash table per directory. To follow a pathname, start with the root directory's hash table and look up the first component of the path. If that entry is a directory, get its hash table and look up the next component, and so on until the last component.
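A minimal sketch of the per-directory approach follows. It assumes a toy in-memory model in which each directory's hash table is a Python dict mapping a name to a child directory or file; `Directory`, `File`, `resolve`, `add` and `move` are hypothetical names, not any real file system's API. It also touches on the add/move part of the question: under this model, adding an entry is one insert into the parent's table, and moving a directory is one delete plus one insert, while the moved subtree's own tables stay untouched because no table stores full paths.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class File:
    data: bytes = b""


@dataclass
class Directory:
    # One hash table per directory (hypothetical model): name -> entry.
    entries: Dict[str, Union["Directory", File]] = field(default_factory=dict)


def resolve(root: Directory, path: str) -> Union[Directory, File]:
    """Follow an absolute path, doing one hash lookup per component."""
    node: Union[Directory, File] = root
    for part in path.strip("/").split("/"):
        if not part:
            continue  # skips the empty component of "/" or "a//b"
        if not isinstance(node, Directory):
            raise NotADirectoryError(path)
        try:
            node = node.entries[part]
        except KeyError:
            raise FileNotFoundError(path) from None
    return node


def _parent_dir(root: Directory, path: str) -> Tuple[Directory, str]:
    """Split '/a/b/c' into the Directory for '/a/b' and the name 'c'."""
    parent_path, _, name = path.rstrip("/").rpartition("/")
    parent = resolve(root, parent_path or "/")
    if not isinstance(parent, Directory):
        raise NotADirectoryError(parent_path)
    return parent, name


def add(root: Directory, path: str, entry: Union[Directory, File]) -> None:
    # Adding a file or directory: a single insert into the parent's table.
    parent, name = _parent_dir(root, path)
    if name in parent.entries:
        raise FileExistsError(path)
    parent.entries[name] = entry


def move(root: Directory, src: str, dst: str) -> None:
    # Moving: delete from one table, insert into another. The moved
    # subtree's own tables are untouched, since no table stores paths.
    # (Toy code: no check against moving a directory into itself.)
    src_parent, src_name = _parent_dir(root, src)
    dst_parent, dst_name = _parent_dir(root, dst)
    if src_name not in src_parent.entries:
        raise FileNotFoundError(src)
    if dst_name in dst_parent.entries:
        raise FileExistsError(dst)
    dst_parent.entries[dst_name] = src_parent.entries.pop(src_name)


root = Directory()
add(root, "/home", Directory())
add(root, "/home/alice", Directory())
add(root, "/home/alice/notes.txt", File(b"hi"))
move(root, "/home/alice", "/alice")
print(resolve(root, "/alice/notes.txt").data)  # b'hi'
print(sorted(resolve(root, "/home").entries))  # []
```

Resolving a path costs one hash lookup per component, so deep paths cost more than shallow ones, but no single table ever grows beyond the size of one directory.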

Since hash tables are unordered structures, you would typically sort the entries in memory to list a whole directory. The hash also doesn't help with wildcard matching; you have to scan the whole directory to see which names match a given pattern. An ordered structure (such as a sorted list or a B*tree) only helps if the pattern has a constant prefix.
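The listing and wildcard points can be illustrated with a small Python sketch. The directory contents are made up, the standard library's `fnmatch.fnmatchcase` stands in for shell-style wildcard matching, and a sorted Python list stands in for an ordered structure such as a B*tree; `list_dir`, `glob_scan`, `literal_prefix` and `glob_ordered` are hypothetical helpers.

```python
import bisect
from fnmatch import fnmatchcase

# Hypothetical contents of one directory's hash table (name -> inode number).
entries = {"readme.txt": 11, "main.c": 12, "Makefile": 13, "main.h": 14, "util.c": 15}


def list_dir(table):
    # A hash table has no useful order, so sort the names for display.
    return sorted(table)


def glob_scan(table, pattern):
    # Hashing cannot help with wildcards: every name must be tested.
    return sorted(name for name in table if fnmatchcase(name, pattern))


def literal_prefix(pattern):
    # The part of the pattern before the first wildcard character.
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern


def glob_ordered(sorted_names, pattern):
    # In an ordered structure, names sharing the constant prefix are
    # contiguous, so binary search finds where to start and the scan
    # stops at the first name that no longer has the prefix.
    prefix = literal_prefix(pattern)
    start = bisect.bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        if fnmatchcase(name, pattern):
            matches.append(name)
    return matches


names = list_dir(entries)
print(names)                          # ['Makefile', 'main.c', 'main.h', 'readme.txt', 'util.c']
print(glob_scan(entries, "*.c"))      # ['main.c', 'util.c']
print(glob_ordered(names, "main.*"))  # ['main.c', 'main.h']
```

With a pattern such as `*.c` the literal prefix is empty, so `glob_ordered` degenerates into a full scan, which is exactly the limitation described above.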

A different approach (used by the Mac's HFS file system) is to use an ordered structure (a B*tree, in HFS's case) indexed by directory and name. In HFS, a dirID/filename pair served as the main key for a single B*tree. Once you had this key, which in effect acts as a file handle, a single query returned the directory entry without having to traverse the whole pathname. To list a directory, just read the key range [dirID, dirID+1); that interval comprises all the filenames stored in that directory, in binary lexicographical order.
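An HFS-style single-index layout can be approximated in Python as below. This is only a sketch under stated assumptions: a sorted list searched with `bisect` stands in for the on-disk B*tree, keys are `(dirID, name)` tuples compared in Python's code-point order, and the directory IDs and record values are made up. `Catalog`, `insert`, `lookup` and `list_dir` are hypothetical names, not HFS's actual interfaces.

```python
import bisect
from typing import List, Optional, Tuple

Key = Tuple[int, str]  # (parent dirID, name): the single catalog key


class Catalog:
    """Toy stand-in for one B*tree holding every directory entry."""

    def __init__(self) -> None:
        self._keys: List[Key] = []        # kept sorted at all times
        self._records: List[object] = []  # record i belongs to key i

    def insert(self, dir_id: int, name: str, record: object) -> None:
        key = (dir_id, name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            raise FileExistsError(name)
        self._keys.insert(i, key)
        self._records.insert(i, record)

    def lookup(self, dir_id: int, name: str) -> Optional[object]:
        # One ordered search on (dirID, name); no pathname walk needed.
        key = (dir_id, name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._records[i]
        return None

    def list_dir(self, dir_id: int) -> List[str]:
        # Every key in [(dir_id,), (dir_id + 1,)) belongs to this directory,
        # and the slice is already sorted by name.
        lo = bisect.bisect_left(self._keys, (dir_id,))
        hi = bisect.bisect_left(self._keys, (dir_id + 1,))
        return [name for _, name in self._keys[lo:hi]]


cat = Catalog()  # made-up IDs: 2 is a folder, 16 is one of its subfolders
cat.insert(2, "System", "folder 16")
cat.insert(2, "Applications", "folder 17")
cat.insert(16, "Finder", "file record")
cat.insert(2, "ReadMe", "file record")
print(cat.list_dir(2))           # ['Applications', 'ReadMe', 'System']
print(cat.lookup(16, "Finder"))  # file record
```

Because Python compares tuples element by element and a shorter tuple sorts before any longer tuple that starts with it, `(dir_id,)` and `(dir_id + 1,)` bound exactly the half-open range [dirID, dirID+1).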

Related Solutions

Password Security – Is It More Secure to Hash a Password Multiple Times?

This is better suited to security.stackexchange, but...

The problem with

hash1(hash2(hash3(... hashn(pass + salt) + salt ...) + salt) + salt)

is that it is only as strong as the weakest hash function in the chain. For example, if hashn (the innermost hash) produces a collision, the entire chain produces a collision, regardless of which other hashes are in the chain.
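The collision-propagation argument can be demonstrated with a toy in Python. The innermost hash is deliberately weakened by truncating SHA-256 (`hashlib.sha256`) to a single byte so that a colliding pair of passwords is quick to find; the fixed salt, the round count and the names `weak_inner`, `chain_salt_only` and `find_inner_collision` are hypothetical. It illustrates the argument only and is not a password-hashing scheme.

```python
import hashlib

SALT = b"demo-salt"  # hypothetical fixed salt


def weak_inner(data: bytes) -> bytes:
    # Toy "hashn": only the first byte of SHA-256, so collisions are easy.
    return hashlib.sha256(data).digest()[:1]


def chain_salt_only(password: bytes, rounds: int = 3) -> bytes:
    # hash1(hash2(... hashn(pass + salt) + salt ...) + salt)
    h = weak_inner(password + SALT)
    for _ in range(rounds):
        h = hashlib.sha256(h + SALT).digest()
    return h


def find_inner_collision():
    # With 256 possible 1-byte outputs, at most 257 candidates are needed.
    seen = {}
    i = 0
    while True:
        pw = b"pw%d" % i
        tag = weak_inner(pw + SALT)
        if tag in seen:
            return seen[tag], pw
        seen[tag] = pw
        i += 1


a, b = find_inner_collision()
# Different passwords, yet the whole chain collides, because every outer
# step only sees the (identical) inner result plus the same salt.
print(a != b, chain_salt_only(a) == chain_salt_only(b))  # True True
```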

A stronger chain would be

hash1(hash2(hash3(... hashn(pass + salt) + pass + salt ...) + pass + salt) + pass + salt)

Here we avoid the early collision problem and we essentially generate a salt that depends on the password for the final hash.

If one step in the chain does collide, it doesn't matter, because the next step uses the password again and should give a different result for different passwords.
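The effect of feeding the password back in at each step can be shown with a toy setup: a 1-byte truncated SHA-256 as the innermost hash, a made-up fixed salt, and hypothetical names (`weak_inner`, `chain_with_password`, `find_inner_collision`). Two passwords that collide in the innermost hash still end up with different chain outputs, because every outer SHA-256 step hashes different input.

```python
import hashlib

SALT = b"demo-salt"  # hypothetical fixed salt


def weak_inner(data: bytes) -> bytes:
    # Toy "hashn": only the first byte of SHA-256, so collisions are easy.
    return hashlib.sha256(data).digest()[:1]


def chain_with_password(password: bytes, rounds: int = 3) -> bytes:
    # hash1(hash2(... hashn(pass + salt) + pass + salt ...) + pass + salt)
    h = weak_inner(password + SALT)
    for _ in range(rounds):
        h = hashlib.sha256(h + password + SALT).digest()
    return h


def find_inner_collision():
    # With 256 possible 1-byte outputs, at most 257 candidates are needed.
    seen = {}
    i = 0
    while True:
        pw = b"pw%d" % i
        tag = weak_inner(pw + SALT)
        if tag in seen:
            return seen[tag], pw
        seen[tag] = pw
        i += 1


a, b = find_inner_collision()
print(weak_inner(a + SALT) == weak_inner(b + SALT))      # True
print(chain_with_password(a) == chain_with_password(b))  # False
```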

REST API Directory Structure – Using RewriteEngine Effectively

I think you are pretty much on the right track. It is hard to call any particular way of structuring folders "the best", but I can share what I have seen. In particular, ASP.NET MVC structures this sort of thing as follows:

Models
Views
Controllers
{Miscellaneous other folders}

Models: contains the classes that represent your view models or, in the case of an API project, the data types that your API sends and receives, e.g. Person { Name, Rank, SerialNumber }.

Views: contains the files that generate your views. An API project doesn't really have views; you would just use some sort of JSON or XML serialization layer, so you probably don't need this folder.

Controllers: contains the classes that have your actions on them. This is pretty much what you are talking about doing, but it is worth noting the separation between the data class (i.e. the Person object, with { Name, Rank, SerialNumber }) and the Person controller, which supports GET, PUT, POST and DELETE of a Person object.

Miscellaneous other folders: contain any other resources you need; these are also not all that necessary for a purely API project.
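To make the Models/Controllers split concrete, here is a minimal Python sketch; it is not ASP.NET MVC, which would use C# classes. The `Person` fields come from the answer (renamed to snake_case), while `PersonController`, its in-memory dict store and the `(status, body)` return convention are assumptions made for illustration, with no routing or web framework involved. The `json` module plays the part of the serialization layer that replaces views in an API project.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


# Models: the data types the API sends and receives.
@dataclass
class Person:
    name: str
    rank: str
    serial_number: str


# Controllers: the actions (GET, POST, PUT, DELETE) on a Person resource.
class PersonController:
    def __init__(self) -> None:
        self._people: Dict[int, Person] = {}  # in-memory store for the sketch
        self._next_id = 1

    def get(self, person_id: int) -> Tuple[int, Optional[dict]]:
        person = self._people.get(person_id)
        return (200, asdict(person)) if person is not None else (404, None)

    def post(self, body: dict) -> Tuple[int, dict]:
        while self._next_id in self._people:  # skip IDs already taken by PUT
            self._next_id += 1
        person_id = self._next_id
        self._people[person_id] = Person(**body)
        self._next_id += 1
        return 201, {"id": person_id}

    def put(self, person_id: int, body: dict) -> Tuple[int, None]:
        created = person_id not in self._people
        self._people[person_id] = Person(**body)
        return (201 if created else 204), None

    def delete(self, person_id: int) -> Tuple[int, None]:
        removed = self._people.pop(person_id, None)
        return (204 if removed is not None else 404), None


controller = PersonController()
status, created = controller.post(
    {"name": "Ada", "rank": "Captain", "serial_number": "42"}
)
code, body = controller.get(created["id"])
print(status, code, json.dumps(body))
# 201 200 {"name": "Ada", "rank": "Captain", "serial_number": "42"}
```

Keeping `Person` free of any HTTP logic means the same model can be serialized as JSON or XML without touching the controller.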
