Sql-server – Moving Images from Database to File System

sql server

So currently in our system we have been storing image files in the database (SQL Express 2005). Unfortunately it wasn't perceived that this would reach the max database size allowed by the SQL Express License. So I have proposed a plan of storing the images in the file system and only indexing where the file is in the database.

The plan is to save the root path in our OptionsTable as something like ImagesRoot and then only saving the actual imageID in the table, which is basically a FK from the PK of the record with the image. I have determined that it would be best to then split this down into sub-directories inside of the ImagesRoot based on every 1000 images so basically (ImagedID / 1000)\(ImageID % 1000) (e.g. ImageID is 1999 it would be in %ImageRoot%\1\999).

I'm looking for any potential pitfalls of this system and any thing that could be improved as I am already receiving some resistance from the owner of the company who wants everything to be in databases. Along those lines I would also take reasons why it should all be in databases.

I should mention we have in place already automated backups that run for all of our customers databases and any files that are generated by our program that are required to be saved over a period of time These are optional but if someone isn't using our system it is expected that they are using their own or data loss isn't our problem (it is if our system fails and they are using it!).

Thanks

Edit

Some people have suggested using 2008 R2 as a possible fix for this. It's a good idea I'm just going to have to look at issues involving legacy programs that use these databases. Not all of them use Stored Procedures and some aren't very maintenance friendly.

Best Answer

If you're going to break up the directory structure like that (and you should), you shouldn't use the method you propose, which will be prone to clustering. You should implement the same concept but based on a hashed value of the file name.

For instance, instead of what you propose, make a hash of the file name (I'll use md5), then created the subdirs based on the hash value:

Your proposal:
FILE=1999, H1=1, H2=999, FILEPATH=1\999\1999
FILE=1998, H1=1, H2=998, FILEPATH=1\998\1998
FILE=1997, H1=1, H2=997, FILEPATH=1\997\1997

Hashed solution:
FILE=1999, MD5=2554fe5cd0a1b3fb7f9ec112fd326744, H1=2, H2=54, FILEPATH=2\54\1999
FILE=1998, MD5=82ec15656dd2b8a3e50ff36643a713ad, H1=8, H2=2e, FILEPATH=8\2e\1998
FILE=1997, MD5=9cc2e1e538bd538014d294138a85e20b, H1=9, H2=cc, FILEPATH=9\cc\1997

The advantage to doing this is that it spreads the utilization of the folders more evenly, allowing you to do things like spread the folders across drives for performance, etc.

It's likely that your DB has hashing functions built in which would allow you to calculated the path on the fly, you you could also easily calculate it once in code and save the whole path.

My example of MD5 may not be the standard, (I think SHA1 is more common), but it was just to get a working example out there.