Database – How to most efficiently store and serve 1,000,000+ small gzipped files on a Linux web server

database, disk-space-utilization, gzip, performance

I have a large amount of static content that I have to deliver via a Linux-based web server. It is a set of over one million small gzipped files. 90% of the files are less than 1K and the remaining files are at most 50K. In the future, this could grow to over 10 million gzipped files.

Should I put this content in a file structure or should I consider putting all this content in a database? If it is in a file structure, can I use large directories or should I consider smaller directories?

I was told a file structure would be faster for delivery, but on the other hand, I know the files will take up a lot of disk space, since file blocks will be larger than 1K.

What is the best strategy regarding delivery performance?

UPDATE

For the record, I performed a test under Windows 7 with half a million files:

[Screenshot of the test results]

Best Answer

I would guess that a filesystem structure would be faster, but you will need a good directory structure to avoid having directories with a very large number of files.
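One common way to do this is to shard files into nested subdirectories based on a hash of the filename. The following is a minimal sketch of that idea in Python; the two-level layout, the `shard_path`/`store` helpers, and the `/var/www/static` root are my own illustration, not something prescribed by the answer:

```python
import hashlib
import os

def shard_path(root, filename, levels=2, width=2):
    """Map a filename to a nested directory path using a hash prefix,
    so that no single directory accumulates millions of entries."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take the first `levels` chunks of `width` hex characters as subdirectories,
    # e.g. digest 'abcd12...' -> root/ab/cd/filename
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)

def store(root, filename, data):
    """Write a gzipped blob into its sharded location, creating directories as needed."""
    path = shard_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)

# Two hex levels give 256 * 256 = 65,536 buckets, so even 10 million files
# average roughly 150 entries per directory.
print(shard_path("/var/www/static", "item-123456.json.gz"))
```

Because the path is derived deterministically from the filename, the web server (or application) can compute the on-disk location directly, with no lookup table needed.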

I wouldn't worry too much about lost disk space. As an example, at a 16K block size you will lose about 15GB of space in the worst case, where every single file wastes most of one block. With today's disk sizes that's nothing, and you can adapt the parameters of your file system to your specific needs.
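To make that arithmetic concrete, here is a rough back-of-the-envelope calculation; the one-million-file count and the assumption that each file wastes almost a full 16K block are illustrative assumptions, not measurements:

```python
# Worst-case slack estimate: every file wastes (almost) one whole block.
files = 1_000_000
block_size = 16 * 1024          # 16K filesystem block
avg_file_size = 1024            # 90% of the files are under 1K
wasted_per_file = block_size - avg_file_size
total_waste_gb = files * wasted_per_file / 1024**3
print(f"~{total_waste_gb:.1f} GB wasted")   # ~14.3 GB, in line with the ~15GB figure above
```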
