Database – How to most efficiently store and serve 1,000,000+ small gzipped files on a Linux web server

database, disk-space-utilization, gzip, performance

I have a large amount of static content that I have to deliver via a Linux-based web server. It is a set of over one million small gzipped files. 90% of the files are less than 1K and the remaining files are at most 50K. In the future, this could grow to over 10 million gzipped files.

Should I put this content in a file structure or should I consider putting all this content in a database? If it is in a file structure, can I use large directories or should I consider smaller directories?

I was told a file structure would be faster for delivery, but on the other hand, I know the files will take up a lot of disk space, since file blocks will be larger than 1K.

What is the best strategy regarding delivery performance?

UPDATE

For the record, I performed a test under Windows 7 with half a million files:

[Screenshot of the test results]

Best Answer

I would guess that a filesystem structure would be faster, but you will need a good directory structure to avoid having directories with a very large number of files.
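One common way to do this is to shard files into nested subdirectories based on a hash of the filename. The following is a minimal sketch of that idea in Python; the two-level layout, the `shard_path`/`store` helpers, and the `/var/www/static` root are my own illustration, not something prescribed by the answer:

```python
import hashlib
import os

def shard_path(root, filename, levels=2, width=2):
    """Map a filename to a nested directory path using a hash prefix,
    so that no single directory accumulates millions of entries."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take the first `levels` chunks of `width` hex characters as subdirectories,
    # e.g. digest 'abcd12...' -> root/ab/cd/filename
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)

def store(root, filename, data):
    """Write a gzipped blob into its sharded location, creating directories as needed."""
    path = shard_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)

# Two hex levels give 256 * 256 = 65,536 buckets, so even 10 million files
# average roughly 150 entries per directory.
print(shard_path("/var/www/static", "item-123456.json.gz"))
```

Because the path is derived deterministically from the filename, the web server (or application) can compute the on-disk location directly, with no lookup table needed.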

I wouldn't worry too much about lost disk space. As an example, at a 16K block size you will lose about 15GB of space in the worst case, where every single file wastes most of one block. With today's disk sizes that's nothing, and you can adapt the parameters of your file system to your specific needs.
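To make that arithmetic concrete, here is a rough back-of-the-envelope calculation; the one-million-file count and the assumption that each file wastes almost a full 16K block are illustrative assumptions, not measurements:

```python
# Worst-case slack estimate: every file wastes (almost) one whole block.
files = 1_000_000
block_size = 16 * 1024          # 16K filesystem block
avg_file_size = 1024            # 90% of the files are under 1K
wasted_per_file = block_size - avg_file_size
total_waste_gb = files * wasted_per_file / 1024**3
print(f"~{total_waste_gb:.1f} GB wasted")   # ~14.3 GB, in line with the ~15GB figure above
```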
