Compression – Choosing the Best Archive/Compression Format

archive, compression, format

Zip, Rar, 7z, Gzip, BZip2, Tar etc. I'm hearing 7z is the flavor of the month. Why? Is it best for all situations, or are there better choices for specific situations?

Or maybe the actual file archiver, e.g. WinZip, WinRar, 7Zip etc. (as opposed to the format), has a bigger effect?

In your answer, could you describe what sort of speed/compression tradeoff your mentioned format uses?

Please provide links to any empirical tests that back up your answer.

Background: I need to back up a custom search index that creates about 3000 relatively small files (less than 10 MB each), each containing a lot of repetitive data.

(As usual Wikipedia has a relevant article but the section on performance comparison is brief.)

Thanks

Best Answer

Compress, Gzip, Bzip and Bzip2 are not meant for archiving multiple files. Each of them compresses only a single file. For archiving they are usually combined with TAR. The problem with TAR is that it has no index table, so it is only a good fit if you plan to restore the whole thing. If you expect that you will ever need to restore only a few selected files, forget about TAR. To get the last file out of a tar.gz or tar.bz2 archive, you have to decompress and process everything that comes before it. With zip, rar or 7-zip, the tool reads the index table, skips to the relevant position in the archive and processes only the relevant files.
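To make the index-table point concrete, here is a minimal sketch using only Python's standard library. The file names and contents are made-up stand-ins for the search index files, and the helper names are hypothetical. The zip reader looks the member up by name, while the tar.gz reader has to walk the members in order until it reaches the one it wants.

```python
import io
import tarfile
import zipfile


def make_files(count=5):
    """Hypothetical stand-in for the index files: name -> repetitive bytes."""
    return {
        f"index_{i:04d}.dat": (f"record {i}\n" * 1000).encode()
        for i in range(count)
    }


def build_zip(files):
    """Pack the files into an in-memory zip archive (Deflate)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar_gz(files):
    """Pack the files into an in-memory tar archive compressed with gzip."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_from_zip(blob, name):
    """Look the member up by name via the zip directory and read only it."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


def read_from_tar_gz(blob, name):
    """Walk the tar members in order; the gzip stream is decompressed
    from the start up to the member we are looking for."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tf:
        for member in tf:
            if member.name == name:
                return tf.extractfile(member).read()
    raise KeyError(name)


if __name__ == "__main__":
    files = make_files()
    last = sorted(files)[-1]
    print(len(read_from_zip(build_zip(files), last)))
    print(len(read_from_tar_gz(build_tar_gz(files), last)))
```

Both readers return the same bytes. The difference is how much of the archive has to be processed to get there, which starts to matter with thousands of files.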

OK, TAR's out, so that leaves you with ZIP, RAR and 7-ZIP. Of these three, ZIP is the most widespread: almost everything supports it, and many applications have built-in support. It's also fast. On the other hand, 7-ZIP is also portable, its library is LGPL, and its compression ratios are much better than those of the other two, at the cost of using more CPU. RAR is the real loser here: neither great compression, nor really portable, nor fast.
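If you want to check the speed/compression tradeoff on your own data rather than take my word for it, something like the following rough sketch can help. It compares the three algorithm families using Python's standard library. Note the assumptions: `zlib` is Deflate (the method ZIP and gzip normally use), and `lzma` writes the .xz container rather than a .7z file, so it only approximates what 7-ZIP would do. The sample data is invented repetitive text, not a real search index.

```python
import bz2
import lzma
import time
import zlib

# Label -> (compress, decompress), all from the standard library.
CODECS = {
    "deflate (zip/gzip)": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma (xz, similar to 7z)": (lzma.compress, lzma.decompress),
}


def make_sample(lines=20_000):
    """Invented repetitive data; replace with bytes read from a real file."""
    return b"".join(
        b"doc=%d term=search weight=0.5\n" % (i % 50) for i in range(lines)
    )


def benchmark(data):
    """Return {label: (compressed_size, seconds)} for each codec."""
    results = {}
    for label, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must give the original bytes back.
        assert decompress(packed) == data
        results[label] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = make_sample()
    print(f"original: {len(sample)} bytes")
    for label, (size, seconds) in benchmark(sample).items():
        print(f"{label:26s} {size:8d} bytes  {seconds:.3f} s")
```

The numbers depend heavily on the data and the compression level, so treat the output as a hint for your own files, not as a general ranking.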

EDIT: It seems that the best option would be 7-ZIP, but with the bzip2 compression method. This way you won't have the disadvantages of TAR, but you can still take advantage of bzip2's multi-core support. See this article.
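As a sketch of what that could look like in a backup script, the snippet below builds a 7-Zip command line and runs it if the tool is installed. It assumes the 7-Zip command-line program is on the PATH as `7z` or `7za`, and it uses the `-m0=bzip2` switch to select the bzip2 method and `-mmt=on` to enable multithreading. The function names and the archive/source paths are hypothetical, so check the switches against the documentation of your 7-Zip version before relying on this.

```python
import shutil
import subprocess


def build_7z_command(archive, sources, exe="7z"):
    """Build the argument list for a .7z archive using the bzip2 method."""
    return [
        exe,
        "a",          # add files to an archive
        "-t7z",       # archive type: 7z container
        "-m0=bzip2",  # compression method: bzip2 instead of the default
        "-mmt=on",    # enable multithreading
        archive,
        *sources,
    ]


def run_backup(archive, sources):
    """Run 7-Zip if it can be found on PATH; raise otherwise."""
    exe = shutil.which("7z") or shutil.which("7za")
    if exe is None:
        raise FileNotFoundError("7-Zip command-line tool not found on PATH")
    subprocess.run(build_7z_command(archive, sources, exe), check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_7z_command("index-backup.7z", ["search_index"])))
```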