Open-Source History – When Did .tar.gz Become the Standard for Linux Source Code Packaging?

history, open source, packages

When browsing open-source projects that are primarily developed for Linux systems and downloading the latest packages, the source code is always stored in a .tar.gz or .tar.bz2 file.

Is there any reason for using .tar.gz or .tar.bz2 rather than something like .zip or .rar or some other compression algorithm (or even leaving it uncompressed if the project is small enough)?

Best Answer

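To answer the question in the heading: tar has been the standard way to bundle Unix source code for a very long time, well over two decades. The tar utility itself dates back to Version 7 Unix (1979), significantly before Linux even came into existence in 1991. The compression layer changed over the years: early tarballs were usually .tar.Z (the compress utility), gzip arrived in 1992, and bzip2 followed in 1996.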

In fact, tar stands for (t)ape (ar)chive. Think reel hard, and you'll get an idea how old it is. ba-dum-bump.

Before people had CD burners, software distributions were put out on 1.44 MB floppy disks. The compressed tar file (the "tarball") was chopped into floppy-sized pieces by the split command. You'd join the pieces back together with cat and extract the archive.
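The split-and-rejoin workflow is simple enough to sketch in Python. This is only an illustration of the idea, not what the real split and cat commands do internally; the helper names (split_bytes, join_bytes, make_tar_gz) and the 1,474,560-byte chunk size are assumptions made for the example.

```python
import io
import os
import tarfile

# Assumed chunk size: formatted capacity of a "1.44 MB" floppy (1440 KiB).
FLOPPY_BYTES = 1440 * 1024


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.gz from a name -> content mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def split_bytes(data: bytes, chunk_size: int = FLOPPY_BYTES) -> list[bytes]:
    """Cut data into pieces of at most chunk_size bytes (roughly `split -b`)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_bytes(pieces: list[bytes]) -> bytes:
    """Concatenate the pieces in order (roughly `cat piece1 piece2 ...`)."""
    return b"".join(pieces)


if __name__ == "__main__":
    # Random bytes barely compress, so the archive spans several small chunks.
    archive = make_tar_gz({"hello.c": os.urandom(5000)})
    pieces = split_bytes(archive, chunk_size=2048)
    restored = join_bytes(pieces)
    with tarfile.open(fileobj=io.BytesIO(restored), mode="r:gz") as tar:
        print(len(pieces), "pieces, members:", tar.getnames())
```

The pieces are useless on their own; only the concatenation in the original order is a valid archive, which is why the floppies had to be fed back in sequence.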

To answer the other question of why not Zip or Rar, that's an easy one. The tar archiver comes from Unix, while the other two come from MS-DOS/Windows. Tar handles Unix file metadata (permissions, ownership, times, etc.), while zip and rar originally stored only MS-DOS file attributes and picked up Unix metadata much later. In fact, zip took a while before it started storing NTFS metadata (alternate streams, security descriptors, etc.) properly.
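To make the metadata point concrete, here is a small sketch using Python's standard tarfile module. It writes one file with Unix-style mode, owner, and timestamp into an in-memory .tar.gz and reads the entry back. The file name, user name, IDs, and timestamp are made-up values for illustration.

```python
import io
import tarfile


def roundtrip_metadata() -> tarfile.TarInfo:
    """Store one file with Unix metadata in a .tar.gz and read its header back."""
    content = b"#!/bin/sh\necho hello\n"

    info = tarfile.TarInfo(name="configure")  # hypothetical file name
    info.size = len(content)
    info.mode = 0o755        # rwxr-xr-x: the executable bit survives archiving
    info.uid = 1000          # numeric owner and group (made-up values)
    info.gid = 1000
    info.uname = "builder"   # symbolic owner and group (made-up values)
    info.gname = "builder"
    info.mtime = 694224000   # arbitrary fixed modification time (seconds since epoch)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(content))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        return tar.getmember("configure")


if __name__ == "__main__":
    member = roundtrip_metadata()
    print(oct(member.mode), member.uid, member.gid, member.uname, member.mtime)
```

For source packages this matters in practice: scripts such as configure need to stay executable after extraction.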

Many of the compression algorithms in PKZip are proprietary to the original maker. The final one added to the DOS/Windows versions was Deflate (RFC 1951), which performed a little better than Implode, the proprietary algorithm that had produced the best general compression until then. Gzip uses the Deflate algorithm.

The RAR compression algorithm is proprietary, but there is a gratis open source implementation of the decompressor. Official releases of RAR and WinRAR from RARlab are not gratis.

Because gzip uses Deflate, it compresses no worse than PKZip. Bzip2 usually gets somewhat better compression ratios, at the cost of speed.
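If you want to check the ratio claim on your own data, a rough comparison with Python's standard gzip and bz2 modules might look like the following. The function name and the generated sample text are assumptions for the example, and real results depend heavily on the input, so no specific numbers are claimed here.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the byte size of data as-is, gzip-compressed, and bzip2-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    # Made-up, source-code-like sample; substitute a real tarball's bytes to test properly.
    sample = "\n".join(
        f"int value_{i} = compute({i} * {i % 7});" for i in range(5000)
    ).encode("ascii")
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>5}: {size} bytes")
```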

TL;DR version:

tar.gz and tar.bz2 are from Unix, so Unix people use them. Zip and Rar are from the DOS/Windows world, so DOS/Windows people use them. tar has been the standard for bundling archives of stuff in *nix for several decades.
