I am using tar to archive a group of very large (multi-GB) bz2 files.
If I use tar -tf file.tar to list the files within the archive, it takes a very long time to complete (~10-15 minutes). Likewise, cpio -t < file.cpio takes just as long, give or take a few seconds. Consequently, retrieving a single file from an archive (for example, via tar -xf file.tar myFileOfInterest.bz2) is just as slow.
Is there an archival method out there that keeps a readily available "catalog" with the archive, so that an individual file within the archive can be retrieved quickly?
For example, a catalog that stores, for each file, its byte offset within the archive and its size (plus any other filesystem-specific metadata).
Is there a tool (or an argument to tar or cpio) that allows efficient retrieval of a file within the archive?
Best Answer
tar (and cpio, afio, pax and similar programs) use stream-oriented formats: they are intended to be streamed directly to a tape or piped into another process. While in theory it would be possible to add an index at the end of the file/stream, I don't know of any version that does (it would be a useful enhancement, though).
It won't help with your existing tar or cpio archives, but there is another tool, dar ("disk archive"), that creates archive files containing such an index, giving you fast direct access to individual files within the archive.
If dar isn't included with your Unix/Linux distribution, you can find it at:
http://dar.linux.free.fr/
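To illustrate the index idea, here is a minimal Python sketch that builds a sidecar catalog (member name, byte offset, size) for an existing uncompressed tar file such as file.tar, using the standard tarfile module. It relies on TarInfo.offset_data, the attribute CPython's tarfile sets to the byte position where a member's contents begin. The function names build_index, save_index, load_index and extract_member, and the JSON sidecar file, are hypothetical choices for this example and not part of tar or dar. It assumes plain regular-file members (no GNU sparse files, no multi-volume archives), does not work on compressed archives (.tar.gz, .tar.bz2), and does not restore permissions or timestamps.

```python
import json
import tarfile


def build_index(tar_path):
    """Walk the tar headers once; record (data offset, size) per regular file."""
    index = {}
    # "r:" opens an UNCOMPRESSED tar only; byte offsets are meaningless
    # inside a compressed stream. Assumes no sparse or multi-volume members.
    with tarfile.open(tar_path, "r:") as tf:
        for member in tf:
            if member.isfile():
                # offset_data: byte position where this member's contents start
                index[member.name] = (member.offset_data, member.size)
    return index


def save_index(index, index_path):
    """Store the catalog beside the archive (JSON turns tuples into lists)."""
    with open(index_path, "w") as f:
        json.dump(index, f)


def load_index(index_path):
    """Read a catalog written by save_index."""
    with open(index_path) as f:
        return json.load(f)


def extract_member(tar_path, index, name, out_path, chunk_size=1 << 20):
    """Seek straight to one member's data and copy exactly `size` bytes out.

    Only the file contents are written; mode, owner and mtime are not restored.
    """
    offset, size = index[name]
    with open(tar_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(offset)
        remaining = size
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError(f"archive ended early while reading {name!r}")
            dst.write(chunk)
            remaining -= len(chunk)
```

Usage would be along the lines of save_index(build_index("file.tar"), "file.tar.idx") once, then extract_member("file.tar", load_index("file.tar.idx"), "myFileOfInterest.bz2", "myFileOfInterest.bz2") for each lookup. Building the index still visits every header once (on a seekable, uncompressed file tarfile should mostly seek past member data rather than read it), but each later retrieval is a single seek plus a read of just that member. Keeping the catalog in a separate file means the archive itself stays a standard tar that any tar can still read.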