“tar czf” versus “tar cf - | gzip”: are they different? (or how to improve a backup)

backup gzip tar

I want to speed up my backup, which I currently make with tar czf, the common way to do it. The set of files being backed up grows day by day, so the backup keeps getting slower.

I was thinking of taking advantage of the several cores available on my server, and I was wondering whether there is any difference between doing the backup with tar czf and piping tar to gzip: tar cf - | gzip

My guess is that there isn't any difference, because the first form also spawns two processes (tar and gzip), much like the pipe does.

If there is no difference, do you know of any good alternative that doesn't involve going incremental? I'm also looking at pigz, and it looks fine.

Best Answer

When you say you want to take advantage of multiple cores, the implication is that your tar with gzip is CPU bound and not IO bound. Are you sure this is the case? If you are not sure, run sar, iostat, or top, or check your monitoring graphs, to find out. It is never a good idea to try to solve a problem without understanding it first. I'm not saying this is definitely the case for you, but my guess would be that, even with gzip compression, you are more likely to be IO bound.
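One rough way to check is to run the same archive twice, once without compression and once with it, and compare the timings. The sketch below is only an illustration: the function names read_only and read_and_gzip are made up here, and /path/to/data is a placeholder for whatever you back up. The output goes through wc -c rather than straight to /dev/null, because some tar implementations skip reading file data when the archive is /dev/null.

```shell
# read_only DIR: archive DIR without compression and discard the result,
# so the run time is dominated by reading the files from disk.
# Prints the number of bytes tar produced.
read_only() {
    tar cf - "$1" | wc -c
}

# read_and_gzip DIR: the same archive, but compressed with gzip first.
# Prints the number of compressed bytes.
read_and_gzip() {
    tar cf - "$1" | gzip | wc -c
}

# Example (bash, where `time` can wrap a function call).
# /path/to/data is a placeholder, not a real path:
#   time read_only /path/to/data
#   time read_and_gzip /path/to/data
```

If both runs take about the same wall-clock time, the disks are probably the limit and more cores will not help much. If the gzip run is clearly slower, compression is probably the limit, and a parallel compressor such as the pigz you mention is worth testing. Keep in mind that the second run may be served partly from the page cache, so repeat the test or use a data set larger than RAM.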

If it is IO bound, and you have multiple arrays, a separate process for each array might make sense.
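A minimal sketch of that idea, assuming each array is mounted at its own directory: start one tar czf per source in the background and wait for all of them. The function name backup_parallel and the paths in the usage line are hypothetical, and the sketch assumes the last path component of each source is unique, since it is used as the archive name.

```shell
# backup_parallel DEST DIR...: run one tar+gzip per source directory in
# parallel, writing DEST/<basename of DIR>.tar.gz for each.
# Returns non-zero if any of the tar processes failed.
backup_parallel() {
    dest=$1
    shift
    pids=""
    for src in "$@"; do
        name=$(basename "$src")
        # -C switches to the parent directory so the archive stores
        # relative paths (name/...) instead of absolute ones.
        tar czf "$dest/$name.tar.gz" -C "$(dirname "$src")" "$name" &
        pids="$pids $!"
    done
    status=0
    # Wait for each background tar and remember any failure.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Usage would look something like backup_parallel /backup /mnt/array1/home /mnt/array2/db. This only helps if the sources really sit on separate arrays; several readers on the same spindles tend to make things slower, and the destination must be able to absorb the combined write rate.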

I also second David's advice to consider incremental.
