Skipping hardlinks when using TSM Backup

backup hardlink tivoli tsm

We need to back up a filesystem with lots of hardlinks. Since there are
several hardlinks for each "true" file, we would like to skip all the
hardlinks when backing up the filesystem to avoid n exact copies of
each file.

The backup is done using Tivoli Storage Manager Backup, and we've been
unable to get it to treat hardlinks as anything other than separate
files to be backed up alongside each other.

In case it's relevant for possible solutions, I'd like to note that
it's possible to tell a hardlink from a proper file by the filename:

 foobarbaz-123.ext    # file
 foobarbaz-123-1.ext  # hardlink
 foobarbaz-123-2.ext  # hardlink
 barbazfoo-456.ext    # file
 barbazfoo-456-1.ext  # hardlink
 barbazfoo-456-2.ext  # hardlink
 barbazfoo-456-3.ext  # hardlink

That is, all hardlinks have two hyphens in the filename, whereas
proper files have just the one.
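
To make that naming rule concrete, here is a rough sketch in Python. `is_hardlink_name` is a hypothetical helper made up for illustration, not part of TSM; it assumes the rule holds exactly as described (two hyphens before the extension means hardlink).

```python
from pathlib import PurePath


def is_hardlink_name(filename: str) -> bool:
    """Return True if the name matches the hardlink convention above.

    Assumption: a name with exactly two hyphens before the extension is a
    hardlink, and a name with one hyphen is the "proper" file.
    """
    stem = PurePath(filename).stem  # filename without the final extension
    return stem.count("-") == 2


names = [
    "foobarbaz-123.ext",
    "foobarbaz-123-1.ext",
    "foobarbaz-123-2.ext",
    "barbazfoo-456.ext",
    "barbazfoo-456-1.ext",
]

# Keep only the names that the convention marks as proper files.
proper_files = [n for n in names if not is_hardlink_name(n)]
```

Something like this could feed an exclude list, but it only looks at names; it never checks that the entries really share an inode.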

The server is running Ubuntu Linux, and the files are situated on
a GFS volume on our SAN.

Best Answer

A quick read of some TSM docs suggests "Don't do that!"

On Unix, a "file" is just a directory entry that points to an inode. A "hard link" is simply what you have when more than one directory entry (pointer) points to a given inode. For all intents and purposes, these two "files" are exactly 100% identical; neither one is the "original".
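
If you want to see this for yourself, here is a minimal sketch using only the Python standard library. The file names `data.txt` and `alias.txt` are made up for the demo, and it assumes a filesystem that supports hard links (any normal Linux temp directory should).

```python
import os
import tempfile


def hardlink_demo():
    """Create a file plus a hard link to it and report what the OS sees."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "data.txt")    # hypothetical name
        second = os.path.join(d, "alias.txt")  # hypothetical name

        with open(first, "w") as f:
            f.write("original\n")

        # Add a second directory entry pointing at the same inode.
        os.link(first, second)

        st_first, st_second = os.stat(first), os.stat(second)
        same_inode = (st_first.st_dev, st_first.st_ino) == (
            st_second.st_dev,
            st_second.st_ino,
        )
        link_count = st_first.st_nlink  # number of names for this inode

        # Append through the second name, then read through the first.
        with open(second, "a") as f:
            f.write("edited via alias\n")
        with open(first) as f:
            content_via_first = f.read()

        return same_inode, link_count, content_via_first


same_inode, link_count, content_via_first = hardlink_demo()
```

Both names report the same device and inode, the link count is 2, and the text appended through one name is visible through the other.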

Hard links are a well-established and well-understood mechanism in Unix. It is proper and common to encounter them, and it is common for backup software to understand exactly what a hard link is and to back it up exactly as it should: as another pointer to a specific piece of data, not as a unique and novel piece of data that happens to be exactly the same as the other hard links.
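
As a rough sketch of how a tool can recognize hard links without relying on file names, the following groups paths by their (device, inode) pair. This only illustrates the general idea and is not how TSM is implemented; `group_hardlinks` and `demo` are hypothetical names.

```python
import os
import stat
import tempfile
from collections import defaultdict


def group_hardlinks(root):
    """Map (device, inode) to the sorted list of regular-file paths sharing it."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}


def demo():
    """Build a small tree with one triple-linked file and one plain file."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "foobarbaz-123.ext")
        with open(original, "w") as f:
            f.write("payload")
        os.link(original, os.path.join(root, "foobarbaz-123-1.ext"))
        os.link(original, os.path.join(root, "foobarbaz-123-2.ext"))

        with open(os.path.join(root, "barbazfoo-456.ext"), "w") as f:
            f.write("other payload")

        groups = group_hardlinks(root)
        # Number of names per inode, smallest first.
        return sorted(len(paths) for paths in groups.values())


group_sizes = demo()
```

Each group holds every name for one piece of data, so the data only needs to be stored once per group, with the remaining names recorded as links.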

A quick Google search for TSM and hard links indicates that TSM understands hard links, and the docs specifically warn:

Problems can occur if you [back up|archive] only one file of a hard-linked pair. For example, files texta and textb contain a hard link to each other. You archive texta, and then edit textb and make changes. If you retrieve texta, the changes you made to textb are lost.

Interestingly, it seems there are two different ways to store data with TSM, backups and archives, and the two seem to deal with hard links differently.

Backing up and restoring files:

A hard link is established when two files point to the same data file. When you back up a file that contains a hard link to another file, TSM stores both the link information and the data file on the server. If you back up two files that contain a hard link to each other, TSM stores the same data file under both names, along with the link information.

Archiving and restoring files:

When you archive a file that contains a hard link to another file, TSM stores both the link information and the data file on the server.

From this, it seems that you'll blow up your backup server if it is "archiving" things, and it will do what you want if you're "backing up." Leave it to IBM to make it simple!
