NetApp Snapshots – Can They Be Used as Backups?

backup netapp snapshot storage

Our shop relies very heavily on NetApp Volume Snapshots for backups. We use traditional agent-based tape backups for some of our data but by and large we rely on the Snapshots for the majority of our systems. Furthermore we do not have a rigorous change control policy or any centralized configuration management so all of our servers, regardless of whether the data their services provide is backed up, would need to be rebuilt from bare-metal (and without any real documentation). Naturally, this makes snapshots a very attractive proposition for management because we can just recover the entire server, user data and configuration included. We use NetApp's Virtual Storage Console for making snapshots of our NFS-based VMware datastores and NetApp's SnapDrive for raw device mapped (physical) LUNs that are presented directly to guests. We SnapMirror critical snapshots offsite to another Filer. Naturally we regularly test our restore process.

I can't help but feel uncomfortable with our reliance on snapshots as backups. To me, for a technology to be considered sufficient as a backup strategy, it needs to meet the following criteria:

  • The backup needs to be atomic. That is to say, the backup cannot rely on anything else for its recovery.
  • The backup needs to be separated from the system it is a backup of (out of band).
  • The backup needs to be copied or transported to a remote site (off site).

NetApp Snapshots

It is my understanding that NetApp Snapshots work under a Redirect-On-Write (RoW) methodology. The WAFL file layout uses a set of pointers (metadata?) that reference each block of storage, wherever it might be. To make a snapshot, the system just takes a copy of a volume's metadata and stores it in that volume's reserved space. Any writes (creations/changes/deletions) are redirected to new blocks. This is supposed to be the special sauce that makes NetApp's WAFL so great, because you don't have to read the old data, write it to the reserved space, and then write your new data over the old, as Copy-On-Write snapshots do.
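To make the redirect-on-write idea concrete, here is a minimal toy model in Python. It is only a sketch of the concept as described above, not of WAFL's real on-disk format; the class `RowVolume` and its methods are hypothetical names invented for illustration.

```python
class RowVolume:
    """Toy redirect-on-write volume: a pointer map over a shared block store."""

    def __init__(self):
        self.blocks = {}      # physical block id -> data
        self.active = {}      # file name -> physical block id (the "pointers")
        self.snapshots = {}   # snapshot name -> frozen copy of the pointer map
        self._next_id = 0

    def write(self, name, data):
        # Redirect-on-write: every write goes to a freshly allocated block,
        # so a block referenced by a snapshot is never overwritten in place.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.active[name] = block_id

    def snapshot(self, snap_name):
        # A snapshot copies the pointers only; no data blocks are copied.
        self.snapshots[snap_name] = dict(self.active)

    def read(self, name, snap_name=None):
        pointers = self.active if snap_name is None else self.snapshots[snap_name]
        return self.blocks[pointers[name]]


vol = RowVolume()
vol.write("report.txt", "v1")
vol.snapshot("hourly.0")
vol.write("report.txt", "v2")  # lands in a new block; the v1 block is untouched

current = vol.read("report.txt")
old = vol.read("report.txt", "hourly.0")
print(current, old)
```

Note that the snapshot and the live data share the same `blocks` store in this model, which is exactly the dependency the criteria above are concerned with.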

I fully admit I might not understand exactly how NetApp Volume Snapshots work, but if my understanding is more or less correct, NetApp Snapshots fail to meet my criteria for backups.

  • They are not atomic. The "snapshot" is really just a set of pointers to the original data. If the original data is no longer there, the metadata is useless.
  • The snapshot is not separated from the system. If someone deletes the wrong volume I lose the snapshot. If the NetApp Filer explodes into tiny little kittens I lose the backup. I can use SnapMirror to move my snapshots to another Filer but again, it's just moving the metadata not the actual blocks. If I lose the original volume, I can't see how a snapshot copied to another Filer is going to help.

Can someone explain how NetApp Snapshots can be considered backups? I'm looking for Good Subjective answers, so please support your position with facts, references and experience. If my understanding of the underlying technology is incorrect, please explain where and why that changes my conclusion. If your shop relies on NetApp Snapshots as backups, please include enough contextual information so that people can get a sense of what kind of recovery policy you have to meet.

Best Answer

Backups serve two functions.

  • First and foremost, they're there to allow you to recover your data if it becomes unavailable. In this sense, snapshots are not backups. If you lose data on the filer (volume deletion, storage corruption, firmware error, etc.), all snapshots for that data are gone as well.
  • Secondly, and far more commonly, backups are used to correct for routine things like accidental deletions. In this use case, snapshots are backups. They're arguably one of the best ways to provide this kind of recovery, because they make the earlier versions of the data available directly to the users or their OS as a hidden .snapshot directory from which they can read their files directly.

No retention policy

That said, while we have snapshots and use them extensively, we still do nightly incrementals with NetBackup to tape or Data Domain. The reason is that snapshots cannot reliably uphold a retention policy. If you tell users that they will be able to restore at a daily granularity for a week and then at a weekly granularity for a month, you can't keep that promise with snapshots.
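As a rough illustration of what such a promise implies, the following Python sketch lists the restore points the policy calls for and reports which ones have no surviving snapshot. The function names, the example date, and the assumption that "weekly for a month" means restore points at 2, 3 and 4 weeks back are all hypothetical choices made for this example.

```python
from datetime import date, timedelta


def required_restore_points(today, daily_days=7, weekly_weeks=4):
    """Dates promised by the policy: daily for a week, then weekly for a month."""
    dailies = [today - timedelta(days=d) for d in range(1, daily_days + 1)]
    # Week 1 is already covered by the last daily, so weeklies start at week 2.
    weeklies = [today - timedelta(weeks=w) for w in range(2, weekly_weeks + 1)]
    return dailies + weeklies


def missing_restore_points(today, surviving):
    """Promised dates for which no snapshot survives."""
    have = set(surviving)
    return [d for d in required_restore_points(today) if d not in have]


today = date(2024, 3, 29)
# Hypothetical situation: only the last three daily snapshots still exist.
surviving = [today - timedelta(days=d) for d in range(1, 4)]
gaps = missing_restore_points(today, surviving)
print(f"{len(gaps)} of {len(required_restore_points(today))} promised restore points are missing")
```

A check like this can tell you after the fact that the promise was broken, but it cannot stop the snapshots from disappearing in the first place.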

On a NetApp volume with snapshots, deleted data contained in a snapshot occupies "snap reserve" space. If the volume isn't full and you've configured it this way, you can also push past that snapshot reserve and have snapshots that occupy some of the unused data space. If the volume fills up, though, all the snapshots but the ones supported by data in the reserved space will get deleted. Deletion of snapshots is determined only by available snapshot space, and if the filer needs to delete snapshots that are required for your retention policy, it will.

Consider this situation:

  • A full volume with regular snapshots and a 2 week retention requirement.
  • Assume half of the reserve in use for snapshots based on the normal rate of change.
  • Someone deletes a lot of data (more than the snapshot reserve), drastically increasing the rate of change, temporarily.

At this point, your snapshot reserve is completely used, as is as much of the free data space as you've allowed ONTAP to use for snapshots, but you haven't lost any snapshots yet. As soon as someone fills the volume back up with data, though, you'll lose all the snapshots held in the data section, which will push your recovery point back to the time just after the large deletion.
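The scenario can be walked through with a small Python simulation. This is a deliberately simplified model of the behaviour described above, not ONTAP's actual autodelete logic: the class `SnapSpaceModel`, the sizes (in arbitrary units), and the oldest-first deletion order are all assumptions made for the example.

```python
from collections import deque


class SnapSpaceModel:
    """Toy space accounting for one volume with a snapshot reserve."""

    def __init__(self, data_capacity, reserve):
        self.data_capacity = data_capacity
        self.reserve = reserve
        self.active = 0
        self.snaps = deque()  # oldest first; each entry is [name, space_held]

    def snapshot(self, name):
        self.snaps.append([name, 0])

    def spill(self):
        # Snapshot space beyond the reserve spills into the data area.
        used = sum(held for _, held in self.snaps)
        return max(0, used - self.reserve)

    def delete(self, size):
        # Deleted blocks stay allocated while a snapshot references them;
        # here they are charged to the newest snapshot.
        self.active -= size
        if self.snaps:
            self.snaps[-1][1] += size

    def write(self, size):
        self.active += size
        # Assumed policy: drop the oldest snapshots until the data fits.
        while self.snaps and self.active + self.spill() > self.data_capacity:
            self.snaps.popleft()
        if self.active > self.data_capacity:
            raise ValueError("volume full")


vol = SnapSpaceModel(data_capacity=1000, reserve=200)
vol.write(1000)                      # a full volume
for day in range(1, 15):             # two weeks of dailies, small churn each day
    vol.snapshot(f"day{day:02d}")
    vol.delete(7)
    vol.write(7)                     # snapshots now hold 98, about half the reserve

vol.delete(300)                      # the large deletion, bigger than the reserve
vol.snapshot("after-delete")
count_before_refill = len(vol.snaps)

vol.write(300)                       # someone fills the volume back up
survivors = [name for name, _ in vol.snaps]
print(count_before_refill, survivors)
```

In this model all fifteen snapshots survive the large deletion itself, but refilling the volume evicts every snapshot taken before the deletion, leaving only the one taken afterwards.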

Summary

NetApp snapshots don't cover you against real data loss. An errant volume deletion or data loss on the filer will require you to rebuild data.

They are a very simple and elegant way to allow for routine restores, but they aren't reliable enough to replace a real backup solution. Most of the time, they'll make routine restores simple and painless, but when they're not available, you are exposed.