Backups serve two functions.
- First and foremost, they're there to allow you to recover your data if it becomes unavailable. In this sense, snapshots are not backups. If you lose data on the filer (volume deletion, storage corruption, firmware error, etc.), all snapshots for that data are gone as well.
- Secondly, and far more commonly, backups are used to correct for routine things like accidental deletions. In this use case, snapshots are backups. They're arguably one of the best ways to provide this kind of recovery, because they expose earlier versions of the data directly to users (or their OS) through a hidden .snapshot directory, from which they can read back the file they need.
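For example, on an NFS client a self-service restore usually looks something like this (the mount point, snapshot name, and file name are just illustrative; actual snapshot names depend on your schedule):

```
# List the snapshots visible at the root of the share
ls /mnt/projects/.snapshot
# Copy yesterday's version of a file back into place
cp /mnt/projects/.snapshot/nightly.0/report.xls /mnt/projects/report.xls
```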
No retention policy
That said, while we have snapshots and use them extensively, we still do nightly incrementals with NetBackup to tape or Data Domain. The reason is that snapshots cannot reliably uphold a retention policy. If you tell users they will be able to restore at daily granularity for a week and then weekly granularity for a month, you can't keep that promise with snapshots.
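For reference, the schedule behind a promise like that would look something like this on a 7-mode filer (volume name and exact counts are illustrative):

```
# Keep 4 weekly, 7 nightly, and 0 hourly snapshots on vol1
snap sched vol1 4 7 0
```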
On a NetApp volume with snapshots, deleted data that is still referenced by a snapshot occupies "snap reserve" space. If the volume isn't full and you've configured it this way, snapshots can also push past that reserve and occupy some of the unused data space. If the volume fills up, though, every snapshot except those backed entirely by blocks in the reserved space will get deleted. Snapshot deletion is driven only by available snapshot space, and if that means deleting snapshots your retention policy requires, the filer will delete them anyway.
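If you want to see where you stand, the usual 7-mode commands look like this (vol1 is a placeholder):

```
# Show (or set) the snapshot reserve for the volume
snap reserve vol1
snap reserve vol1 20
# df shows the volume and its .snapshot usage as separate lines
df -h vol1
# List the snapshots currently held on the volume
snap list vol1
```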
Consider this situation:
- A full volume with regular snapshots and a 2 week retention requirement.
- Assume half of the reserve is in use for snapshots, based on the normal rate of change.
- Someone deletes a lot of data (more than the snapshot reserve), temporarily but drastically increasing the rate of change.
At this point your snapshot reserve is completely used, as is whatever data free space you've allowed ONTAP to use for snapshots, but you haven't lost any snapshots yet. As soon as someone fills the volume back up with data, though, you'll lose all the snapshots held in the data section, which leaves your oldest available recovery point just after the large deletion.
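To put rough, made-up numbers on it:

```
# 1 TB volume, 20% snap reserve = 200 GB, ~100 GB of it normally in use
# A user deletes 300 GB of live data:
#   snapshots now pin roughly 100 GB + 300 GB = 400 GB of blocks
#   200 GB fits in the reserve, the other ~200 GB spills into free data space
# New data refills the volume:
#   ONTAP reclaims the data-space blocks by deleting the oldest snapshots,
#   leaving only what fits in the 200 GB reserve, so the oldest restore
#   point left is just after the big deletion
```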
Summary
NetApp snapshots don't protect you against real data loss. An errant volume deletion or data loss on the filer itself will require you to rebuild that data from somewhere else.
They are a simple, elegant way to handle routine restores, but they aren't reliable enough to replace a real backup solution. Most of the time they'll make those restores quick and painless; when they're not available, you are exposed.
quota resize
(or the equivalent action in one of the GUIs) forces the quota service to scan the volume and apply any changes made to /etc/quotas. It's not actually resizing your volume. Until you execute a quota resize (or disable and re-enable the quota service), changes to the quota definitions will not take effect. Also notable: during the scan you kicked off with that resize, quotas are not enforced. This is a good reason to use individual volumes for network shares instead of qtrees.
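For what it's worth, the usual 7-mode workflow after editing /etc/quotas looks something like this (volume name is a placeholder):

```
# Tell the quota service to rescan the volume and apply the edited definitions
quota resize vol1
# Or the heavier-handed alternative: turn quotas off and back on
quota off vol1
quota on vol1
# Check what is currently being enforced
quota report
```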
Another good reason to use volumes instead of qtrees is that, depending on how you back this data up, it can take a good deal longer to back up a single 10TB volume containing ten 1TB qtrees than to back up ten separate 1TB volumes.
I don't know about you guys, but the reason we went with qtrees inside volumes was so we could overprovision shares without having to overprovision the aggregate. A failure to keep adequate space in the CIFS volume wouldn't cause other volumes to die, like a failure to keep adequate space in an aggregate would.
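As a rough sketch (names and sizes made up), the tree quotas in /etc/quotas can add up to more than the volume that holds them, so the shares are overcommitted against the volume rather than the volume against the aggregate:

```
# 10 TB volume, but the three qtree limits below total 15 TB
/vol/cifs_vol/engineering    tree    5120G
/vol/cifs_vol/finance        tree    5120G
/vol/cifs_vol/marketing      tree    5120G
```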
If you need to put someone to sleep or want a good reference for the quota command line, check this out.
Best Answer
Nowadays the whole practice of dedicating and limiting volume sizes seems to be falling by the wayside, because eventually there's always a case where you run out of space due to unforeseen changes in how (or what) things are stored. From a management point of view it's easier to just create huge arrays, let them grow, and extend them as needed.
The typical worries that used to be addressed by physically limiting volumes (swap space, log files spilling over and crashing the system, etc.) are now addressed by cheaper hard disks, filesystems that are no longer limited to sizes smaller than the available drives, and volume management that lets you dynamically resize and add or remove disks as needed. Drive failures used to be a worry too, but now these virtual volumes can sit on top of RAID underneath the filesystem. Performance is more a matter of drive spindles and the server's workload (heavy writes? heavier reads? an equal mix?) than a simple "stick in a drive and share it out" solution.
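On the Linux side, that dynamic resizing is routine with LVM, something like the following (device, VG/LV names and sizes are placeholders; ext4 assumed):

```
# Add a new disk to the volume group and grow the data LV by 500 GB
pvcreate /dev/sdc
vgextend vg_data /dev/sdc
lvextend -L +500G /dev/vg_data/lv_data
# Grow the filesystem to match (ext4 can do this online)
resize2fs /dev/vg_data/lv_data
```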
The only real drawback we've seen is that disk checks on large volumes can take a really long time, but with journaling filesystems in use that's normally not a huge problem.
Our normal routine is to create a system partition for the OS, then throw everything else into a giant data partition for shares, home directories, etc. Usually by the time we outgrow it (for performance or space reasons) we need to replace the server anyway. Other admins on the site with experience running larger multi-terabyte SANs and such might have other experiences to share.