Is it a good idea to take snapshots of production servers at regular intervals? I have read on a lot of websites that you should not take snapshots of production systems, because doing so can affect network and machine performance. Does anybody have any insight on this?
VMware Snapshots
snapshot
Related Solutions
How long have those snapshots been in place? Typically you don't want a snapshot around longer than a few days; otherwise you're liable to run into issues. The best thing I can recommend is committing those snapshots. That might take a while if they're large or have been running for a while. Virtual Center might time out, but the delete is still running on the ESX host if the snapshot is really large. Snapshots are just delta files of a particular VM, so there's no way of applying system-wide changes across multiple ones.
Update:
Why snapshots can stop machines for a long time: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002836
VMware Admin guide (PDF): http://www.vmware.com/pdf/vi3_301_201_admin_guide.pdf
Horror story of "the long snapshot" which should be quite painful for you guys if you decide to commit: http://www.vmwarez.com/2006/11/beware-long-snapshot.html
Most of these snapshots are copy-on-write (COW) snapshots, which are really fast and really cheap (storage-wise) on rarely-updated systems. LVM snapshots are COW snapshots, and ZFS and BTRFS both have a COW mode for snapshots. Reiserfs doesn't have snapshots natively. Novell's NSS file system is also COW, as are Shadow Copy volumes for Windows NTFS volumes.
Copy-on-write snapshots take a copy of the metadata of the target volume into the snapshot pool. Then, depending on which mode of COW they're using, they copy data that would be overwritten by new writes to the snapshot pool before writing the new data.
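To make the copy-before-overwrite step concrete, here is a toy sketch in Python. The Volume class and its methods are made up for illustration and don't correspond to any real LVM, ZFS or VSS API; real implementations work on disk blocks and metadata, not Python lists.

```python
class Volume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # A new snapshot starts empty: it holds no data until a block changes.
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Before overwriting, preserve the old block in every snapshot
        # that has not already saved its own copy of that block.
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        # Preserved blocks come from the snapshot pool; untouched blocks
        # are still read from the live volume.
        return snap.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")

print(vol.blocks)                                      # live data after the write
print([vol.read_snapshot(snap, i) for i in range(3)])  # point-in-time view
print(len(snap))                                       # blocks the snapshot had to copy
```

Only the one overwritten block ends up in the snapshot pool, which is why these snapshots are so cheap on volumes that rarely change and grow quickly on busy ones.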
ZFS and (eventually, if not already there) BTRFS have full-snapshot capabilities, which are useful for snapping onto separate media, which in turn is very handy for sneakernet backup systems using removable media. ZFS doesn't call this a "snapshot", though; it relies on zfs send and zfs recv to copy volumes and snapshots over the network to a remote host (or local array).
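As a rough sketch of what that looks like when scripted, the Python below pipes zfs send into zfs recv over ssh. The dataset, snapshot and host names are placeholders I made up, the helper functions are hypothetical, and it assumes the snapshot already exists and that ssh access to the receiving host is set up. Treat it as a starting point, not a tested replication tool.

```python
import shlex
import subprocess


def build_replication_commands(dataset, snapshot, remote_host, target):
    """Return (send_argv, recv_argv) for a full zfs send | zfs recv copy."""
    send_argv = ["zfs", "send", f"{dataset}@{snapshot}"]
    # ssh hands the remote command to a shell, so quote the target name.
    recv_argv = ["ssh", remote_host, "zfs recv " + shlex.quote(target)]
    return send_argv, recv_argv


def replicate(dataset, snapshot, remote_host, target):
    """Stream the snapshot to the remote host; raise if either side fails."""
    send_argv, recv_argv = build_replication_commands(
        dataset, snapshot, remote_host, target
    )
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_argv, stdin=sender.stdout)
    sender.stdout.close()  # so the sender notices if the receiver exits early
    receiver_rc = receiver.wait()
    sender_rc = sender.wait()
    if sender_rc != 0 or receiver_rc != 0:
        raise RuntimeError(f"replication failed: send={sender_rc}, recv={receiver_rc}")


# Placeholder names; only the command lines are printed here, nothing is run.
send_cmd, recv_cmd = build_replication_commands(
    "tank/data", "daily", "backuphost", "backup/data"
)
print(shlex.join(send_cmd), "|", shlex.join(recv_cmd))
```

Calling replicate() with real names would actually run the transfer; the example at the bottom only prints the pipeline so you can check it first.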
I prefer filesystem-level snapshot abilities over LVM ones because I better trust the filesystem itself to handle the process cleanly. However, in the absence of direct filesystem support, LVM should work just fine in most cases.
COW snapshots are good if you need a point-in-time backup taken really fast for short-term recovery needs, such as a daily, or 4x daily, snap kept for a week. This is handy if you need to recover files users accidentally delete, or need to roll back an entire system to a pre-update config. They can also be used by some backup systems as a fully quiesced filesystem, so backups taken from the snapshot volume don't have to worry about open files getting in the way. The key thing to remember is that the snapshot volumes will be on the same storage as the primary volume, so they give you no protection in case of array failure.
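The "keep them for a week" part is the bit people tend to forget to automate. A minimal Python sketch of the pruning decision is below; the snapshot names and the dict layout are invented for the example, and actually deleting the expired snapshots is left to whatever tool manages them.

```python
from datetime import datetime, timedelta


def expired_snapshots(snapshots, now, keep_for=timedelta(days=7)):
    """Return names of snapshots older than keep_for, oldest first.

    snapshots maps a snapshot name to the datetime it was taken.
    """
    cutoff = now - keep_for
    old = [name for name, taken in snapshots.items() if taken < cutoff]
    return sorted(old, key=snapshots.get)


# Made-up inventory: snapshots taken 1, 3, 6, 8 and 10 days ago.
now = datetime(2024, 1, 15, 12, 0)
snapshots = {f"snap-{d}d": now - timedelta(days=d) for d in (1, 3, 6, 8, 10)}

expired = expired_snapshots(snapshots, now)
print(expired)  # -> ['snap-10d', 'snap-8d']
```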
FULL snapshots are good if they're taken to removable or remote media of some kind. If you have networked storage, the target could be a different iSCSI or Fibre Channel array than the one the primary storage is hosted in. This gives you some off-array protection for some kinds of faults. If using removable media, such as a 3TB eSATA drive, you can even use it as a simple backup-to-disk system. These snapshots CAN be on different hardware than their COW brothers, so they are useful for disaster-resilience.
On Full vs COW snapshots.
The term 'snapshot' has drifted a bit over the years. This year, I'm pretty sure it means "a Copy-On-Write copy of the original data using block-relocation". By this definition, the "Full" snapshot presented above is not actually a snapshot; it's replication. Some storage vendors have used different definitions of 'snapshot' in the past to describe various block-level operations they perform. Where it gets confusing is with systems that use snapshots as part of the replication process.
Best Answer
Be aware that snapshots are not a replacement for backups! I use snapshots only for creating short-term fallback points, e.g. before a server is patched with something that might break it. But I won't keep snapshots for longer than several days, and by no means would I create them on a regular basis just because I can and might need them later.
Please note that most virtualization-aware backup software also uses snapshots in an automated way to bring a VM to a consistent read-only state before backing it up. But even then, the snapshot is deleted again as soon as the backup job has finished.
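If you script that pattern yourself, the important part is that the snapshot gets removed even when the backup fails. Here is a small Python sketch of the idea; the create, delete and backup functions are stand-ins that only record what happened, not calls into any real VMware or backup API.

```python
from contextlib import contextmanager


@contextmanager
def temporary_snapshot(create, delete):
    """Create a snapshot, hand it to the caller, and always delete it after."""
    snap = create()
    try:
        yield snap
    finally:
        # Runs whether the backup succeeded or raised an exception.
        delete(snap)


# Stand-in functions that only log the order of operations.
events = []


def create_snapshot():
    events.append("create")
    return "backup-snap"


def delete_snapshot(snap):
    events.append(f"delete {snap}")


def run_backup(snap):
    events.append(f"backup from {snap}")


with temporary_snapshot(create_snapshot, delete_snapshot) as snap:
    run_backup(snap)

print(events)  # -> ['create', 'backup from backup-snap', 'delete backup-snap']
```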