LVM snapshots are meant to capture the filesystem in a frozen state. They are not meant to be a backup in and of themselves. They are, however, useful for obtaining backup images that are consistent because the frozen image cannot and will not change during the backup process. So while you won't use them directly to make long-term backups, they will be of great value in any backup process that you decide to use.
There are a few steps to implement a snapshot. The first is that a new logical volume has to be allocated. The purpose of this volume is to provide an area where deltas (changes) to the filesystem are recorded. This allows the original volume to carry on without disrupting any existing read/write access. The downside is that the snapshot area is of a finite size, which means that on a system with heavy write activity it can fill up rather quickly. For volumes that have significant write activity, you will want to increase the size of your snapshot to allow enough space for all changes to be recorded. If your snapshot overflows (fills up), it will halt and be marked as unusable. Should this happen, you will want to release your snapshot so you can get the original volume back online. Once the release is complete, you'll be able to remount the volume as read/write and make the filesystem on it available again.
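As a sketch of that first step (the volume group `vg0` and volume `data` here are placeholder names, not anything from the setup above), allocating the snapshot volume might look like:

```shell
# Allocate a 2 GiB copy-on-write area as a snapshot of /dev/vg0/data;
# "vg0" and "data" are example names -- substitute your own VG and LV.
lvcreate --size 2G --snapshot --name data-snap /dev/vg0/data

# Keep an eye on how full the snapshot's CoW area is (the "Data%"
# column) so it doesn't overflow:
lvs vg0
```

Note that `--size` here is the size of the delta area, not of the whole volume; it only needs to hold the changes made while the snapshot exists.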
The second thing that happens is that LVM sorts out the true roles of the volumes in question, and they may not be what you expect. You would think that the newly allocated snapshot would be the place to look for changes to the filesystem; after all, it's where the deltas are being recorded, right? No, it's presented the other way around. Filesystems are mounted against LVM volume names, so swapping the name out from underneath the rest of the system would be a no-no (the snapshot uses a different name). The solution is simple: the original volume name continues to refer to the live (read/write) version of the volume you took the snapshot of, while the snapshot volume you created refers to the frozen (read-only) version of the volume you intend to back up. A little confusing at first, but it makes sense.
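To make that concrete, here is a hedged sketch of backing up from the frozen view while the original stays live (`/dev/vg0/data-snap` and the paths are placeholder names for an existing snapshot volume):

```shell
# Mount the frozen snapshot read-only; the original volume name
# stays mounted read/write elsewhere and keeps taking writes.
mkdir -p /mnt/snap
mount -o ro /dev/vg0/data-snap /mnt/snap

# Back up the consistent, unchanging image.
tar -czf /backup/data-$(date +%F).tar.gz -C /mnt/snap .

umount /mnt/snap
```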
All of this happens in less than 2 seconds. The rest of the system doesn't even notice. Unless, of course, you don't release the snapshot before it overflows...
At some point you will want to release your snapshot to reclaim the space it occupies. Once the release is complete, the snapshot's space is returned to the volume group, and the original volume remains.
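Releasing the snapshot is a single command; a sketch, again with placeholder names:

```shell
# Make sure the snapshot isn't mounted, then remove it; its space
# goes back to the volume group. "vg0/data-snap" is a placeholder.
umount /mnt/snap 2>/dev/null
lvremove vg0/data-snap
```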
I do not recommend pursuing this as a long-term backup strategy. You are still hosting data on the same physical drive that can fail, and recovery of your filesystem from a drive that has failed is no backup at all.
So, in a nutshell:
- Snapshots are good for assisting backups
- Snapshots are not, in and of themselves, a form of backup
- Snapshots do not last forever
- A full snapshot is not a good thing
- Snapshots need to be released at some point
- LVM is your friend, if you use it wisely
If I understand thin provisioning correctly, then it could really cause problems if you aren't monitoring your VMFS filesystems' growth closely and you allow your VMDKs to fill up your VMFS volumes. You've seen in your testing that thin-provisioned disks tend to grow to fill their available space quickly, and that they cannot reclaim space that may be free inside the OS.
The other option is creating sufficiently sized VMDK files to handle your current usage and expected spikes in growth, and just adding more VMDK files as your application data usage grows. New VMDK files can be added live to a VM; you just have to rescan (echo "- - -" > /sys/class/scsi_host/host?/scan). You can partition the new disk, add it to your LVM, and extend the filesystem, all live. This way you are always aware of how much space is allocated to each of the VMs, and you can't accidentally run your VMFS out of space from inside a guest.
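A sketch of that live-expansion flow, assuming the new disk shows up as /dev/sdb and using placeholder names (vg0/data) for the volume group and logical volume:

```shell
# Rescan all SCSI hosts so the newly added VMDK appears.
for h in /sys/class/scsi_host/host*; do echo "- - -" > "$h/scan"; done

# Partition the new disk (assumed here to be /dev/sdb) for LVM.
parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 100%
pvcreate /dev/sdb1

# Grow the volume group, the logical volume, and finally the
# filesystem -- all while it stays mounted.
vgextend vg0 /dev/sdb1
lvextend -l +100%FREE /dev/vg0/data
resize2fs /dev/vg0/data   # ext4; use xfs_growfs for XFS
```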
As far as whether to partition or not when the disk is only going to be used by LVM: I always partition. Partitioning the disk prevents warnings about bogus partition tables from coming up when the machine boots, and makes it clear that the disk is allocated. It's a bit of voodoo, but I also make sure to start the partition at sector 64 to help ensure the partition and filesystem are block-aligned with the underlying storage. Misalignment is hard to detect and categorize, as you usually don't have something to easily compare against, but if the OS filesystem isn't aligned properly with the underlying storage, then you can end up with extra IOPS required to service requests that cross block boundaries on the underlying storage.
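A hedged sketch of creating such an aligned partition (the disk name /dev/sdb is a placeholder; `-a none` tells parted to honor the exact start sector rather than its own alignment policy):

```shell
# Single LVM partition starting at sector 64 so it lines up with
# the underlying storage's block boundaries.
parted -s -a none /dev/sdb mklabel msdos
parted -s -a none /dev/sdb mkpart primary 64s 100%
parted -s /dev/sdb set 1 lvm on

# Verify the start sector in sector units:
parted /dev/sdb unit s print
```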
Cloning a thin volume is as simple as taking a snapshot of the to-be-cloned volume. When using thin volumes, snapshots and new volumes really are the same thing, just with different default flags.
From the kernel docs:
So it is perfectly legal to snapshot a thinly-provisioned volume to create a CoW clone. From the man page:
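As a hedged sketch of the above (pool and volume names are placeholders, not taken from the docs quoted), cloning via a thin snapshot might look like:

```shell
# Create a thin pool and a thin volume; "vg0", "pool0", and
# "thinvol" are placeholder names.
lvcreate --size 10G --thinpool pool0 vg0
lvcreate --virtualsize 20G --thin --name thinvol vg0/pool0

# The "clone" is just a snapshot of the thin volume: it shares
# blocks with the origin (CoW) and consumes pool space only as
# the two diverge.
lvcreate --snapshot --name thinvol-clone vg0/thinvol

# Thin snapshots are flagged to skip activation by default; -K
# (--ignoreactivationskip) activates one anyway.
lvchange -ay -K vg0/thinvol-clone
```

The activation-skip flag is one of the "different default flags" that distinguishes a snapshot from a volume created fresh.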