There are many ways to copy disks, file systems or files. Generally, copying at the file system level gives you a good clone, with the flexibility that the target file system can be a slightly different size. With the target system running some sort of live Linux (Knoppix, Ubuntu live, etc.) booted from a CD-ROM, you can create the partitions on the disk using fdisk or your favorite partitioning tool. Assuming you have an SSH server running on the source system, take an approach similar to this:
http://www.linuxfocus.org/English/March2005/article370.shtml
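The magic happens in this command, run on the target from the root of the newly created and mounted file system (restore -r writes into the current directory):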
ssh sourcePC 'dump -0 -f - /' | restore -r -f -
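For context, here is a rough sketch of the steps around that command on the target's live system. The device name /dev/sda1, the ext3 file system and the mount point /mnt/target are my assumptions, not something from the linked article, so adjust them to your layout. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of pulling a root file system from a source host onto a fresh partition.
# Hypothetical values: /dev/sda1, ext3 and /mnt/target are placeholders.
SRC_HOST="${SRC_HOST:-sourcePC}"
TARGET_DEV="${TARGET_DEV:-/dev/sda1}"
TARGET_MNT="${TARGET_MNT:-/mnt/target}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

clone_root() {
    run "mke2fs -j $TARGET_DEV" &&
    run "mkdir -p $TARGET_MNT" &&
    run "mount $TARGET_DEV $TARGET_MNT" &&
    # restore -r unpacks into the current directory, hence the cd first
    run "cd $TARGET_MNT && ssh $SRC_HOST 'dump -0 -f - /' | restore -r -f -"
}

clone_root
```

Run it once as is to review the printed commands, then again with DRY_RUN=0 when you are happy with them. You will still need to install a boot loader and check /etc/fstab on the target afterwards.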
Whatever method you use to clone a running Linux system, your main concern will likely be the databases. The best way to back up and restore a database is to use its own dump tool to take a plain-text snapshot of the database just before the file system dump. For MySQL, there is:
mysqldump --all-databases > mysql_databases.sql
For PostgreSQL, there is:
pg_dumpall > pg_databases.sql
If you encounter any sort of consistency error on the new system, restore the database from the dump. Alternatively, once you have shut off services on the source system, run the database dump again and restore it on the target, so that you do not miss any recently modified data.
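If you want to script the dump step, something along these lines might do. The function names and the temp-file-then-rename trick are mine, and I am assuming credentials are already sorted out (for example via ~/.my.cnf, or by running as the postgres user).

```shell
#!/bin/sh
# Map a database engine name to the dump command quoted above.
db_dump_command() {
    case "$1" in
        mysql)      echo "mysqldump --all-databases" ;;
        postgresql) echo "pg_dumpall" ;;
        *)          echo "unknown engine: $1" >&2; return 1 ;;
    esac
}

# Dump to a temporary file and rename it only on success, so a failed
# dump never leaves a truncated snapshot behind.
snapshot_database() {
    engine=$1
    outfile=$2
    cmd=$(db_dump_command "$engine") || return 1
    if $cmd > "$outfile.tmp"; then
        mv "$outfile.tmp" "$outfile"
    else
        rm -f "$outfile.tmp"
        return 1
    fi
}

# Usage (hypothetical output paths), run just before the file system dump:
#   snapshot_database mysql /root/mysql_databases.sql
#   snapshot_database postgresql /root/pg_databases.sql
```

Writing the snapshot somewhere inside the file system being dumped means it travels to the target along with everything else.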
If I understand thin provisioning correctly, it could cause real problems if you aren't closely monitoring the growth of your VMFS filesystems and you allow your VMDKs to fill up your VMFS volumes. You've seen in your testing that thin-provisioned disks tend to grow quickly to fill their available space, and that they cannot reclaim space that has been freed inside the guest OS.
The other option is to create VMDK files large enough to handle your current usage and expected spikes in growth, and simply add more VMDK files as your application data grows. New VMDK files can be added live to a VM; you just have to rescan the SCSI bus (echo "- - -" > /sys/class/scsi_host/host?/scan). You can then partition the new disk, add it to your LVM volume group and extend the filesystem, all live. This way you always know how much space is allocated to each VM, and you can't accidentally run your VMFS out of space from inside a guest.
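To make the live-add sequence concrete, here is a sketch that just prints the commands for review. The names /dev/sdb, vg0 and data are placeholders, and I am assuming an msdos label, a 1MiB partition start and an ext3/ext4 filesystem (resize2fs); other filesystems need their own grow tool.

```shell
#!/bin/sh
# Print a reviewable plan for adding a new disk to an existing LVM setup.
# Nothing is executed here; the function only prints the commands.
grow_plan() {
    disk=$1   # new disk as seen by the guest, e.g. /dev/sdb (hypothetical)
    vg=$2     # existing volume group (hypothetical name)
    lv=$3     # logical volume to grow (hypothetical name)
    cat <<EOF
for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "\$h"; done
parted -s $disk mklabel msdos
parted -s $disk mkpart primary 1MiB 100%
parted -s $disk set 1 lvm on
pvcreate ${disk}1
vgextend $vg ${disk}1
lvextend -l +100%FREE /dev/$vg/$lv
resize2fs /dev/$vg/$lv
EOF
}

grow_plan /dev/sdb vg0 data
```

Check the printed device names against what the rescan actually produced (dmesg or lsblk) before running anything, since the new disk will not necessarily show up as /dev/sdb.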
As for whether to partition a disk that is only going to be used by LVM, I always partition. Partitioning the disk prevents warnings about bogus partition tables from coming up when the machine boots, and it makes it clear that the disk is allocated. It's a bit of voodoo, but I also make sure to start the partition at 64 to help ensure that the partition and filesystem are block aligned with the underlying storage. Misalignment is hard to detect and quantify, since you usually have nothing to compare against, but if the OS filesystem isn't properly aligned with the underlying storage, requests that cross block boundaries on the underlying storage need extra IOPS to service.
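The alignment arithmetic is easy to check. In this sketch I am reading the "64" above as a start sector, and assuming 512-byte sectors and a 4096-byte block on the underlying storage; your array's block or stripe size may well be different.

```shell
#!/bin/sh
# Remainder of a partition's byte offset against a storage block size.
# A result of 0 means the partition starts on a block boundary.
alignment_remainder() {
    start_sector=$1           # partition start, in sectors
    block_bytes=$2            # storage block size in bytes (assumed value)
    sector_bytes=${3:-512}    # sector size, 512 bytes unless told otherwise
    echo $(( (start_sector * sector_bytes) % block_bytes ))
}

alignment_remainder 63 4096   # old DOS default start: prints 3584
alignment_remainder 64 4096   # start sector 64: prints 0
```

On Linux you can usually read the real start sector of an existing partition from /sys/block/<disk>/<partition>/start and feed it to the function.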
Best Answer
The third option is to abandon cloning and instead use a proper system configuration management tool such as Puppet or Chef. Cloning is a really bad idea for systems that you need to maintain over time, because every change has to be applied to all machines currently in the field and also requires respinning all of your clone masters. If you use a proper management tool, though, you just describe the state you wish a system to be in, and then the tool makes sure that the system is in that state -- whether it just came "factory fresh", or has been in production for several years and just needs to have a config file tweaked.
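To illustrate the "describe the state" idea without pulling in either tool, here is a toy shell function. It is not how Puppet or Chef work internally, just the same principle: it makes a change only when the system is not already in the desired state, so it is safe to run on a fresh box and on one that has been in production for years. The function name and the example file are made up.

```shell
#!/bin/sh
# Ensure a file contains a given line exactly; do nothing if it already does.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    if grep -qxF -- "$line" "$file"; then
        echo "unchanged"
    else
        printf '%s\n' "$line" >> "$file"
        echo "changed"
    fi
}

# Usage (hypothetical file and setting):
#   ensure_line /tmp/example.conf "PermitRootLogin no"
```

Running it twice in a row reports "changed" and then "unchanged", which is the property that makes this approach maintainable compared with re-cloning.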
Basically, your new machine process should be: