Mixing Volume Shadow Copy and Data Deduplication in Windows Server

deduplication vss windows-server-2012 windows-server-backup

I'm using Windows Server 2012 and I'm creating a scheduled backup for multiple Hyper-V guests (all of them are Windows Server 2012) to a dynamic VHDX on a removable USB drive (so that I can alternate two drives and have two sets of backups).

I realized that since the drive holds a lot of similar data (each guest OS has its own copy of the OS files), Data Deduplication could help. So I enabled it on the drive and started the optimization job. It reported 8 GB of savings (on a 35 GB drive I created for testing, which contains backups for 3 VMs), but the VHDX file grew by 4 GB.

After optimizing the VHDX file (Full mode), I realized that deduplication not only failed to reduce the drive size at all, it actually increased it.

The only explanation I have is that, since WSB (Windows Server Backup) has created shadow copies on the drive, VSS and Deduplication may not play well together: VSS keeps track of the changes made by Deduplication, so both the duplicated and the deduplicated versions stay on the volume.

  • Am I right in my conclusion?
  • Is there a way to make deduplication work with VSS? Because backing up tens of VMs means a lot of duplicate data. It would be nice to make deduplication work!

Best Answer

You are right in your assumption that WSB has created shadow copies. It uses these copies to maintain a backup history.

If you still have backup versions (and thus shadow copies) from points in time before your dedup optimization job ran, you will not see any savings at all: the blocks that deduplication replaced have not been freed, because they are still needed for an older, non-deduplicated version of the data that one of the shadow copies references.

So the bottom line is that if you need deduplication savings to show, you need to remove all older shadow copies.
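To make the reasoning concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions, not how VSS or Data Deduplication are actually implemented: space is counted in abstract blocks, a shadow copy is modeled as a frozen set of block ids, and deduplication simply repoints duplicate blocks at the first block with the same payload. The class `ToyVolume` and the file names are hypothetical.

```python
class ToyVolume:
    """Toy block-accounting model (an illustration, not the real VSS/dedup design)."""

    def __init__(self):
        self.live = {}        # file name -> list of block ids in the current view
        self.contents = {}    # block id -> payload stored in that block
        self.snapshots = []   # each shadow copy: frozenset of block ids it pins
        self.next_id = 0

    def write_file(self, name, payloads):
        ids = []
        for payload in payloads:
            self.contents[self.next_id] = payload
            ids.append(self.next_id)
            self.next_id += 1
        self.live[name] = ids

    def snapshot(self):
        # A shadow copy pins every block the live files use right now.
        self.snapshots.append(
            frozenset(b for ids in self.live.values() for b in ids))

    def deduplicate(self):
        # Repoint every duplicate payload at the first block that holds it.
        first_seen = {}
        for name, ids in self.live.items():
            self.live[name] = [first_seen.setdefault(self.contents[b], b)
                               for b in ids]

    def delete_snapshots(self):
        self.snapshots.clear()

    def used_blocks(self):
        # A block stays allocated while the live view OR any snapshot needs it.
        referenced = {b for ids in self.live.values() for b in ids}
        for snap in self.snapshots:
            referenced |= snap
        return len(referenced)


vol = ToyVolume()
for vm in ("vm1", "vm2", "vm3"):
    # Three shared "OS" blocks plus one unique data block per guest.
    vol.write_file(vm + ".vhdx", ["os-a", "os-b", "os-c", vm + "-data"])

before = vol.used_blocks()          # 12 blocks
vol.snapshot()                      # backup history taken before dedup
vol.deduplicate()
after_dedup = vol.used_blocks()     # still 12: the snapshot pins the old blocks
vol.delete_snapshots()
after_cleanup = vol.used_blocks()   # 6: savings only appear now
print(before, after_dedup, after_cleanup)  # prints: 12 12 6
```

In this model the deduplicated view needs only 6 blocks, but the volume keeps all 12 allocated until the shadow copy taken before the optimization is gone.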

The increase you are seeing is probably not due to deduplication activity. More likely, additional backup jobs ran in the meantime, and older shadow copies are not deleted unless necessary (i.e., unless the volume would otherwise not have enough space for the new backup).
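The retention behavior described above can be sketched the same way. This is a rough illustration, assuming each backup version occupies a fixed amount of space and that the oldest versions are dropped only when the next backup would not fit. The function `run_backups` and all sizes are made up for the example, not measured values or WSB internals.

```python
from collections import deque


def run_backups(capacity_gb, backup_sizes_gb):
    """Return the used space after each backup.

    Older versions are pruned (oldest first) only when the new
    backup would not fit on the volume otherwise.
    """
    versions = deque()   # sizes of retained versions, oldest first
    history = []
    for size in backup_sizes_gb:
        while versions and sum(versions) + size > capacity_gb:
            versions.popleft()      # drop the oldest version to make room
        versions.append(size)
        history.append(sum(versions))
    return history


# Hypothetical 35 GB volume: one 10 GB backup followed by 4 GB backups.
usage = run_backups(35, [10, 4, 4, 4, 4, 4, 4, 4])
print(usage)  # prints: [10, 14, 18, 22, 26, 30, 34, 28]
```

Used space keeps climbing with every job and only drops once the volume is nearly full, which is why growth between two measurements does not have to be caused by deduplication.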