How to verify that a deduplication has taken place

deduplication windows-server-2012

Microsoft Windows Server 2012 and later offer a deduplication service that periodically scans files, finds identical chunks, and removes the redundant copies to save space.

To the user browsing the files, they should all look the same.

My problem is that I have a piece of software that reads these files and fails when it reads a file that has been processed by deduplication. I set up a Windows Server with the deduplication service to develop and test a fix for this, but I am not sure whether my test files are actually being deduplicated or whether my fix is really working.

Is there something in the file metadata that shows whether deduplication has taken place? Or perhaps the deduplication service has an accessible database listing the files it has processed?

I have already tried the obvious: create a file, copy that file in the same folder and then view the properties of the folder – but the size of the folder amounts to both files, while I was expecting it to amount to the size of only one file.

Best Answer

Deduplication is implemented as a filter driver on top of NTFS (and now ReFS) and should work transparently. You can always disable it for particular sets of files if it causes issues.
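On the question of per-file metadata: as far as I know, an optimized file is turned into a reparse point that carries the tag IO_REPARSE_TAG_DEDUP (0x80000013), so the file attributes can be inspected directly. Below is a minimal Python sketch of that check, not a definitive implementation. The function names are made up for illustration, the constants are copied from the Windows SDK headers, and it assumes Python 3.8 or later on Windows, where os.lstat exposes st_file_attributes and st_reparse_tag.

```python
import os

# Win32 constants, copied from the Windows SDK headers (assumed values).
FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
IO_REPARSE_TAG_DEDUP = 0x80000013


def is_dedup_optimized(file_attributes, reparse_tag):
    """Return True if the attribute/tag pair looks like a dedup-optimized file."""
    has_reparse_point = bool(file_attributes & FILE_ATTRIBUTE_REPARSE_POINT)
    return has_reparse_point and reparse_tag == IO_REPARSE_TAG_DEDUP


def check_file(path):
    """Inspect one file without following its reparse point (hypothetical helper)."""
    info = os.lstat(path)
    # Both fields exist only on Windows; fall back to 0 on other platforms.
    attrs = getattr(info, "st_file_attributes", 0)
    tag = getattr(info, "st_reparse_tag", 0)
    return is_dedup_optimized(attrs, tag)
```

Calling check_file on each test file before and after an optimization job should show the result flip from False to True if the file was really processed. The path you pass is your own; nothing here depends on a particular volume.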

To get the deduplication status of a volume, use the Get-DedupStatus cmdlet. See:

https://docs.microsoft.com/en-us/powershell/module/deduplication/get-dedupstatus
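If the test harness is scripted, the cmdlet output can be read from code. The following is a rough sketch in Python that shells out to PowerShell and parses the JSON. It assumes powershell.exe is on the PATH and the Deduplication module is installed, and the helper names are hypothetical. Note that a count of zero right after creating test files is not necessarily a failure: optimization runs on a schedule, and by default files younger than a minimum age are skipped, so you may need to lower that setting on the volume and start an optimization job manually with Start-DedupJob first.

```python
import json
import subprocess


def build_status_command(volume):
    """Build the PowerShell command line (the volume, e.g. 'E:', is caller-supplied)."""
    script = (
        f"Get-DedupStatus -Volume '{volume}' | "
        "Select-Object Volume, OptimizedFilesCount, InPolicyFilesCount, SavedSpace | "
        "ConvertTo-Json"
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


def parse_status(json_text):
    """Normalize ConvertTo-Json output into a list of dicts."""
    if not json_text.strip():
        return []  # no deduplicated volume matched, so nothing was printed
    data = json.loads(json_text)
    # A single object is emitted as a dict, several objects as a list.
    return data if isinstance(data, list) else [data]


def optimized_file_count(volume):
    """Run the cmdlet and sum OptimizedFilesCount (Windows only)."""
    result = subprocess.run(
        build_status_command(volume),
        capture_output=True, text=True, check=True,
    )
    return sum(row["OptimizedFilesCount"] for row in parse_status(result.stdout))
```

Comparing optimized_file_count before and after the job gives a quick volume-level signal that something was deduplicated, although it does not say which files.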

There is also a third-party tool, FolderSizes, that can visualize how much disk space deduplication is saving. See:

https://www.foldersizes.com/features/windowsdeduplicationdiskspace

You can exclude particular files from deduplication jobs. See:

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/advanced-settings

ExcludeFileType is what you should look at.

Setting: ExcludeFileType
Definition: File types that are excluded from optimization
Accepted values: Array of file extensions
Why set it: Some file types, particularly multimedia or files that are already compressed, do not benefit very much from being optimized. This setting allows you to configure which types are excluded.