I suspect your replica VHDs are either constantly open with a write lock or written to too frequently to be covered by the MinimumFileAgeDays setting (5 days by default; it can be set as low as 0 with `Set-DedupVolume <Drive>: -MinimumFileAgeDays 0`).
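If you still want to try it, a quick way to check whether the age threshold is the limiting factor is to lower it and see whether an optimization job actually picks anything up. A minimal sketch, assuming the replica volume is E: (substitute your own drive letter):

```powershell
# E: is an example drive letter - substitute your replica volume.
# Current policy and savings for the volume.
Get-DedupVolume -Volume "E:" |
    Format-List Volume, MinimumFileAgeDays, SavedSpace, SavingsRate

# Allow files of any age to be optimized (0 = no minimum age).
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 0

# Kick off an optimization job now and see whether anything gets processed.
Start-DedupJob -Volume "E:" -Type Optimization
Get-DedupStatus -Volume "E:" | Format-List InPolicyFiles, OptimizedFiles, SavedSpace
```

If SavedSpace stays at zero after the job completes, the files are most likely held open exclusively, which is exactly the case the documentation warns about.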
By the way, the documentation clearly declares such a configuration "unsupported":
Unsupported configurations
Constantly open or changing files
Deduplication is not supported for files that are open and constantly changing for extended periods of time or that have high I/O requirements, for example, running virtual machines on a Hyper-V host, live SQL Server databases, or active VDI sessions.
Deduplication can be set to process files that are 0 days old and the system will continue to function as expected, but it will not process files that are exclusively open. It is not a good use of server resources to deduplicate a file that is constantly being written to, or will be written to in the near future. If you adjust the default minimum file age setting to 0, test that deduplication is not constantly being undone by changes to the data.
Deduplication will not process files that are constantly and exclusively open for write operations. This means that you will not get any deduplication savings unless the file is closed when an optimization job attempts to process a file that meets your selected deduplication policy settings.
Accordingly, it also contains the following recommendation:
Not good candidates for deduplication:
- Hyper-V hosts
- VDI VHDs
- WSUS
- Servers running SQL Server or Exchange Server
- Files approaching, or larger than, 1 TB in size
It sounds a bit like what you are actually looking for is online deduplication, which dedupes data as it is being written to disk. This is a feature of some more sophisticated SAN solutions (including Nexenta's SMB-targeted offerings), but it comes at a rather high cost in hardware - you would need a powerful machine with a lot of RAM for online dedup to run smoothly.
Best Answer
If it uses the Microsoft defragmentation APIs it should be able to, as the deduplication chunks and metadata are stored as plain files on the disk. If you're paranoid about data loss, just disable the dedup jobs on the volume before running it. I asked Ran Kalach, part of the dedup team at Microsoft, about this, and he stated that there were no known data integrity issues with 3rd-party defragmentation programs that use the Microsoft defragmentation APIs, although there could be performance issues due to the large sparse files used by dedup.
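A minimal sketch of that precaution, assuming the default deduplication schedules are in place:

```powershell
# Make sure no dedup job is currently running or queued.
Get-DedupJob

# Disable all deduplication schedules while the defragmenter runs.
Get-DedupSchedule | ForEach-Object { Set-DedupSchedule -Name $_.Name -Enabled $false }

# ... run your defragmentation pass here ...

# Re-enable the schedules afterwards.
Get-DedupSchedule | ForEach-Object { Set-DedupSchedule -Name $_.Name -Enabled $true }
```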
I've been using MyDefrag because it is highly configurable and allows you to write scripts to determine file placement and other actions. The deduplication chunks and metadata are stored in `?:\System Volume Information\Dedup`. Security permissions on this directory are set to only allow `NT AUTHORITY\SYSTEM` access, so if you want to be able to defragment these files you will need to run your defragmentation program under the `NT AUTHORITY\SYSTEM` account. This can be accomplished with Microsoft/Sysinternals' PsExec program; just run `psexec.exe -i -s -d C:\YourDefrag.exe`.
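If you want to confirm the SYSTEM context and have a look at the chunk store before launching the defragmenter, something like this works; the D: drive letter is only an example:

```powershell
# Launched via: psexec.exe -i -s powershell.exe
# (an interactive PowerShell running as NT AUTHORITY\SYSTEM)

# Confirm you are actually running as SYSTEM.
whoami   # should print: nt authority\system

# D: is an example drive letter - use your deduplicated volume.
# -Force is needed because the directory is hidden and system-flagged.
Get-ChildItem 'D:\System Volume Information\Dedup' -Force
```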
To address the comment in your question that defragmenting a deduplicated volume is of little use, I would have to disagree. To start off, not all files and directories are deduplicated. In a default configuration several file types are excluded; see the `ExcludeFolder`, `ExcludeFileType` and `ExcludeFileTypeDefault` properties of the `Get-DedupVolume` cmdlet. This can be further configured by the administrator - for instance, I exclude .MKV video files because of the low duplication rates in my environment. Also, files in excess of 1 TB will not be deduplicated even in Server 2016, and files of 32 KB or smaller will not be deduplicated either. Secondly, free-space fragmentation can decrease write performance and increases the chance that future files end up fragmented. Thirdly, even though a deduplicated file is inherently fragmented, fragmented deduplication chunks will decrease performance even further. And finally, by grouping the dedup chunks together with a program like MyDefrag you can reduce the time it takes to run garbage collection and scrubbing jobs, by reducing the amount of time the disks spend seeking.

Also, the data itself will not be rehydrated when defragmentation is run, as the user-visible deduplicated files are stored as reparse points on disk - a special type of file similar to a junction or directory mount point.