I'm currently trying to use data deduplication on two separate Windows Server 2012 Datacenter edition Hyper-V hosts. On one, I am trying to dedupe replicas that are still being resynced every 5 minutes or so. On the other, I have stopped the resync with a PowerShell script on about 15 servers (4 terabytes of data) and moved them to the root of the volume that I have deduplication enabled on.
Now, for some reason, it works with anything I put in there except replica VHD images. It just skips them.
I put 50 GB of templates and ISOs in there and it worked great. I start the deduplication job like so:
Start-DedupJob -Full -Volume R: -Type Optimization
It works great normally, but the actual reason I'm using it in the first place is to reduce the space required to store a snapshot of the replica VHDs. I would prefer to have the Hyper-V host keep resyncing the VHDs while deduplication runs, but if I have to stop the sync, dedupe, and then unoptimize before resyncing (or something like that), that is fine with me; I can script it out. Right now, though, under no circumstances can I get the replica VHDs to dedupe! It's driving me crazy!
Any advice or suggestions would be greatly appreciated.
UPDATE:
I have two VHDs: one is from a template, and the other is a replica image of a 1.6 terabyte data drive belonging to a VM on another Hyper-V host.
I've matched all the file properties and permissions, including ownership, so they are identical. The only difference is that the file that does dedupe is flagged with attributes APL, while the one that does not is flagged with just A. I am not sure what P and L are, and I don't believe I can set them with attrib.exe.
So crazy: no replica VHDs will dedupe whatsoever!
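If you want to see which named flags sit behind letters like A, P and L, a small sketch like the following decodes a file's attributes into the names of the .NET System.IO.FileAttributes enumeration, so the template VHD and the replica VHD can be compared side by side. Get-AttributeNames and the example paths are hypothetical; they are not part of any Hyper-V or dedup module.

```powershell
# Hypothetical helper: expand a FileAttributes value into the names of the flags it contains.
function Get-AttributeNames {
    param([Parameter(Mandatory)][System.IO.FileAttributes]$Attributes)
    # Keep every defined, non-zero flag whose bits are all set in the value.
    [System.Enum]::GetValues([System.IO.FileAttributes]) |
        Where-Object { [int]$_ -ne 0 -and ($Attributes -band $_) -eq $_ } |
        ForEach-Object { $_.ToString() }
}

# Example usage (paths are hypothetical):
# Get-AttributeNames (Get-Item -LiteralPath 'R:\template.vhdx').Attributes
# Get-AttributeNames (Get-Item -LiteralPath 'R:\replica.vhdx').Attributes
```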
UPDATE:
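The script I am using to optimize the VHDs is:
# Find every virtual disk under the current folder (-match is a regex substring test,
# so .vhdx and .avhd/.avhdx files are matched too)
$vhds = Get-ChildItem -Recurse | Where-Object { $_.Extension -match "vhd" }
foreach ($vhd in $vhds) {
    # Attach read-only, run a Retrim pass, then detach again
    Mount-VHD -Path $vhd.FullName -ReadOnly -Verbose
    Optimize-VHD -Path $vhd.FullName -Mode Retrim -Verbose
    Dismount-VHD -Path $vhd.FullName -Verbose
}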
I have run that and noticed the dedupe process takes a little longer to finish, but there is still no deduplication of the replica VHDs. This is very strange to me. I was hoping that if something was flagging the file as 'open', it would not do so anymore after Optimize-VHD runs. The VHDs in question have not been written to for a while now. I used this script on the host to turn off resync and stop the writes:
# Select the running VMs whose replication is active
$vmlist = Get-VM | Where-Object { $_.ReplicationState -eq "Replicating" -and $_.State -eq "Running" }
foreach ($vm in $vmlist) {
    # Disable automatic resynchronization for this VM's replication
    Set-VMReplication -VMName $vm.Name -AutoResynchronizeEnabled $false
}
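One way to test the "is something still holding the file open?" theory directly is to try opening each VHD exclusively: if another process still has a handle on it, the open fails with a sharing violation. This is a minimal sketch; Test-FileLocked is a hypothetical helper name, and an exclusive open only detects handles that are open at the moment you run it.

```powershell
# Hypothetical helper: returns $true if another open handle prevents an exclusive open.
function Test-FileLocked {
    param([Parameter(Mandatory)][string]$Path)
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "File not found: $Path"
    }
    # .NET resolves relative paths against the process directory, so pass a full path.
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    try {
        # FileShare.None requests exclusive access, so the open fails while any
        # other handle with read or write access to the file is open.
        $stream = [System.IO.File]::Open($full,
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::None)
        $stream.Close()
        return $false
    }
    catch [System.IO.IOException] {
        # A sharing violation surfaces as an IOException.
        return $true
    }
}

# Example usage: list every VHD/VHDX under R:\ that is currently held open.
# Get-ChildItem -Path 'R:\' -Recurse -Include *.vhd, *.vhdx |
#     Where-Object { Test-FileLocked -Path $_.FullName } |
#     Select-Object FullName
```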
Best Answer
I suspect your replica VHDs are either constantly open with a write lock or written to too frequently to pass the MinimumFileAgeDays setting (5 days by default; it can be set as low as 0 with
Set-DedupVolume <Drive>: -MinimumFileAgeDays 0
). By the way, the documentation explicitly declares such a configuration "unsupported":
And accordingly, it also contains the following recommendation:
It looks a bit like what you are looking for is online deduplication, which dedupes data as it is being written to disk. This is a feature of some more sophisticated SAN solutions (including Nexenta's SMB-targeted offerings), but it comes at a rather high hardware cost: you would need a powerful machine with a lot of RAM for online dedup to run smoothly.
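To check the file-age half of the suspicion above, a minimal sketch can list the files that were written to more recently than the MinimumFileAgeDays threshold allows. It assumes the age policy is judged by each file's last-write time, which is an assumption rather than something confirmed here; Get-TooYoungFiles is a hypothetical helper name.

```powershell
# Hypothetical helper: list files whose last write is newer than the age threshold.
# Assumption: the dedup age policy compares against LastWriteTime.
function Get-TooYoungFiles {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][int]$MinimumAgeDays
    )
    # Anything written after this moment is younger than the threshold.
    $cutoff = (Get-Date).AddDays(-$MinimumAgeDays)
    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Where-Object { $_.LastWriteTime -gt $cutoff } |
        Select-Object FullName, LastWriteTime
}

# Example usage against the dedup volume's own setting:
# $age = (Get-DedupVolume -Volume 'R:').MinimumFileAgeDays
# Get-TooYoungFiles -Path 'R:\' -MinimumAgeDays $age
```

If the replica VHDs show up in this list even after resync has been turned off, something is still writing to them.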