Disk IO Performance Difference — Hyper-V / CSV — Guest vs Host

hyper-v, hyper-v-server-2012-r2, iscsi, performance

I have two different Hyper-V 2012 R2 environments that use iSCSI to connect to their virtual machine storage. While the environments are different (one is all 10 Gb networking whereas the other is mixed 1 Gb / 10 Gb, and one uses an SSD array in RAID 6 whereas the other uses RAID 10 spread across two arrays), the odd behavior I am seeing is the same.

The bottom line is that when I run a disk I/O test directly on the host against the CSV, I get a particular value for average IOPS. However, when I run the same test within the virtual machine against its "local" disk (the VHDX file stored on the CSV), I get a greatly reduced IOPS value.

To put things in perspective, here is the environment I am testing:

  • Host
    • Windows 2012 R2 Datacenter
    • 512 GB RAM
    • 48 Logical processors
    • 10 Gb fiber for iSCSI traffic
    • One (1) virtual machine running
  • Storage
    • EqualLogic PS6210S
    • 24 x 800 GB SSDs in RAID 6
    • One (1) 1 TB volume containing one (1) VM
    • 10 Gb fiber
    • Host and array are connected to dedicated network switches
  • Virtual Machine
    • Windows 2012 R2 Datacenter
    • 127 GB dynamic disk
    • Dynamic RAM
  • I/O Test
    • FIO 2.2.10 — Test Software
    • 70/30 R/W mix against 500 MB test files (see below for actual test config file)

When I run the test against the CSV from the host (C:\ClusterStorage\VM-Infrastructure), I get read/write IOPS of about 22k/9k, respectively. However, when I run that same test within the VM against its C:\Temp folder (with the VM's VHDX file stored on the array in C:\ClusterStorage\VM-Infrastructure), I get numbers of 13k/6k.

Is this a known problem? Are there any particular host/VM settings that I should be looking at to get the VM performance closer to what I get on the host? A drop from 22k read IOPS to 13k is pretty dramatic. I figured there would be a slight hit within the VM, but not this much (as high as 40% in some cases).

[global]
ioengine=windowsaio
directory=C\:\ClusterStorage\VM-Infrastructure
;directory=C\:\Temp
rw=randrw
rwmixread=70
;rwmixwrite=30
direct=1 ; 1 for direct IO, 0 for buffered IO
bs=8k
iodepth=32 ; For async io, allow 'x' ios in flight
invalidate=1 ; Invalidate page cache for file prior to doing io
numjobs=16 ; Create 'x' similar entries for this job
runtime=120
group_reporting ; Report aggregate results for the group rather than per job
thread ; Use pthreads instead of forked jobs

[workload]
name=8k7030test
size=500m
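
For reference, here is a minimal sketch of how a job file like this can be run from an elevated PowerShell prompt; the file name 8k7030test.fio is just an illustrative choice, and the active directory= line in [global] is what switches the test between the host's CSV path and the guest's C:\Temp.

# Hypothetical invocation of the job file above (saved here as 8k7030test.fio);
# assumes fio.exe is on the PATH. Edit the directory= lines in [global] to point
# at C:\ClusterStorage\VM-Infrastructure on the host or C:\Temp inside the guest.
fio.exe 8k7030test.fio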

Best Answer

After further research and some discussions with storage experts, the culprit has been found.

Even though the host was running a single virtual machine and that VM was the only client reading and writing the storage array, the built-in Hyper-V storage and networking load balancer was kicking in and throttling back the VM. When the load balancer was disabled, the virtual machine put up IOPS numbers very close to what we saw directly from the host.

For storage operations the latency threshold is 83 ms; for networking it is 2 ms. As best we can tell, the default threshold values are overly aggressive or simply not suited to iSCSI storage connections. (iSCSI connections will of course add latency that you would not see with directly attached or local storage.) The registry setting that controls this (for storage) is HKLM\System\CurrentControlSet\Services\StorVsp\IOBalance\Enabled. Setting a value of 0 disables the balancer.
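
For anyone wanting to try the same change, here is a minimal PowerShell sketch. It only creates the Enabled value described above; whether the IOBalance key already exists on your host, and whether the change requires a host reboot or VM restart to take effect, are assumptions you should verify in your own environment.

# Sketch: disable the Hyper-V storage IO balancer on the host.
# Assumes the IOBalance key may not exist yet; run from an elevated PowerShell prompt.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\StorVsp\IOBalance'
if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
New-ItemProperty -Path $key -Name Enabled -PropertyType DWord -Value 0 -Force | Out-Null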

More information can be found at http://www.aidanfinn.com/?p=13232

We have not decided if we will keep the balancer turned off. Obviously it is there, and kicks in, for a reason. While it probably is not needed with only a handful of virtual machines, it should become more beneficial once I start loading up the host. My main goal was understanding why my numbers were so disparate.
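
If we do decide to turn the balancer back on, the obvious revert is sketched below. Treat it as an assumption: the article above only confirms that a value of 0 disables the balancer, so setting the value back to 1 (or removing it) is presumed, not documented, to restore the default behavior.

# Sketch: revert the override. Assumption: Enabled = 1 (or deleting the value)
# restores the default balancer behavior; only 0 = disabled is confirmed above.
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\StorVsp\IOBalance' -Name Enabled -Value 1 -Type DWord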
