Windows – Slow disk performance on Azure Virtual Machine

azure, performance, storage, windows

Okay, first of all, allow me to say that I'm not an operations person but a developer, so I'm going into somewhat unknown territory here; please bear with me.

I would like to use an Azure Virtual Machine for extracting a 50 GB XML file from a 1.9 GB zip file. So I've been testing which instance size on Azure gives good performance without paying for more than I need.

However, the disk performance of Azure VMs has not been amazing, and I would like to know whether I'm doing something wrong or whether my results are what can be expected.

First of all, what have I been testing with? A custom .NET console application that does nothing but take a zip file as an argument and immediately start extracting it to the directory the zip file resides in. While the extraction is running, the application calculates how many megabytes it has written to the target file per second and prints it.
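Here's roughly what the application does — a minimal sketch, not my exact code (the 80 KB copy buffer and the once-per-second reporting interval are just choices I made for the sketch):

```csharp
// Sketch of the benchmark tool: extract a zip next to itself and
// report write throughput roughly once per second.
// Requires a reference to System.IO.Compression / System.IO.Compression.FileSystem.
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

class UnzipBenchmark
{
    static void Main(string[] args)
    {
        string zipPath = args[0];
        string targetDir = Path.GetDirectoryName(Path.GetFullPath(zipPath));

        var sw = Stopwatch.StartNew();
        long totalBytes = 0, lastBytes = 0;
        double lastSeconds = 0;

        using (var archive = ZipFile.OpenRead(zipPath))
        {
            foreach (var entry in archive.Entries)
            {
                // Skip directory entries; make sure the target folder exists.
                if (string.IsNullOrEmpty(entry.Name)) continue;
                string destPath = Path.Combine(targetDir, entry.FullName);
                Directory.CreateDirectory(Path.GetDirectoryName(destPath));

                using (var source = entry.Open())
                using (var dest = File.Create(destPath))
                {
                    var buffer = new byte[80 * 1024];
                    int read;
                    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        dest.Write(buffer, 0, read);
                        totalBytes += read;

                        double seconds = sw.Elapsed.TotalSeconds;
                        if (seconds - lastSeconds >= 1.0)
                        {
                            double mbPerSec = (totalBytes - lastBytes)
                                / (1024.0 * 1024.0) / (seconds - lastSeconds);
                            Console.WriteLine($"{mbPerSec:F1} MB/s written");
                            lastBytes = totalBytes;
                            lastSeconds = seconds;
                        }
                    }
                }
            }
        }
        Console.WriteLine($"Done: {totalBytes / (1024.0 * 1024.0):F0} MB in {sw.Elapsed}");
    }
}
```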

On my local development machine I get pretty good performance with this application, 160-210 MB/s written, so the whole extraction takes around 8 minutes. The specs of my local machine are: Intel Core i7 950 @ 3 GHz, 4 cores (8 logical), 12 GB RAM, Samsung SSD 830 series 250 GB.

Okay, so I started testing different instance sizes, and here are my results.

  • On an A4 instance with Windows Server 2012 R2 Datacenter (8 cores, 14 GB RAM), with a striped RAID of 4 virtual disks on the same storage account and without host caching, I got a steady 30-35 MB/s, which means the whole extraction took 24 minutes and 48 seconds. I also tried enabling host caching, but it didn't really make any difference.
  • On a D4 instance with Windows Server 2012 Datacenter (8 cores, 28 GB RAM, 500 GB local SSD) I got really good performance (150+ MB/s) for the first few minutes, and then varying performance with peaks at 200 MB/s and valleys at 9 MB/s. Average performance was between 70 and 100 MB/s. The extraction took 9 minutes and 40 seconds.
  • On a D3 instance with Windows Server 2012 Datacenter (4 cores, 14 GB RAM, 250 GB local SSD) I got really good performance (150+ MB/s) for the first minute, but then the performance declined to a steady 20-40 MB/s, making the extraction take 21 minutes and 49 seconds.

On the D2 and D1 instances, the disk performance is even worse than on the D3.

And this really surprises me. How can a local SSD perform as badly as it does on the D1, D2, and D3 instances? And does anyone know why the disk performance differs so much between the D1 and the D4? Is it a memory issue? When I look at Task Manager while the extraction is going on, memory usage explodes. I suspect it's because Windows is caching the written data, and when it runs out of memory it has to flush the data to disk; when that happens, the disk performance declines. But this doesn't happen on my local machine, so why is such aggressive caching necessary on these VMs?
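If anyone wants to test this caching theory, here's a small sketch that writes with FileOptions.WriteThrough so Windows pushes writes to the disk instead of parking them in the file-system cache first. The 4 MB write size and 2 GB test size are arbitrary; the idea is that if write-through throughput matches the later "valley" numbers, the early 150+ MB/s peak was just the cache absorbing writes faster than the disk can drain them:

```csharp
// Write 2 GB of random data with write-through, then report MB/s.
using System;
using System.Diagnostics;
using System.IO;

class WriteThroughTest
{
    static void Main()
    {
        var buffer = new byte[4 * 1024 * 1024]; // 4 MB per write
        new Random(42).NextBytes(buffer);

        using (var fs = new FileStream("testfile.bin", FileMode.Create,
            FileAccess.Write, FileShare.None, bufferSize: 4096,
            FileOptions.WriteThrough))
        {
            var sw = Stopwatch.StartNew();
            long written = 0;
            while (written < 2L * 1024 * 1024 * 1024) // 2 GB total
            {
                fs.Write(buffer, 0, buffer.Length);
                written += buffer.Length;
            }
            fs.Flush(true); // flush any remaining OS buffers to disk

            double mbPerSec = written / (1024.0 * 1024.0) / sw.Elapsed.TotalSeconds;
            Console.WriteLine($"{mbPerSec:F1} MB/s with write-through");
        }
    }
}
```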

I know there are differences between my local machine and a virtual machine hosted in Azure, but is the disk performance I'm experiencing really to be expected?

(I originally posted this question on Stack Overflow, as I suspected my application was the cause. But I'm not so sure anymore.)

Best Answer

A bit late to the party here, but for what it's worth, "attached" SSDs in Azure have their IOPS throttled based on the machine size. It wasn't mentioned in the pricing anywhere, but I raised the issue as a ticket with technical support and they referred me to the blog post below.

See this link: http://azure.microsoft.com/blog/2014/10/06/d-series-performance-expectations/
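To make the numbers concrete: throughput is roughly IOPS × I/O size, so a temp disk capped at, say, 500 IOPS would top out around 500 × 64 KB ≈ 31 MB/s for 64 KB writes. Those figures are purely illustrative, not Azure's actual limits (the blog post has the real per-size numbers), but it shows how an IOPS cap on the smaller sizes can land you well below what the SSD hardware itself can do.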
