Slow iSCSI Speeds – Synology and VMware with 4-Way MPIO

Tags: bandwidth, iscsi, mpio, synology, vmware-esxi

I am trying to achieve high iSCSI speeds between my ESX box and the Synology NAS. I am hoping to achieve a top speed of 300-400 MB/s, but so far all I've reached is 150-170 MB/s.

The main test that I am using is to create a 20 GB virtual disk, Thick Provision Eager Zeroed, in the SSD-based iSCSI datastore (and variations of this).

Some questions:

  1. I am assuming that creating this disk would be sequential writing?
  2. The Synology never passes 30-40% CPU usage, and memory is barely used. I am assuming that the Synology is capable of writing at these speeds to an SSD, right?
  3. Also, is ESX able to max out the available bandwidth when creating a virtual disk over iSCSI?
  4. If using a benchmark tool, what would you recommend, and how can I be sure that I won't have the bottleneck on the data-sending side? Can I install this tool in a VM on the SSD datastore and run it "against itself"?

This is my setup.

I have a Synology 1513+ with the following disks and configuration:

  1. 3× 4 TB WD disks (unused)
  2. 1 Samsung 860 EVO (1 volume, no RAID)
  3. 1 Samsung 256 GB SATA III 3D NAND (1 volume, no RAID)
  4. 2 iSCSI targets, one per SSD (8 VMware iSCSI initiator connections in total)

Network config:

  1. Synology 4000 Mbps bond. MTU 1500, Full Duplex.

  2. Synology Dynamic Link Aggregation 802.3ad LACP.

  3. Cisco SG350 with link aggregation configured for the 4 Synology ports.

  4. Storage and iSCSI network is physically separated from the main network.

  5. CAT 6 cables.

vSphere:

  1. PowerEdge R610 (Xeon E5620 @ 2.40 GHz, 64 GB memory)
  2. Broadcom NetXtreme II BCM5709 1000Base-T (8 NICs)
  3. vSphere 5.5.0, build 1623387

vSphere config:

  1. 4 vSwitches with 1 NIC each for iSCSI. MTU 1500, Full Duplex.
  2. iSCSI software initiator with the 4 VMkernel adapters in the network port binding, all compliant and with path status active.
  3. 2 iSCSI targets with 4 MPIO paths each, all active and using Round Robin.

So basically, 4 cables from the NAS go to the Cisco LAG, and the 4 iSCSI NICs from the ESX host go to regular ports on the switch.
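
For reference, this is roughly what I check from the ESXi shell to confirm the binding and the path state (vmhba33 is the software iSCSI adapter name on my host; yours may differ):

    # List iSCSI adapters to find the software initiator name
    esxcli iscsi adapter list
    # Show the VMkernel ports bound to the software initiator
    esxcli iscsi networkportal list --adapter=vmhba33
    # Show every path and its state (I expect 4 active paths per target)
    esxcli storage core path list
    # Show the path selection policy per device (should report VMW_PSP_RR)
    esxcli storage nmp device list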

Tests and configs I've performed:

  1. Setting the MTU to 9000 on all vSwitches, VMkernel ports, the Synology and the Cisco switch (see the sketch after this list). I have also tried other values like 2000 and 4000.
  2. Creating 1 (and 2 or 3 simultaneous) virtual disks on 1 or 2 of the iSCSI targets to maximise the workload.
  3. Disabling/enabling Header and Data Digest, and Delayed ACK.
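
For the MTU 9000 tests, this is roughly how I set it on the ESXi side (vSwitch1 and vmk1 are examples from my host, repeated for each iSCSI vSwitch and VMkernel port):

    # Raise the MTU on the standard vSwitch used for iSCSI
    esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
    # Raise the MTU on the bound VMkernel interface
    esxcli network ip interface set --interface-name=vmk1 --mtu=9000
    # Confirm the values took effect
    esxcli network ip interface list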

I've lost count of all the things that I have tried. I am not sure where my bottleneck is, or what I have configured wrongly. I have attached some screenshots.

Any help would be much appreciated!

iSCSI paths on ESX

Networking config on ESX

Example of the vmkernel config

iSCSI initiator network configuration

Cisco LAG config 1

Cisco LAG config 2

Best Answer

  1. It might be accelerated with the VAAI ZERO primitive (I can't tell exactly on your outdated vSphere version), but it's a sequential write either way. It also depends on how you created your iSCSI target. Newer DSM versions by default create Advanced LUNs, which sit on top of a file system. Older versions by default used LVM disks directly and performed much worse.
  2. ~400 MB/s should be achievable.
  3. 400 MB/s is not a problem, if the target can provide the IO.
  4. If you're looking at pure sequential throughput, then dd on the Linux side or a simple CrystalDiskMark on Windows will work (see the sketch after this list).
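
As a sketch, a sequential write test with dd from a Linux VM on the SSD datastore could look like this (the file path and size are only examples; oflag=direct bypasses the guest page cache so you measure the datastore rather than RAM):

    # Write 8 GiB sequentially in 1 MiB blocks, bypassing the guest page cache
    dd if=/dev/zero of=/root/ddtest bs=1M count=8192 oflag=direct
    # Remove the test file afterwards
    rm /root/ddtest

Make sure the output file sits on a filesystem backed by the SSD datastore, not on tmpfs, or you will benchmark memory instead.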

LAGs and iSCSI usually don't mix. Disable bonding on the Synology and configure the ports as separate interfaces. Enable multi-initiator iSCSI on the Synology. Unfortunately I don't have a Synology at hand for exact instructions.

Configure vSphere like this (a CLI sketch follows the list).

  • vSphere initiator --> Synology target IP/port 1
  • vSphere initiator --> Synology target IP/port 2
  • vSphere initiator --> Synology target IP/port 3
  • vSphere initiator --> Synology target IP/port 4
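
A rough CLI sketch of that layout, assuming the software initiator is vmhba33, the bound VMkernel ports are vmk1-vmk4 and the Synology interfaces are 192.168.10.11-14 (all example names and addresses):

    # Bind each iSCSI VMkernel port to the software initiator
    esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
    esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
    esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk3
    esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk4
    # Add each Synology interface as a dynamic discovery address
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.10.11
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.10.12
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.10.13
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.10.14
    # Rescan so the new paths show up
    esxcli storage core adapter rescan --adapter=vmhba33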

Disable unnecessary paths (keep one vSphere source IP to one Synology IP); vSphere supports (but does not enforce) only 8 paths per target on iSCSI. I don't remember if you can limit target access per source on the Synology side, likely not. Also, you already have enough paths for reliability, and more will not help, as you're most likely bandwidth limited.
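
Individual paths can also be turned off from the CLI; a minimal sketch, where the path name is an example you would take from the path list:

    # List all paths to find the runtime names
    esxcli storage core path list
    # Disable one redundant path by its runtime name
    esxcli storage core path set --path=vmhba33:C2:T0:L1 --state=off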

Change the Round Robin IOPS policy to a lower value, see https://kb.vmware.com/s/article/2069356. Otherwise 1000 IOPS will go down a single path before a path change occurs.
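
That KB boils down to one command per device (the naa identifier below is a placeholder; take yours from esxcli storage nmp device list):

    # Switch Round Robin from the default 1000 IOPS per path to 1 IOPS per path
    esxcli storage nmp psp roundrobin deviceconfig set --device=naa.6001405xxxxxxxxx --type=iops --iops=1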

Keep using jumbo frames. It's about a 5% win on bandwidth alone, and on gigabit you can easily become bandwidth starved.
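
A quick way to prove jumbo frames work end to end (VMkernel port, vSwitch, Cisco, Synology) is a non-fragmenting vmkping from the ESXi shell; vmk1 and the target address are examples:

    # 8972 bytes of payload = 9000 minus IP/ICMP headers; -d forbids fragmentation
    vmkping -d -s 8972 -I vmk1 192.168.10.11

If this fails while a normal vmkping works, some hop in the path is still at MTU 1500.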