10Gbps iSCSI network transferring at only 2Gbps

10gbethernet · file-transfer · iscsi · storage-area-network

I have a performance issue with my SAN. The environment is a dedicated 10Gbps private network with the following setup:

1 – Mac connected via a 10GbE fiber transceiver
1 – Windows 7 PC with 10GbE
1 – HP switch with all ports at 10GbE
2 – Quantum StorNext Servers with 10GbE
1 – Dell Compellent Solution with 2 Controllers connected to the network at 10GbE
All servers, switches, and computers have statically assigned IPs

Compellent Config
Tier 1 – 10K SAS Drives in RAID 10
Tier 2 – 7200 RPM SAS drives in RAID 6 (12 groups of 13 drives each)

Additional info
Windows 7 Client TCP Offload Options
IPv4 Checksum Offload – Enabled
TCP Checksum Offload – Enabled
UDP Checksum Offload – Enabled
Large Send Offload – Enabled
Jumbo Packet – 9014 bytes, Enabled

I mounted the StorNext volume on my Windows 7 PC and my Lion workstation. Unfortunately, all my transfer speeds are around 2Gbps, or 2.8Gbps if I'm really lucky. I was hoping to get at least 5Gbps out of this setup, but I'm averaging about 2Gbps, or a little above 250MBps, on file copies. When I map a LUN directly to the boxes, format it natively with either HFS (with journaling) on the Mac or NTFS on the Windows 7 PC, and then copy a file, I get about 180MBps, so my performance on a directly mapped LUN is actually slower than on my StorNext volume. Any suggestions? Has anyone seen degraded iSCSI performance over 10GbE? Any help would be awesome! Thanks!
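As a sanity check on the units involved (the Gbps figures are from the question above; the conversion itself is just bits-to-bytes, ignoring protocol overhead):

```python
def gbps_to_mbytes_per_sec(gbps: float) -> float:
    """Convert a link rate in gigabits/s to megabytes/s (decimal units)."""
    return gbps * 1000 / 8

print(gbps_to_mbytes_per_sec(2))    # 250.0 MB/s -- matches the observed copy rate
print(gbps_to_mbytes_per_sec(5))    # 625.0 MB/s -- the target
print(gbps_to_mbytes_per_sec(10))   # 1250.0 MB/s -- theoretical line rate
```

So the observed 250MBps copy rate and the reported 2Gbps line utilization are the same measurement in different units, not two separate problems.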

Best Answer

1.) Jumbos -might- help if you're seeing a lot of processor load for interrupt traffic but if TCP is operating correctly it should be able to ramp well past 2G on a 10G link. I've seen plenty of 10GE links running above 90% without jumbos enabled.
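To put numbers on the interrupt-load argument, here's a rough per-packet-rate calculation at 10G line rate for standard vs. jumbo MTUs (the framing overhead constants assume standard Ethernet: 14-byte header + 4-byte FCS, plus 8-byte preamble and 12-byte inter-frame gap on the wire):

```python
# Bytes per frame beyond the IP MTU: Ethernet header + FCS + preamble + IFG
WIRE_OVERHEAD = 14 + 4 + 8 + 12

def packets_per_sec(line_rate_bps: float, mtu: int) -> float:
    """Frames per second needed to saturate a link at a given IP MTU."""
    frame_bits = (mtu + WIRE_OVERHEAD) * 8
    return line_rate_bps / frame_bits

std = packets_per_sec(10e9, 1500)
jumbo = packets_per_sec(10e9, 9000)
print(f"1500-byte MTU: {std:,.0f} pkt/s")
print(f"9000-byte MTU: {jumbo:,.0f} pkt/s")
print(f"reduction: {std / jumbo:.1f}x fewer packets")
```

Roughly 800K packets/s at a 1500-byte MTU versus about 140K at 9000 bytes, i.e. jumbos cut per-packet processing by about 6x, which only matters if the host CPU is actually the bottleneck.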

2.) If you do use jumbos, enable the same size on every NIC and every switchport in the VLAN and/or broadcast domain. PMTU discovery only helps when packets cross routers; mixing MTU values within the same network will lead to nothing but misery.
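One wrinkle worth checking: the Windows NIC setting "Jumbo Packet – 9014 Bytes" counts the 14-byte Ethernet header, so the IP MTU is 9000. To verify every hop actually passes jumbo frames, ping with the do-not-fragment bit set and the largest payload that fits, which is the MTU minus the IP and ICMP headers:

```python
# Header sizes for a standard ICMP echo over IPv4 (no options)
ETH_HEADER = 14
IP_HEADER = 20
ICMP_HEADER = 8

jumbo_setting = 9014            # as configured on the Windows 7 NIC
ip_mtu = jumbo_setting - ETH_HEADER
max_ping_payload = ip_mtu - IP_HEADER - ICMP_HEADER
print(ip_mtu)              # 9000
print(max_ping_payload)    # 8972

# To test end to end (substitute a real target address):
# Windows:  ping -f -l 8972 <target>
# macOS:    ping -D -s 8972 <target>
```

If an 8972-byte don't-fragment ping fails but a 1472-byte one succeeds, some device in the path isn't honoring the jumbo MTU.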

3.) I'm not particularly familiar with the ProCurve gear, but TCP traffic can be tricky at high speeds if there are any questions about buffer availability. I've seen other testing where this manifested (without apparent TCP drops) as a huge cut in performance that was ultimately fixed by actually reducing buffer sizes.

4.) Make sure that the actual TCP settings (RFC 1323 timestamps and window scaling, SACK, etc.) are all configured consistently. The operating systems in question should be fine out of the box, but I don't know much about the storage node. It might be worth digging into, either via settings on the device or via a protocol trace (Wireshark or tcpdump) to observe window sizing and any retransmissions going on.
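Window sizing matters here because without RFC 1323 window scaling, TCP is capped at a 64 KiB receive window, and that alone can throttle a fast link. The bandwidth-delay product gives the window needed to fill the pipe (the 0.2 ms RTT below is an assumed LAN figure, not from the post):

```python
def window_needed(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes of in-flight data required to sustain bandwidth at this RTT."""
    return bandwidth_bps / 8 * rtt_s

def max_rate_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling imposed by a fixed window (e.g. 64 KiB, no scaling)."""
    return window_bytes * 8 / rtt_s

rtt = 0.0002  # 0.2 ms, assumed round-trip time on a switched 10GbE LAN
print(f"window to fill 10 Gbps: {window_needed(10e9, rtt):,.0f} bytes")
print(f"64 KiB window ceiling:  {max_rate_bps(65535, rtt) / 1e9:.2f} Gbps")
```

At that assumed RTT, a non-scaled 64 KiB window tops out around 2.6 Gbps, which happens to land right in the range the poster is seeing, so a trace confirming that window scaling is actually negotiated would be a worthwhile first check.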

5.) Try eliminating as many variables as you can - even getting down to a crossover cable between one of your storage nodes and a single workstation - to further isolate the issue. Don't be afraid to disable some of the offloads you mentioned as well, as they've been known to cause issues from time to time.
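In the same spirit of eliminating variables, a raw TCP throughput test takes the storage stack out of the picture entirely. iperf is the usual tool; a minimal single-stream equivalent looks something like the sketch below (the host and port are placeholders; run the server half on one node and the client half on the other, shown here against localhost):

```python
import socket
import threading
import time

HOST = "127.0.0.1"               # placeholder; use the peer's address in practice
TOTAL = 64 * 1024 * 1024         # 64 MiB test payload
CHUNK = 1024 * 1024

# Bind to port 0 so the OS picks a free port for this self-contained demo.
srv = socket.create_server((HOST, 0))
port = srv.getsockname()[1]

def server() -> None:
    """Accept one connection, blast TOTAL bytes at it, then close."""
    conn, _ = srv.accept()
    with conn:
        buf = b"\x00" * CHUNK
        sent = 0
        while sent < TOTAL:
            conn.sendall(buf)
            sent += CHUNK
    srv.close()

threading.Thread(target=server, daemon=True).start()

start = time.monotonic()
received = 0
with socket.create_connection((HOST, port)) as cli:
    while received < TOTAL:
        data = cli.recv(CHUNK)
        if not data:
            break
        received += len(data)
elapsed = time.monotonic() - start

print(f"received {received} bytes in {elapsed:.2f}s "
      f"({received * 8 / elapsed / 1e9:.2f} Gbps)")
```

If a plain socket stream between two of the hosts also stalls near 2Gbps, the problem is in the network or TCP stack; if it runs near line rate, look at the iSCSI initiator, the filesystem, or the array instead.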
