Nicely structured approach and you're asking all the right questions. Your suggested redesign is excellent.
ESX 3.5 doesn't really do iSCSI Software Initiator multipathing, but it will happily fail over to another active or standby uplink on the vSwitch if a link fails for any reason. The VI3.5 iSCSI SAN Configuration Guide has some information on this, not as much as I'd like, but it is clear enough. You shouldn't have to do anything on the ESX side when you change over, but you will no longer get any link aggregation effects, only failover, because your uplinks are going to two separate, non-stacked switches. Given the weakness of multipathing in the ESX 3.5 iSCSI stack this probably won't have any material effect, but it might, since you have multiple iSCSI targets, so bear it in mind. I'm sure you know this already, but jumbo frames are not supported with the Software Initiator on ESX 3.5, so that's not going to do anything for you until you move to ESX 4.
When setting up the ESX vSwitch and VMkernel ports for iSCSI with ESX 4, the recommendation is to create multiple VMkernel ports with a 1:1 mapping to uplink physical NICs. You can create multiple vSwitches for this if you want, or you can use the NIC teaming options at the port group level so that each VMkernel port has a single NIC designated as active with one or more as standby. Once you have the ports/vSwitch configured, you then need to bind the ports to the iSCSI multipath stack, and it will then handle both multipathing and failover much more efficiently. Given the way this works there is no need to worry about teaming across the switches; the multipath driver is doing the work at the IP layer. This is just a quick sketch of how it works; it is described in very good detail in the VI 4 iSCSI SAN Configuration Guide, which will explain everything you need to do, including how to set up jumbo frame support properly.
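For a rough idea of what that looks like from the ESX 4 service console, here's a minimal sketch. Every name and number below (vSwitch1, iSCSI1/iSCSI2, vmk1/vmk2, vmhba33, the IP addresses) is an example and will differ on your hosts, and the per-port-group active/standby NIC overrides are set through the vSphere client:

    # Enable jumbo frames on the iSCSI vSwitch (do this before creating the ports)
    esxcfg-vswitch -m 9000 vSwitch1

    # Create one port group and one jumbo-frame VMkernel port per uplink NIC
    esxcfg-vswitch -A iSCSI1 vSwitch1
    esxcfg-vswitch -A iSCSI2 vSwitch1
    esxcfg-vmknic -a -i 10.0.0.11 -n 255.255.255.0 -m 9000 iSCSI1
    esxcfg-vmknic -a -i 10.0.0.12 -n 255.255.255.0 -m 9000 iSCSI2

    # Bind each VMkernel port to the software iSCSI HBA, then verify
    esxcli swiscsi nic add -n vmk1 -d vmhba33
    esxcli swiscsi nic add -n vmk2 -d vmhba33
    esxcli swiscsi nic list -d vmhba33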
As far as stacking is concerned, I don't think you need or want it for this config; in fact Dell's recommended design for MD3000i iSCSI environments is not to stack the switches, as far as I can recall, for precisely the reason you mention. For other iSCSI solutions (EqualLogic) high-bandwidth links between arrays are required, so Dell does recommend stacking there, but I've never had a satisfactory explanation of what happens when the master fails. I'm pretty sure the outage during the new master election will be shorter than the iSCSI timeouts, so VMs shouldn't fail, but it's not something I'm comfortable with, and things will definitely stall for an uncomfortable period of time.
What you're looking for is commonly called a "transmit hash policy" or "transmit hash algorithm". It controls the selection of a port from a group of aggregate ports with which to transmit a frame.
Getting my hands on the 802.3ad standard has proven difficult because I'm not willing to spend money on it. Having said that, I've been able to glean some information from a semi-official source that sheds some light on what you're looking for. Per this presentation from the 2007 Ottawa, ON, CA IEEE High Speed Study Group meeting, the 802.3ad standard does not mandate particular algorithms for the "frame distributor":
This standard does not mandate any particular distribution algorithm(s); however, any distribution algorithm shall ensure that, when frames are received by a Frame Collector as specified in 43.2.3, the algorithm shall not cause a) Mis-ordering of frames that are part of any given conversation, or b) Duplication of frames. The above requirement to maintain frame ordering is met by ensuring that all frames that compose a given conversation are transmitted on a single link in the order that they are generated by the MAC Client; hence, this requirement does not involve the addition (or modification) of any information to the MAC frame, nor any buffering or processing on the part of the corresponding Frame Collector in order to re-order frames.
So, whatever algorithm a switch / NIC driver uses to distribute transmitted frames must adhere to the requirements as stated in that presentation (which, presumably, was quoting from the standard). There is no particular algorithm specified, only a compliant behavior defined.
Even though there's no algorithm specified, we can look at a particular implementation to get a feel for how such an algorithm might work. The Linux kernel "bonding" driver, for example, has an 802.3ad-compliant transmit hash policy that applies the following function (see bonding.txt in the Documentation/networking directory of the kernel source):
Destination Port = (((<source IP> XOR <dest IP>) AND 0xFFFF)
    XOR (<source MAC> XOR <destination MAC>)) MOD <ports in aggregate group>
This causes both the source and destination IP addresses, as well as the source and destination MAC addresses, to influence the port selection.
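To make that arithmetic concrete, here's a small Python sketch of the same policy. The addresses and MACs are invented, and note that the kernel actually folds in only the low-order octet of each MAC address:

    import socket
    import struct

    def ip_to_int(ip):
        # Dotted-quad string to a 32-bit integer
        return struct.unpack("!I", socket.inet_aton(ip))[0]

    def mac_last_octet(mac):
        # The kernel's layer2+3 policy uses the low-order byte of each MAC
        return int(mac.split(":")[-1], 16)

    def layer2_3_port(src_ip, dst_ip, src_mac, dst_mac, num_ports):
        ip_part = (ip_to_int(src_ip) ^ ip_to_int(dst_ip)) & 0xFFFF
        mac_part = mac_last_octet(src_mac) ^ mac_last_octet(dst_mac)
        return (ip_part ^ mac_part) % num_ports

    server_ip, server_mac = "203.0.113.10", "00:16:3e:aa:bb:01"
    router_mac = "00:16:3e:ff:ee:99"

    # A varied client pool spreads conversations across the two ports...
    for client in ("198.51.100.7", "198.51.100.8", "192.0.2.55", "192.0.2.56"):
        print(client, "->", layer2_3_port(server_ip, client, server_mac, router_mac, 2))

    # ...but a single proxy source IP would pin every conversation to one port
    print("proxy ->", layer2_3_port(server_ip, "10.1.2.3", server_mac, router_mac, 2))

Every frame of a given conversation always hashes to the same port, which is how the ordering requirement quoted above is satisfied; the proxy line at the end hints at a caveat I'll come back to below.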
The destination IP address used in this type of hashing would be the address that's present in the frame itself. Take a second to think about that: in a frame leaving your server bound for the Internet, the router's MAC address is present in the Ethernet header, but the router's IP address isn't encapsulated anywhere in the frame. The destination IP address carried in the frame's payload will be the address of the Internet client making the request to your server.
A transmit hash policy that takes into account both source and destination IP addresses, assuming you have a widely varied pool of clients, should do pretty well for you. In general, more widely varied source and/or destination IP addresses in the traffic flowing across such an aggregated infrastructure will result in more efficient aggregation when a layer 3-based transmit hash policy is used.
Your diagrams show requests coming directly to the servers from the Internet, but it's worth pointing out what a proxy might do to the situation. If you're proxying client requests to your servers then, as chris mentions in his answer, you may create bottlenecks. If that proxy makes its requests from its own source IP address, instead of from the Internet client's IP address, you'll have fewer possible "flows" in a strictly layer 3-based transmit hash policy.
A transmit hash policy could take layer 4 information (TCP / UDP port numbers) into account too, so long as it keeps to the requirements in the 802.3ad standard. Such an algorithm is in the Linux kernel, as you reference in your question. Beware that the documentation for that algorithm warns that, due to fragmentation, traffic may not necessarily flow along the same path and, as such, the algorithm isn't strictly 802.3ad-compliant.
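Again purely as an illustration (invented addresses and ports), the layer 3+4 variant documented in bonding.txt XORs the port numbers into the hash, so even a single client talking to a single server can spread its connections across links:

    import socket
    import struct

    def ip_to_int(ip):
        return struct.unpack("!I", socket.inet_aton(ip))[0]

    def layer3_4_port(src_ip, dst_ip, src_port, dst_port, num_ports):
        # bonding.txt: ((src port XOR dst port) XOR
        #               ((src IP XOR dst IP) AND 0xffff)) MOD slave count
        l4_part = src_port ^ dst_port
        l3_part = (ip_to_int(src_ip) ^ ip_to_int(dst_ip)) & 0xFFFF
        return (l4_part ^ l3_part) % num_ports

    # Successive connections from one client differ only in the ephemeral port,
    # so they can land on different links
    for sport in (49152, 49153, 49154, 49155):
        print(sport, "->", layer3_4_port("198.51.100.7", "203.0.113.10", sport, 80, 2))

Fragmentation breaks this because only the first fragment carries the TCP/UDP header; later fragments hash on layer 3 alone and can take a different link, which is the non-compliance the documentation warns about.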
Citrix recommends not managing your storage networks from XenServer. The XenServer manual discusses how to mark physical interfaces as not managed, and once you've done that, you can do whatever you like in terms of configuring bonding on those interfaces; at that point it's just the same as configuring an LACP bond on four ports on any standard Linux system.
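For illustration only, a standard CentOS-style LACP bond (which is roughly what a XenServer dom0 of that era looks like underneath) would be along these lines; every interface name, address and option here is an example, and the exact files depend on your dom0 release:

    # /etc/modprobe.conf -- load the bonding driver in 802.3ad (LACP) mode
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100 xmit_hash_policy=layer2+3

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.0.21
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth2 (repeat for each of the 4 slaves)
    DEVICE=eth2
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none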
Note that doing this means you cannot allocate these NICs to any VMs. You'll need additional NICs in the system for management of XenServer and for server traffic.
It's also worth pointing out that, unless it was fixed recently, the management interface for XenServer cannot be part of a bond.