LACP vs 802.3ad

lacp solaris

I’m looking to create an aggregation on a Solaris box using dladm. I understand that once the aggregation is created, 802.3ad will be used to balance the load depending on the policy (L2, L3 or L4). The only requirements are that the interfaces are connected to a single switch that supports 802.3ad and the interfaces are running at the same speed / full duplex. There are a few questions I’m hoping someone will comment on:

1. By default, LACP is disabled on each aggregation. What’s the benefit of enabling LACP? Wouldn’t I already be load balancing with 802.3ad and the default L4 policy, which, as I understand it, selects the outbound interface based on a hash of the source and destination ports? Reading Wikipedia, there appear to be only two benefits of LACP: (1) failover and (2) automatic configuration. Doesn’t 802.3ad already support failover? If a link goes down, will the switch still try to transmit packets to that interface? It’s hard to believe that’s true. And in terms of automatic configuration, I’m not certain what needs to be configured on the switch. For 802.3ad, I assume the switch just needs to know which load balancing policy (L2, L3 or L4) to use for sending packets to the aggregation. Am I missing something? What’s the advantage of LACP over 802.3ad?
2. I was reading online that NFS uses two connections between server and client, one for data and one for metadata, and that the typical transmission for packets in an aggregation is round-robin, resulting in all the data traffic going over one interface with metadata on the other interface (assuming a two-port aggregation). This seems to go against what I read about 802.3ad’s load balancing policy. If L4 is being used (the Solaris dladm default), the outgoing interface is going to be based on the source and destination port, and assuming the switch is also using L4, the incoming interface will also be based on src/dst port. Am I wrong? BTW, does a layer 2 switch really look at the src/dst port? It seems resource intensive for a switch to pull the packet apart to calculate the hash and then reassemble it. I also wouldn’t expect the same interface to be used for outgoing and incoming traffic with the same src/dst hash, i.e. perhaps the hash algorithm used by the host is different from the switch’s, or they count ports from different ends. For this reason, I’m confused about why a single stream would be limited to the max throughput of a single interface if incoming and outgoing transmission may be on different interfaces.

I apologize for the fragmented post. I’m trying to get an understanding of the technologies and I haven’t been able to find a good tutorial or article on how these protocols are actually implemented. I see a lot of articles grouping 802.3ad and LACP as one and the same. Any comments will be appreciated.

Thanks!

Best Answer

IEEE 802.3ad is the standard for link aggregation, notwithstanding the later move of the link aggregation standard to the 802.1 working group as 802.1AX. LACP is the control protocol defined by that standard.

The real advantage of LACP is the LACPDUs that transit the link between the switch and the host. These ensure that both sides of the link are capable of LACP. A secondary advantage is that, with LACP, both the host and the switch treat all the aggregated ports as a single logical port, allowing full use of all paths. With a host-side-only LAG, by contrast, the switch still sees multiple separate ports: all packets to the host traverse a single path, and only outbound packets from the host are load balanced across the links.

If you're using a switch vendor that supports MLAG (multi-chassis link aggregation), then you can use LACP to bond multiple links connected to multiple switches. This permits a great deal of resiliency while easing manageability and optimizing throughput.

But basically, if your switch supports LACP, use LACP. If your switch doesn't support LACP, then use non-LACP aggregation.

Related Solutions

Networking – How Does Layer 3 LACP Destination Address Hashing Work?

What you're looking for is commonly called a "transmit hash policy" or "transmit hash algorithm". It controls the selection of a port from a group of aggregate ports with which to transmit a frame.

Getting my hands on the 802.3ad standard has proven difficult because I'm not willing to spend money on it. Having said that, I've been able to glean some information from a semi-official source that sheds some light on what you're looking for. Per this presentation from the 2007 Ottawa, ON, CA IEEE High Speed Study Group meeting, the 802.3ad standard does not mandate particular algorithms for the "frame distributor":

This standard does not mandate any particular distribution algorithm(s); however, any distribution algorithm shall ensure that, when frames are received by a Frame Collector as specified in 43.2.3, the algorithm shall not cause a) Mis-ordering of frames that are part of any given conversation, or b) Duplication of frames. The above requirement to maintain frame ordering is met by ensuring that all frames that compose a given conversation are transmitted on a single link in the order that they are generated by the MAC Client; hence, this requirement does not involve the addition (or modification) of any information to the MAC frame, nor any buffering or processing on the part of the corresponding Frame Collector in order to re-order frames.

So, whatever algorithm a switch / NIC driver uses to distribute transmitted frames must adhere to the requirements as stated in that presentation (which, presumably, was quoting from the standard). There is no particular algorithm specified, only a compliant behavior defined.

Even though there's no algorithm specified, we can look at a particular implementation to get a feel for how such an algorithm might work. The Linux kernel "bonding" driver, for example, has an 802.3ad-compliant transmit hash policy that applies the function (see bonding.txt in the Documentation/networking directory of the kernel source):

Destination Port = (((<source IP> XOR <dest IP>) AND 0xFFFF)
    XOR (<source MAC> XOR <destination MAC>)) MOD <ports in aggregate group>

This causes both the source and destination IP addresses, as well as the source and destination MAC addresses, to influence the port selection.
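As a rough illustration, here is a minimal Python sketch of the formula quoted above. It is a literal reading of that XOR/AND/MOD expression, not the kernel's actual code; the function name `layer2_3_port` and the sample addresses are made up for the example, and it assumes IPv4 and whole-MAC XOR.

```python
import ipaddress


def layer2_3_port(src_mac: str, dst_mac: str,
                  src_ip: str, dst_ip: str, n_ports: int) -> int:
    """Return the index of the egress port for one frame.

    Hypothetical helper: a literal reading of the quoted formula,
    assuming IPv4 addresses and colon-separated MAC strings.
    """
    # XOR of the two MAC addresses, treated as 48-bit integers.
    mac_xor = int(src_mac.replace(":", ""), 16) ^ int(dst_mac.replace(":", ""), 16)
    # XOR of the two IPv4 addresses, keeping only the low 16 bits.
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((ip_xor & 0xFFFF) ^ mac_xor) % n_ports


if __name__ == "__main__":
    # Same server and next-hop MACs, two different remote IP addresses.
    for remote in ("10.0.0.2", "10.0.0.3"):
        port = layer2_3_port("00:00:00:00:00:01", "00:00:00:00:00:02",
                             "10.0.0.1", remote, n_ports=2)
        print(remote, "-> port", port)
```

Because the result depends only on the address pairs, every frame of a given conversation lands on the same port, which is what keeps frames in order.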

The destination IP address used in this type of hashing is the address that's present in the frame. Take a second to think about that. In an Ethernet frame heading from your server to the Internet, the router's MAC address is present in the frame header, but the router's IP address isn't encapsulated anywhere in the frame. The destination IP address encapsulated in the frame's payload is the address of the Internet client that made the request to your server.

A transmit hash policy that takes into account both source and destination IP addresses, assuming you have a widely varied pool of clients, should do pretty well for you. In general, more widely varied source and/or destination IP addresses in the traffic flowing across such an aggregated infrastructure will result in more efficient aggregation when a layer 3-based transmit hash policy is used.

Your diagrams show requests coming directly to the servers from the Internet, but it's worth pointing out what a proxy might do to the situation. If you're proxying client requests to your servers then, as chris speaks about in his answer, you may cause bottlenecks. If that proxy is making the request from its own source IP address, instead of from the Internet client's IP address, you'll have fewer possible "flows" in a strictly layer 3-based transmit hash policy.
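To make the proxy effect concrete, the following sketch counts how many links a server would use when replying to 200 distinct clients versus replying to a single proxy address. It assumes a simplified, strictly layer 3 hash (the IP part of the formula quoted earlier); the helper names and the documentation-range IP addresses are invented for the example.

```python
import ipaddress
from collections import Counter


def layer3_port(src_ip: str, dst_ip: str, n_ports: int) -> int:
    """Assumed layer 3-only hash: low 16 bits of (src XOR dst), mod port count."""
    x = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return (x & 0xFFFF) % n_ports


def link_usage(server_ip: str, peer_ips: list, n_ports: int) -> Counter:
    """Count how many replies from the server land on each port."""
    return Counter(layer3_port(server_ip, peer, n_ports) for peer in peer_ips)


SERVER = "192.0.2.10"
# 200 distinct Internet clients talking to the server directly.
clients = [f"198.51.100.{i}" for i in range(1, 201)]
direct = link_usage(SERVER, clients, n_ports=4)
# The same 200 requests, all arriving from one proxy address.
proxied = link_usage(SERVER, ["192.0.2.20"] * 200, n_ports=4)

if __name__ == "__main__":
    print("direct :", dict(direct))
    print("proxied:", dict(proxied))
```

With direct clients the replies spread over all four links; behind the proxy there is only one source/destination IP pair, so every reply hashes to the same link.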

A transmit hash policy could also take layer 4 information (TCP / UDP port numbers) into account, so long as it kept with the requirements in the 802.3ad standard. Such an algorithm is in the Linux kernel, as you reference in your question. Beware that the documentation for that algorithm warns that, due to fragmentation, traffic may not necessarily flow along the same path and, as such, the algorithm isn't strictly 802.3ad-compliant.
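The fragmentation caveat can be sketched as follows. This assumes a layer 3+4 hash in the same XOR/MOD style as the formula quoted earlier, and assumes that frames without a readable TCP/UDP header (such as non-initial IP fragments) fall back to the IP addresses alone. The function name, addresses and port numbers are hypothetical, and this is not the kernel's code.

```python
import ipaddress
from typing import Optional


def layer3_4_port(src_ip: str, dst_ip: str,
                  src_port: Optional[int], dst_port: Optional[int],
                  n_ports: int) -> int:
    """Assumed layer 3+4 hash; pass None for the ports of a later fragment."""
    ip_bits = (int(ipaddress.IPv4Address(src_ip))
               ^ int(ipaddress.IPv4Address(dst_ip))) & 0xFFFF
    if src_port is None or dst_port is None:
        # Non-initial fragments carry no TCP/UDP header, so only the
        # IP addresses are available to hash on.
        return ip_bits % n_ports
    return ((src_port ^ dst_port) ^ ip_bits) % n_ports


# First fragment of a datagram: the port numbers are visible.
first = layer3_4_port("10.0.0.1", "10.0.0.2", 1024, 2049, n_ports=2)
# A later fragment of the same datagram: no port numbers.
later = layer3_4_port("10.0.0.1", "10.0.0.2", None, None, n_ports=2)

if __name__ == "__main__":
    print("first fragment -> port", first)
    print("later fragment -> port", later)
```

In this example the two fragments of one datagram hash to different links, so they can arrive out of order, which is the behavior the standard's ordering requirement is meant to rule out.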

Linux bonding: 802.3ad (LACP) vs. balance-alb mode

I'm not terribly familiar with Juniper switches, but you shouldn't have to configure LACP on them; that is the point of LACP. If this isn't the case, something is wrong with your switch configuration.

LACP only specifies a protocol for dynamically aggregating ports. It does not specify a port scheduling policy (where traffic is sent and received). This policy is set separately. I don't remember the process in Linux, but I know Linux supports specifying a couple of different policies, probably similar to balance-alb.

The balance-alb mode has specific disadvantages. Mainly, it semi-intelligently selects an outgoing port for each new connection, and the connection is stuck to that one port for its lifetime. (The assignment is actually done by MAC address rather than by port: if a port fails, the MAC gets assigned to another port, allowing the connection to continue.)

This doesn't exactly "aggregate" the ports however, as connections will not be able to utilize more than one port. So if you've got 2 1GbE ports, a single connection is still limited to 1GbE. LACP resolves this usually, though it depends on your scheduling policy and the number of active ports supported at each end.
