Linux bonding: how to select the MAC address using mode 2 (balance-xor)


A few days ago I asked a question here, but it was ambiguous.
So I will try to rewrite the question and explain all the details.

My previous message (closed message): http://goo.gl/aJqQ2

What are you trying to do?

I'm trying to understand how the correct MAC address is selected when I use bonding in mode 2 (balance-xor, with the default layer2 policy).

I'm working with the bonding driver and trying to understand how all the modes operate (modes 0, 1, 2, 3, 4, 5 and 6). I understand all of them except mode 2 (balance-xor) and mode 4 (802.3ad), because the xmit_hash_policy option is unclear to me.

My main question is: how is one active slave or another selected to send traffic from one peer to the other?

Now I will describe my doubt in detail:

In my private lab I have 2 computers (PC1 and PC2, both running Linux/Ubuntu) with 4 NICs in total: each computer has two NICs (network cards) installed.


Mac addresses on PC1(bond0):

MAC1(eth1): 62:25:BC:06:4F:A6
MAC2(eth2): 62:25:BC:06:59:E6

Mac addresses on PC2(bond0):

MAC4(eth1): 62:25:BC:06:5A:1B
MAC3(eth2): 62:25:BC:06:59:E9

So, when the bonding driver is loaded, I can see 3 interfaces (bond0, eth1 and eth2) on each PC (eth1 and eth2 are slaves). My problem starts here, because I can't understand how the kernel selects one interface or the other.

Sometimes I notice that all traffic is placed on the same slave (for example, from PC1 to PC2, all traffic goes from eth1 of PC1 {MAC = 62:25:BC:06:4F:A6} to eth2 of PC2 {MAC = 62:25:BC:06:59:E9}). Following that example, if I disable eth1 on PC2, traffic is still sent even though one interface is disabled.

It does not matter whether the interface eth1 on PC2 is up or down.

This behavior is expected. But what policy determines which MAC address (eth1 or eth2) is selected? Why was eth2 on PC2 selected, and not eth1? What I'm trying to say is: how can I know which interface is used to send traffic from PC1 to PC2?

I have a formula that is:

(source MAC XOR destination MAC) modulo slave count

(This formula was extracted from bonding.txt – see the quotation below, at the end)

And I know the hash function (thanks to the user @Mark Wagner):

/*
 * Hash for the output device based upon layer 2 data
 */
static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
{
    struct ethhdr *data = (struct ethhdr *)skb->data;

    if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
            return (data->h_dest[5] ^ data->h_source[5]) % count;

    return 0;
}

In the example above, I send traffic from PC1 to PC2 using eth1 on PC1 and eth2 on PC2. Thus, my MAC addresses are:

eth1 : 62:25:BC:06:4F:A6  (PC1)
eth2 : 62:25:BC:06:59:E9  (PC2)

So, how can I determine which MAC address is used on PC2? Why was eth2 chosen and not eth1?

The user @Mark Wagner tried to help me and wrote the following explanation:


The actual hash function is:

/*
 * Hash for the output device based upon layer 2 data
 */
static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
{
    struct ethhdr *data = (struct ethhdr *)skb->data;

    if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
            return (data->h_dest[5] ^ data->h_source[5]) % count;

    return 0;
}

Where h_dest and h_source are the MAC addresses, and index 5 is the last octet of each. Assuming the defaults, the MAC addresses of your bonds are PC1: 62:25:BC:06:4F:A6 and PC2: 62:25:BC:06:5A:1B, and count = 2. Thus the hash function returns:

(0xA6 ^ 0x1B) % 2 = 0xBD % 2 = 1


But I don't understand how to calculate the XOR function. Can anyone explain to me how to calculate it?
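Since the question is how to evaluate the XOR by hand, here is a minimal C sketch of the same layer 2 arithmetic with the bits written out. The helper name l2_hash is illustrative and not part of the bonding driver. It assumes, as the driver code above does, that only the octet at index 5 (the sixth and last octet of each MAC address) takes part in the hash.

```c
#include <stdint.h>

/*
 * Illustrative helper (not part of the bonding driver): applies the
 * layer2 arithmetic to two 6-byte MAC addresses.
 */
static int l2_hash(const uint8_t src[6], const uint8_t dst[6], int slave_count)
{
    /*
     * XOR compares the operands bit by bit: a result bit is 1 only
     * where the two input bits differ.
     *
     *   0xA6 = 1010 0110   (last octet of 62:25:BC:06:4F:A6)
     *   0x1B = 0001 1011   (last octet of 62:25:BC:06:5A:1B)
     *   ----------------
     *   XOR  = 1011 1101 = 0xBD = 189
     *
     * The parentheses matter: in C, % binds tighter than ^.
     */
    return (src[5] ^ dst[5]) % slave_count;
}
```

With those two MAC addresses and two slaves, l2_hash returns 189 % 2 = 1. Because the MAC addresses of the two hosts never change, every frame between them yields the same value, which is why all the traffic stays on one slave.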

Thank you!


Appendix:

Formula from bonding.txt

layer2

  Uses XOR of hardware MAC addresses to generate the
  hash.  The formula is

  (source MAC XOR destination MAC) modulo slave count

  This algorithm will place all traffic to a particular
  network peer on the same slave.

  This algorithm is 802.3ad compliant.

XOR truth table:
http://www.tomshardware.com/reviews/safer-6-raid-controllers,1199-2.html

Best Answer

Let me try to simplify this for you. When looking at xmit_hash_policy think:

  • layer 2 = MAC
  • layer 3 = IP
  • layer 4 = PORT

Next think "single session for each layer". Example:

  • Source MAC to destination MAC = Single Session = Single Interface
  • Source IP to destination IP = Single Session = Single Interface
  • Source PORT to destination PORT = Single Session = Single Interface

Put another way:

  • Single MAC = Single Interface Used
  • Single IP = Single Interface Used
  • Single PORT = Single Interface Used

Typically, when you communicate between two nodes you have a single MAC and single IP. So you will only ever see a single interface being used.

Say you want to increase the throughput between two servers using 1GbE. Each server is bonded using 4 NICs and a single bonded interface. That bonded interface, say bond0, has a single IP and a single MAC. In this scenario you will max out at about 120MB/s between the two servers.

Next, you add a sub-interface. This is basically a virtual interface that gives you another IP address. This results in two IP addresses on the same bonded interface. In Linux you would have, for example, bond0 and bond0:1, depending on how you configured it.

If you are "hashing" at layer 2 then multiple IPs don't get you anything. You are still stuck with a single source MAC and a single destination MAC. However, if you hash at layer 3 the driver will now, more than likely, balance your transmit.

If you have a multithreaded application that is using multiple ports, say TCP ports, then you want to hash at layer 4 which will balance the load even further.
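To make the difference between the three policies concrete, here is a simplified C sketch. It is modeled on the formulas that older versions of bonding.txt give for layer2, layer2+3 and layer3+4; newer kernels hash differently, so treat it as an assumption-laden model rather than the driver's code. The struct flow type and the function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical flow description, used only for this sketch. */
struct flow {
    uint8_t  src_mac_last;  /* last octet of the source MAC */
    uint8_t  dst_mac_last;  /* last octet of the destination MAC */
    uint32_t src_ip;        /* IPv4 addresses, host byte order */
    uint32_t dst_ip;
    uint16_t src_port;      /* TCP/UDP ports */
    uint16_t dst_port;
};

/* layer2: only the MAC addresses take part. */
static unsigned hash_layer2(const struct flow *f, unsigned slaves)
{
    return (unsigned)(f->src_mac_last ^ f->dst_mac_last) % slaves;
}

/* layer2+3: MAC addresses plus the low 16 bits of the IP XOR. */
static unsigned hash_layer2_3(const struct flow *f, unsigned slaves)
{
    uint32_t ip_bits  = (f->src_ip ^ f->dst_ip) & 0xffff;
    uint32_t mac_bits = (uint32_t)(f->src_mac_last ^ f->dst_mac_last);

    return (ip_bits ^ mac_bits) % slaves;
}

/* layer3+4: ports plus the low 16 bits of the IP XOR. */
static unsigned hash_layer3_4(const struct flow *f, unsigned slaves)
{
    uint32_t ip_bits   = (f->src_ip ^ f->dst_ip) & 0xffff;
    uint32_t port_bits = (uint32_t)(f->src_port ^ f->dst_port);

    return (port_bits ^ ip_bits) % slaves;
}
```

In this model, two flows between the same pair of bonds always get the same hash_layer2 result, whatever their IPs or ports. Changing the source IP (for example 10.0.0.1 versus 10.0.0.2 towards 10.0.0.50) can change hash_layer2_3, and changing only the source port (for example 40000 versus 40001) can change hash_layer3_4. The example addresses and ports are arbitrary.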

You can illustrate this by using a tool such as netperf. In each scenario you can run netperf using multiple IP addresses or multiple ports, and you will see the traffic balanced across multiple interfaces.

Remember, however, this is transmit only. Receive is controlled by the switch. Cisco lets you customize the hashing policy. The lower end switches let you do layers 2 and 3 and the higher end let you do layers 2, 3 and 4.

Scenario:

You have a backup server and you send data to a NAS backup appliance. You use mode 4 with xmit_hash_policy=layer3+4 on the backup server and have 4 1GbE NICs in the bond. Your backup software is configured to send data to the IP of the backup appliance but it does so over multiple TCP ports with multiple streams.

With this configuration data will be sent out all interfaces assuming you have enough streams to be balanced. How does it determine what goes where? I think you have the answer to that but I won't pretend to understand how. I just know that it does from experience.

So let's say that you now have the ability to transmit data at 120MB/s * 4 (120MB/s per 1GbE interface). But now the data hits the switch, and the switch has an EtherChannel (aggregation group) that is configured with a hashing policy at layer 3. (On Cisco that could be src-ip, dst-ip or src-dst-ip.) We'll go with src-dst-ip for this example. So now the switch is hashing based on the source and destination IP addresses, which are always the same, and so it will always choose a single destination port on the switch.

So while you can transmit at 450+ MB/s, the target can only receive at 120MB/s.

If the switch can hash at layer 4 (Cisco would be src-port, dst-port or src-dst-port) then you now have the ability to transfer that data from the backup server to the appliance using all 4 ports. That is assuming that the backup appliance is also bonded.

But what if you don't have an expensive Cisco switch and can't hash at layer 4? You can create additional IP Addresses! Then you configure your backup server to run jobs using 4 different IP addresses and it will balance because the switch will hash based on source and destination IP addresses.
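The effect of the switch-side policy can be sketched with a toy model in C. The XOR-and-modulo hash below is an assumption chosen for illustration; real switches use vendor-specific functions that are not reproduced here, and the function names are made up.

```c
#include <stdint.h>

/* Toy stand-in for a src-dst-ip policy (not a vendor algorithm). */
static unsigned pick_by_src_dst_ip(uint32_t src_ip, uint32_t dst_ip,
                                   unsigned ports)
{
    return (src_ip ^ dst_ip) % ports;
}

/* Toy stand-in for a src-dst-port policy (not a vendor algorithm). */
static unsigned pick_by_src_dst_port(uint16_t src_port, uint16_t dst_port,
                                     unsigned ports)
{
    return (unsigned)(src_port ^ dst_port) % ports;
}

/*
 * Count how many distinct member ports a list of choices touches.
 * Assumes every choice is below 32 so it fits in the bitmap.
 */
static unsigned distinct_ports(const unsigned *choices, unsigned n)
{
    uint32_t seen = 0;
    unsigned count = 0;

    for (unsigned i = 0; i < n; i++) {
        uint32_t bit = UINT32_C(1) << choices[i];

        if (!(seen & bit)) {
            seen |= bit;
            count++;
        }
    }
    return count;
}
```

In this model, four streams that share one source and one destination IP all pick the same member port under pick_by_src_dst_ip, however many TCP ports they use. Four streams from four different source IPs (say 10.0.0.11 through 10.0.0.14 towards 10.0.0.50, arbitrary example addresses) land on four different member ports of a 4-port group, which is the effect the extra IP addresses are meant to achieve.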

Other switch vendors have their own hashing algorithms, which are usually based on a mix of IP and MAC (layers 2 and 3). I have had to create static ARP entries in the past for such switches so that there are both multiple IP addresses and multiple MAC addresses.

Hopefully this helps you better understand how xmit_hash_policy works, at least in practice.