Network link saturated. How to fix

bonding networking

I have a backup server in rack A. That rack has a Cisco WS-C3560G-48TS-S switch installed.

I also have 10 Proxmox servers (Linux VM hosts) in rack B. That rack has an HP 2810-48 switch installed.

There is currently only one CAT6 cable between the switches; it's a trunk carrying all VLANs.

These 10 Proxmox servers all run their backups at the same time (the default config), and of course the link between the two switches is saturated during backups.

So I need to fix that, but I'm not sure of the best way.

My idea was to add a quad-port NIC to the backup server and set up LACP between the backup server and the Cisco switch (over 6 ports). Then also do LACP between the Cisco switch and the HP switch using 6 ports, and a 7th cable for the other VLANs.
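Roughly, I imagine the switch-to-switch channel would look something like this (port numbers and group/trunk IDs are just placeholders, since I haven't configured anything yet):

    ! Cisco 3560 side (example ports)
    interface range GigabitEthernet0/1 - 6
     channel-group 1 mode active
    !
    interface Port-channel1
     switchport trunk encapsulation dot1q
     switchport mode trunk

    # HP ProCurve 2810 side (example ports)
    trunk 1-6 trk1 lacp
    # then tag the needed VLANs onto the trunk, e.g. "vlan 10 tagged trk1"

As far as I understand, "mode active" makes the Cisco side negotiate LACP rather than form a static channel, which should match an LACP trunk group on the ProCurve.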

But when I mentioned that on a networking IRC channel, someone said this won't work as expected because of the switches and their buffers. I couldn't completely understand what he was trying to say, and then he disappeared...

I don't see a problem with my idea, but I'm not a network professional, so I'm confused now.

He said the only good way to fix this is to connect all the servers to the same switch, or to use switches with 10G ports and link them together over 10G.

Can anyone please explain to me whether my idea is bad, and why?

Best Answer

Your idea is not bad.

I actually have similar setups to yours for my backup servers, though not so much for the extra bandwidth as for the extra redundancy.

I don't know what that guy on IRC was trying to say, but I do know that LACP won't work as you might expect in terms of the maximum bandwidth you can reach per server.

LACP does not bond the ports together to give combined bandwidth to a single connection, for example. That's the main misunderstanding about LACP and bonding (or EtherChannel, in Cisco terms).

So if you have 6 Gbit ports bonded into an EtherChannel between your two switches and you transfer data from a server on one switch to a server on the other switch, the transfer will use only one of those 6 Gbit ports (assuming, for the sake of argument, that the servers on both sides have more than 1 Gbit available).
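The reason is that the switch picks the member link by hashing the frame/packet headers, so all frames of one flow land on the same link (which keeps them in order). On a Catalyst 3560 you can check and change which headers feed that hash, something like this (from memory; the default is typically source MAC):

    Switch# show etherchannel load-balance
    Switch(config)# port-channel load-balance src-dst-ip

Worth noting for your scenario: all ten Proxmox hosts talk to the same backup server, so a purely destination-based hash would put every backup flow on the same member link, while a source-based or src-dst hash spreads the ten hosts across the links.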

Now if another server on one switch wants to transfer data to another server on the other switch at the same time, then, depending on your load-balancing configuration, that transfer will use another of the 1 Gbit ports.

So per individual server/port/flow you will still get a maximum of 1 Gbit, but when you are doing parallel data transfers, LACP will load-balance across the 6 available Gbit ports, eventually giving a total combined bandwidth of 6 Gbit.
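The same logic applies on the Linux side of your backup server's bond, where the hash policy decides how flows spread. A minimal Debian/Proxmox-style /etc/network/interfaces sketch, assuming the quad NIC's ports show up as eth0 to eth3 (interface names, the address, and the hash policy are placeholders to adapt):

    auto bond0
    iface bond0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        # the four ports of the quad NIC (names are placeholders)
        bond-slaves eth0 eth1 eth2 eth3
        # 802.3ad = LACP, to match the switch-side channel-group
        bond-mode 802.3ad
        # link monitoring interval in ms
        bond-miimon 100
        # hash on IP + TCP/UDP port so parallel flows spread across links
        bond-xmit-hash-policy layer3+4

layer3+4 gives the best spread for many parallel TCP streams, at the cost of not being strictly 802.3ad-compliant for fragmented packets; layer2+3 is the conservative choice.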

So, in theory, what you propose should work OK when running multiple backups at the same time.

But from my personal experience, the bottleneck on all my backup servers is the storage rather than the network (I use 2 Gbit ports instead of the 4 you are suggesting).
Mechanical disks can be far slower than 1 Gbit when accessed randomly. Of course this depends on your backup system and policy, so it may not be an issue for you.
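One more thing on the policy side: since the saturation comes from all 10 hosts starting at once, staggering the vzdump start times already spreads the load over the night with no extra hardware. Roughly, the cron entries could look like this (times, flags, and storage options vary by setup and PVE version):

    # in /etc/cron.d/vzdump on node 1: start at 01:00 Saturday
    0 1 * * 6 root vzdump --all 1 --mode snapshot --quiet 1
    # on node 2: start two hours later
    0 3 * * 6 root vzdump --all 1 --mode snapshot --quiet 1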