Architecture – Scaling websocket client connections (not server) to multiple servers

Tags: architecture, distributed-computing, scalability, websockets

I wrote a Slack bot which must connect to Slack teams through websocket connections. Since the bot might be used by thousands of teams, I will eventually need to distribute the teams across multiple servers. New teams are added through an HTTP server which handles the initial OAuth authentication.

I'm looking for a solution that will help me achieve the following:

  • When a server goes down or reboots, all the teams it was assigned to must be re-assigned to the remaining servers. It's ok if the connection to the Slack team temporarily goes down as long as the team is quickly picked up by a server.

  • When a team is added, it gets assigned to the least "busy" server. Busy could simply be defined as the number of teams the server currently handles.

  • I'd like to achieve all of this while writing as little custom code as possible.

So far, I've considered the following solutions:

1) Work queue with RabbitMQ. Bot servers compete to receive teams. That's an OK solution, though I need a reliable way to put teams back in the queue when a server goes down (see the rough sketch after this list).

2) Write a custom "orchestration" service. The orchestration service would receive teams from the HTTP server and dispatch them to a cluster of servers. It would need to keep track of when servers go down and which teams need to be re-assigned. I'm not really sure how to write such a service reliably, and it would become a single point of failure.

3) Your suggestions!
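For reference, here is a rough sketch of what I had in mind for option 1, using pika's competing-consumers pattern. The "teams" queue name, the broker host, and handle_team() are placeholders; the idea is to never ack a delivery while a bot server holds the team, so the broker requeues it if that server dies:

```python
import pika

def open_slack_websocket(team_id: str) -> None:
    # Placeholder for the real per-team websocket setup.
    print(f"connecting websocket for team {team_id}")

def handle_team(channel, method, properties, body):
    team_id = body.decode()
    open_slack_websocket(team_id)
    # Deliberately no basic_ack: if this process dies, RabbitMQ requeues the
    # unacked delivery and another bot server picks the team up.

connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
channel = connection.channel()
channel.queue_declare(queue="teams", durable=True)
channel.basic_qos(prefetch_count=100)  # rough cap on teams per server (assumption)
channel.basic_consume(queue="teams", on_message_callback=handle_team)
channel.start_consuming()
```

One caveat I'm aware of: newer RabbitMQ versions enforce a delivery acknowledgement timeout, so holding deliveries unacked indefinitely would likely require tuning that on the broker.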

Best Answer

Fundamentally, you're looking for advice on how to achieve load balancing. Even with the limitations you've offered, this is a pretty broad topic.

One possible solution:

Starting under the assumption that you have some means for any arbitrary client to obtain a list of currently active servers, you could use some variation on Consistent Hashing or Rendezvous Hashing.

As a simplified example, you map each server to numerous random bucket values between 0 and 1. For a given client, you hash some sort of client id (e.g., the client's IP address) to a number in the same range, then send the client to the server whose bucket value is closest to that hashed value[1].
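To make that concrete, here is a minimal sketch in Python. It assumes the bucket positions are derived deterministically from each server's name (rather than being truly random) so that every client computes the same ring without coordination; the HashRing class, server names, and ids below are illustrative, not a prescribed implementation:

```python
import hashlib

def hash_to_unit(value: str) -> float:
    """Hash an arbitrary string to a float in [0, 1)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class HashRing:
    """Each server owns several points on [0, 1); a client goes to the nearest point."""

    def __init__(self, servers, points_per_server=50):
        self.buckets = []  # (position, server) pairs
        for server in servers:
            self.add_server(server, points_per_server)

    def add_server(self, server, points_per_server=50):
        # Deriving positions from the server name keeps every client's view identical.
        for i in range(points_per_server):
            self.buckets.append((hash_to_unit(f"{server}#{i}"), server))

    def remove_server(self, server):
        # Only the removed server's clients get remapped; everyone else stays put.
        self.buckets = [(p, s) for p, s in self.buckets if s != server]

    def server_for(self, client_id: str) -> str:
        h = hash_to_unit(client_id)
        # Distance wraps around the ring, so 0.01 and 0.99 are 0.02 apart (see [1]).
        def distance(position: float) -> float:
            d = abs(position - h)
            return min(d, 1.0 - d)
        return min(self.buckets, key=lambda bucket: distance(bucket[0]))[1]

ring = HashRing(["bot-1", "bot-2", "bot-3"])
print(ring.server_for("team-1234"))  # same answer on every machine with the same server list
ring.remove_server("bot-2")          # only clients that mapped to bot-2 move
```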

There are several benefits to this approach:

  • Other than maintaining the list of servers, all of this logic can be done client-side.
  • The logic is incredibly simple; implementing this functionality should only require 10-20 lines of code.
  • Adding or removing servers is handled cleanly. When a server is added, most clients will not switch servers. When a server is lost, only that server's clients will be remapped.

The downside of this approach is that it is random. There is a risk that load will not be distributed evenly, especially if the number of clients is low.

[1] This calculation wraps (hence why most discussions talk about angles or circles), but the impact of ignoring this is pretty small. The simplest fix is to add an extra 1+Min(bucket_value) bucket. Basically, 0.01 should be treated as closer to 0.99 than to 0.5, so that the highest and lowest bucket values aren't weighted unevenly.
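If you'd rather not add the extra bucket, another way to handle the wrap is to measure distance on the ring directly, which is what the sketch above does:

```python
def ring_distance(a: float, b: float) -> float:
    """Distance on a ring of circumference 1, so 0.01 and 0.99 are 0.02 apart."""
    d = abs(a - b)
    return min(d, 1.0 - d)

assert abs(ring_distance(0.01, 0.99) - 0.02) < 1e-9  # wraps around the ends
assert abs(ring_distance(0.01, 0.50) - 0.49) < 1e-9  # no wrap needed here
```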