Handling large numbers of sockets

real-time sockets

I am working on a project in which we have a desktop application that should be able to receive commands from a web application. To solve this issue, using sockets seems like a good approach (instead of long polling and various other techniques).

The desktop application also uses a web API, and this is secured with mutual authentication in SSL. The sockets should also be secured with SSL.

The question is: what is the best way to handle large numbers of sockets?

I have thought of a solution which involves using a "relay socket". Basically the desktop application creates a socket to [The Relay Station]. Whenever an action should be sent from the web application, the web app also creates a socket to the relay station.

The web app sends a command saying "relay [this] to [socket]".

This approach requires that:

  • All the desktop-to-relay sockets must be persistent
  • The desktop application must identify itself, so that the web app can reference it in its commands
  • The relay station needs to keep track of all the desktop application sockets, and map those to a unique identifier.
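
The bookkeeping these requirements imply could be sketched roughly as follows. This is only an illustration: `RelayRegistry`, `register`, `unregister` and `relay` are made-up names, and the `Sender` callable stands in for whatever would actually write to a persistent SSL socket.

```python
from typing import Callable, Dict

# Hypothetical type: anything that can deliver bytes to one desktop client,
# e.g. a thin wrapper around a persistent SSL socket's sendall().
Sender = Callable[[bytes], None]


class RelayRegistry:
    """Maps a desktop identifier to the sender for its persistent connection."""

    def __init__(self) -> None:
        self._desktops: Dict[str, Sender] = {}

    def register(self, desktop_id: str, sender: Sender) -> None:
        # Called once a desktop has connected and identified itself.
        self._desktops[desktop_id] = sender

    def unregister(self, desktop_id: str) -> None:
        # Called when the persistent connection drops.
        self._desktops.pop(desktop_id, None)

    def relay(self, desktop_id: str, command: bytes) -> bool:
        # "relay [this] to [socket]"; returns False if that desktop is offline.
        sender = self._desktops.get(desktop_id)
        if sender is None:
            return False
        sender(command)
        return True


# Usage with an in-memory stand-in for a socket.
received = []
registry = RelayRegistry()
registry.register("desktop-42", received.append)
registry.relay("desktop-42", b"refresh")
```

The web app only ever needs the identifier; the relay alone knows which socket it maps to.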

This gives me a good structure, but it also raises some questions:

  • What should the identifier be?
  • Is there a better way of sending commands from [WebApp] to the correct desktop application, without using such an approach?

Best Answer

The question is whether you'd rather the linking application run on your web server (as a relay server) or the desktop (your app accepts lots of connections). That's a design decision, I think, rather than a programming one.

Somewhere you need an application that will accept large numbers of connections. One per instance of the web app, presumably. A million connections is fairly straightforward and doesn't have to be done in C (one example uses Erlang and duplicates the code in C).

How many commands are you looking at in total, and how many instances of the web app (tens? thousands? millions)? If it's millions, it doesn't really matter whether you funnel those down using a relay; you're going to spend a lot of time dealing with replication and multiple instances of your endpoints. A million instances sending one command a minute is roughly 16,700 commands a second.
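
To plug in your own numbers, the load estimate is just instances times commands per minute, divided by 60. A tiny sketch (the function name is mine, and it assumes commands are spread evenly, which real traffic won't be):

```python
def commands_per_second(instances: int, per_instance_per_minute: float) -> float:
    # Average rate only; bursts will be higher than this.
    return instances * per_instance_per_minute / 60.0


rate = commands_per_second(1_000_000, 1)
print(round(rate))  # 16667
```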

I think the identifier question is pretty obvious: use the native identifier that comes with the connection (the socket handle or class reference), then whatever mapping system your language provides to link that to the ID of the web app or command. If you must construct your own ID and don't care about the identity of the command source, just use the IP details directly; otherwise use the user or command ID that you have. It's much easier when debugging to see

user 123456 issued command Fubar(everything) from 192.168.0.255:54012

than have an artificial ID in there (or have to look it up).
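
A minimal sketch of that mapping, assuming the "native identifier" is an integer handle such as a file descriptor. The `sessions` table, `attach` and `log_line` are hypothetical names, not part of any library:

```python
from typing import Dict, Tuple

# (remote IP, remote port), the shape socket.getpeername() returns for IPv4.
Peer = Tuple[str, int]

# Hypothetical session table keyed by the connection's own handle.
sessions: Dict[int, Tuple[int, Peer]] = {}


def attach(handle: int, user_id: int, peer: Peer) -> None:
    # Link the native handle to the user ID and the peer address.
    sessions[handle] = (user_id, peer)


def log_line(handle: int, command: str) -> str:
    # Build a debug line from real identifiers, with no artificial ID to look up.
    user_id, (host, port) = sessions[handle]
    return f"user {user_id} issued command {command} from {host}:{port}"


attach(7, 123456, ("192.168.0.255", 54012))
print(log_line(7, "Fubar(everything)"))
# user 123456 issued command Fubar(everything) from 192.168.0.255:54012
```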

I suggest the ID be the source address and port, plus the input port (your IP will be fixed, unless you have multiple IPs, in which case store that too). With IPv4 that's 32 bits per IP plus 16 per port: 64 bits, or a maximum of 96 if you also store your own IP. With IPv6 it's 160 bits, or 288 with your own IP. Either way, that's not going to be your biggest memory suck. Once you put that into a list of some sort you're going to have a pointer or a native int, which becomes your "real ID". Ideally you'll be able to use a direct link into the network layer in your software for that, to save a layer of indirection.
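
One way to pack such an ID, as a sketch only: `connection_key` is a name I made up, and port 443 is just an assumed input port. It uses the standard `ipaddress` and `struct` modules and leaves out the local IP.

```python
import ipaddress
import struct


def connection_key(src_ip: str, src_port: int, local_port: int) -> bytes:
    # Packed source address (4 bytes for IPv4, 16 for IPv6) followed by
    # two unsigned 16-bit ports in network byte order.
    addr = ipaddress.ip_address(src_ip)
    return addr.packed + struct.pack("!HH", src_port, local_port)


key4 = connection_key("192.168.0.255", 54012, 443)
key6 = connection_key("2001:db8::1", 54012, 443)
print(len(key4) * 8, len(key6) * 8)  # 64 160
```

The resulting bytes are hashable, so they can be used directly as a dictionary key for looking up the connection.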