Do I actually need a message broker, or are websockets enough?

client-server, flask, message-queue, socket.io, websockets

The website I am building has a real-time messaging component. The backend is built with Flask, and I have integrated Flask-SocketIO to handle WebSocket connections while users are on the messaging page.

My current infrastructure is pretty simple. Flask handles both HTTP requests and WebSocket events. When users land on the Messages page, the client sends a GET /message_history request and then opens a socket connection.


When Alice sends a message to Bob, the server handles the event, checks that Alice is logged in and is allowed to talk to Bob, stores the message in the database, and finally emits it to both Alice and Bob, whose clients are listening for events from the server.
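The check → store → emit flow described above can be sketched in plain Python, with the auth check, database write, and socket emit injected as callables so the flow itself is easy to follow and test. All names here are illustrative, not from the actual app; in the real handler these would be the session check, a database insert, and `socketio.emit`:

```python
# Sketch of the check -> store -> emit flow for a send_message event.
# is_allowed, store, and emit are stand-ins for the session/permission
# check, the database insert, and socketio.emit respectively.

def handle_send_message(sender, recipient, text, *, is_allowed, store, emit):
    """Validate the sender, persist the message, then notify both parties."""
    if not is_allowed(sender, recipient):
        return False  # reject: not logged in, or not allowed to talk to recipient
    message = {"from": sender, "to": recipient, "text": text}
    store(message)             # persist before notifying anyone
    emit(sender, message)      # echo back to the sender's client
    emit(recipient, message)   # deliver to the recipient's client
    return True
```

The point of the injected callables is that the server remains the only party the clients talk to; the handler decides who gets notified, regardless of what sits behind `store` and `emit`.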

This works fine in development, but for production I need to make changes.

Primarily, this won't work behind a load balancer. As per the Flask-SocketIO docs:

There are two requirements to use multiple Flask-SocketIO workers:

  • The load balancer must be configured to forward all HTTP requests from
    a given client always to the same worker. This is sometimes referenced
    as “sticky sessions”.

  • Since each of the servers owns only a subset of the client
    connections, a message queue such as Redis or RabbitMQ is used by the
    servers to coordinate complex operations such as broadcasting and
    rooms.

However, I don't quite understand how to integrate Redis or RabbitMQ into my current event handlers, since the clients will not be connecting to Redis/RabbitMQ directly. The Flask server will always need to be the middleman, since it performs the permission checks and stores the messages in the database.
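For what it's worth, wiring in the queue does not require touching the event handlers at all: Flask-SocketIO takes the broker URL at construction time and uses it internally to relay emits between workers, while clients keep connecting only to Flask. A minimal configuration sketch (the local Redis URL is an assumption):

```python
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)

# Every worker points at the same broker. Flask-SocketIO uses it under
# the hood (pub/sub) so that an emit on one worker reaches clients
# connected to another worker. Clients never talk to Redis directly.
socketio = SocketIO(app, message_queue="redis://localhost:6379/0")
```

The existing `@socketio.on(...)` handlers stay exactly as they are; only the `SocketIO(...)` constructor changes.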

Take a look at this answer on SO:

Redis pub/sub is great in case all clients have direct access to
redis. If you have multiple node servers, one can push a message to
the others.

But if you also have clients in the browser, you need something else
to push data from a server to a client, and in this case, socket.io is
great.

Now, if you use socket.io with the Redis store, socket.io will use
Redis pub/sub under the hood to propagate messages between servers,
and servers will propagate messages to clients.


EDIT

I followed the docs: I have Redis running locally, installed the dependencies, and monkey-patched eventlet. I ran everything and it works… ? I can see Redis receiving "SUBSCRIBE" and "PUBLISH" events…

  1. Do I still need my PSQL database? Can I use the Redis store to fetch historic messages? What happens if the Redis instance dies?

  2. If I run this behind a load balancer, with sticky sessions configured, does Redis (or Flask?) automagically know how to route messages to the correct nodes?

Best Answer

At a high-level, in your design, the PostgreSQL database is standing in for the Redis cache and/or messaging system.

This can be made to work, but you could run into scalability issues. When you use a database as a messaging system, managing message state becomes problematic. For example, when you create a new message to be sent, you want to make sure that only one handler picks up that message and delivers it. Databases do not provide this kind of coordination by default; there are various schemes you can use to work around that, whereas with a messaging platform it's an inherent feature.
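One such scheme, if you do keep messages in the database, is to claim each row atomically so that two workers can never deliver the same message twice. A minimal sketch with SQLite standing in for PostgreSQL (the `outbox` table and its columns are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, body TEXT, status TEXT)")
conn.execute("INSERT INTO outbox (body, status) VALUES ('hello', 'pending')")
conn.commit()

def claim(conn, message_id):
    """Atomically flip a pending message to 'claimed'.

    The UPDATE's WHERE clause is the coordination: only one caller can
    win the status transition, so only that caller delivers the message.
    (PostgreSQL also offers SELECT ... FOR UPDATE SKIP LOCKED for this.)
    """
    cur = conn.execute(
        "UPDATE outbox SET status = 'claimed' WHERE id = ? AND status = 'pending'",
        (message_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the first worker to get here

first = claim(conn, 1)   # first worker wins the row
second = claim(conn, 1)  # second worker finds it already claimed
```

This is exactly the bookkeeping a message broker does for you out of the box, which is why the answer above calls it an inherent feature of a messaging platform.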
