Apache Kafka – Is Kafka Suitable for Connecting REST API with Worker for Large Computations?

apache-kafka · architecture · message-queue · queue

We currently have a REST API service (let's call it A) written in Python which triggers various heavy calculation jobs on a Worker (W), also written in Python. The two services are connected through a self-written PostgreSQL queue library.

That's how they communicate at the moment:

 1. A POST request to service A creates a job in the PG queue
 2. The Worker takes a job from the queue and runs the calculations
 3. After a successful calculation, the Worker saves the results in the shared DB
 4. Every N seconds, external service B asks service A for the data calculated by the Worker, by job id
 5. Service A observes the shared DB and returns the data once the Worker is finished (status=Done for that job)
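For context, a self-written PostgreSQL queue like this typically relies on `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent workers never claim the same job. A minimal sketch of the Worker's claim-and-complete cycle; the `jobs` table and its columns are assumptions here, not the original schema:

```python
# Sketch of a Postgres-backed job queue worker step (hypothetical schema:
# a `jobs` table with id, payload, status, result columns).

CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
"""

def run_one(conn, calculate):
    """Claim one pending job, run it, and persist the result.

    `conn` is a DB-API connection (e.g. psycopg2); `calculate` is the
    actual heavy computation. Returns False when the queue is empty.
    """
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            return False  # nothing pending
        job_id, payload = row
        result = calculate(payload)
        cur.execute(
            "UPDATE jobs SET status = 'done', result = %s WHERE id = %s",
            (result, job_id),
        )
    conn.commit()  # claim + result land atomically
    return True
```

`SKIP LOCKED` is what makes this safe with more than one Worker: a locked (in-progress) row is simply skipped rather than waited on.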

*(diagram: old architecture)*

We have decided to rewrite the Python service A in Node.js (NestJS) and improve the current implementation wherever possible. As you may have noticed, there is the line "External service B requests every N seconds". We want to get rid of this polling pattern as well and use something more efficient and performant for this case.

1. Concept using Kafka

The first idea is to use some message queue service like Kafka:

*(diagram: concept using Kafka)*

I like this approach since we have a separate worker which only executes calculations (jobs), while Kafka plays the role of distributing messages between services. This approach would also get rid of the interval/polling requests for fetching the data.

At the same time it has several possible drawbacks:

  • Imagine the Worker runs on a single thread and, accordingly, can perform only one job at a time. How do we handle multiple messages from Kafka in the Worker if a job is already running? Do we need to create an in-RAM queue, or can Kafka handle these cases?
  • What if Kafka or the service that consumes messages suddenly shuts down? Will the in-flight messages be lost?
  • Would another MQ such as RabbitMQ be more appropriate than Kafka?
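On the single-threaded worker concern: a Kafka consumer only receives a message when it polls, so a worker that polls, processes one job to completion, and only then commits the offset naturally handles one job at a time; no in-RAM queue is needed. Because the offset is committed only after success, an unfinished job is redelivered if the worker dies, which also addresses the shutdown concern (at-least-once delivery). A sketch of that loop; the poll/commit shape mirrors clients like `confluent-kafka`, but the loop itself is broker-agnostic:

```python
def consume_jobs(consumer, handle, max_jobs=None):
    """Process jobs one at a time; commit only after success.

    `consumer` needs poll(timeout) -> message-or-None and commit(msg)
    (auto-commit must be disabled); `handle` is the calculation itself.
    If the process dies mid-job, the uncommitted message is redelivered
    to the next consumer in the group, so nothing is silently lost.
    """
    done = 0
    while max_jobs is None or done < max_jobs:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # no pending work right now
        handle(msg)            # runs to completion before the next poll
        consumer.commit(msg)   # at-least-once: commit after processing
        done += 1
```

Note the flip side of at-least-once: a job can run twice after a crash, so the calculation (or its result write) should be idempotent.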

2. Concept using PG Queue … with Kafka

Since we are rewriting the service from scratch using a new technology, we can improve the original approach by replacing the interval requests for getting the data with Kafka. So, the interaction scheme will look like this:

 1. External service B consumes all messages from a particular topic
 2. Service A observes DB and sends data to the topic once the worker is finished (status=Done on the job)
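The notify-on-completion step in this scheme amounts to producing one small event per finished job. A sketch of what service A (or the Worker itself) could publish; the topic name and payload fields are assumptions, not an existing contract:

```python
import json

def publish_job_done(produce, job_id, result_location):
    """Publish a completion event for `job_id`.

    `produce` is any callable with a (topic, key, value) signature,
    e.g. a thin wrapper around a Kafka producer's produce() method.
    Keying by job_id keeps all events for one job in one partition,
    so service B sees them in order.
    """
    event = {
        "job_id": job_id,
        "status": "Done",
        "result_location": result_location,  # where B can fetch the data
    }
    produce("job-status", key=str(job_id), value=json.dumps(event))
    return event
```

Keeping the event small (a pointer to the result rather than the result itself) keeps Kafka as a notification channel while the shared DB stays the source of truth.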

The approach using a shared database and a Postgres queue seems safer and more controllable to me, but at the same time the approach using Kafka as a common message broker seems more modern and sustainable.

Any thoughts on this? Maybe there's a better approach to this problem?

Best Answer

You have correctly identified a big inefficiency:

External service B requests every N seconds

Any kind of long polling or continuous polling is bound to be inefficient if the normal duration of the job is unknown.

According to your diagram, once the Worker is finished calculating the result, the Worker can do two things:

  1. Persist the results of the work somewhere (a database, an s3 bucket, even Kafka itself with a long retention policy).
  2. Notify others that the work is done (Kafka, AWS SNS, etc).
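Those two steps can be sketched as a single worker-side function; the storage and broker interfaces here are placeholders, not a specific SDK:

```python
def finish_job(job_id, result, store, notify):
    """Persist the result, then announce completion.

    `store(key, data)` writes to durable storage (a database, an S3
    bucket, ...); `notify(event)` publishes to the broker (Kafka,
    AWS SNS, ...). Persisting first guarantees the notification never
    points at data that does not exist yet.
    """
    key = f"results/{job_id}"
    store(key, result)
    notify({"job_id": job_id, "status": "Done", "location": key})
    return key
```

If the process crashes between the two calls, the result exists but no event was sent; redelivering and re-running the job (with idempotent writes) covers that gap.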

This will be a much more efficient design that is also not too difficult to implement. You should feel confident to proceed down this path, as long as you are comfortable with learning the lower level details of the Kafka SDK in your language.
