Single Docker Container for Two Processes vs Two Services Connecting to Same DB

apache-kafka docker flask microservices python

I recently started moving a monolithic application to a microservices architecture using Docker containers.
The general idea of the app is:

scraping data -> format the data -> save the data to MySQL -> serve data via REST API.

I want to split each of the steps into a separate service. I think I have two choices. What is the best practice in a microservices architecture here?

Option one
Scraper service – scrapes and publishes to Kafka
Formatter service – consumes messages from Kafka and formats them
API service – consumes Kafka messages, updates MySQL and exposes a REST API
Drawback: If I'm not mistaken, a Docker container should preferably run only one process, and here the API service would run both a Kafka listener and the REST server

Option two
Scraper service – scrapes and publishes to Kafka
Formatter service – consumes messages from Kafka and formats them
Saving to DB service – receives the formatted information and just updates MySQL (runs as a Python process)
API service – exposes a REST API that serves requests with Python Flask
Drawback: Two services connecting to the same DB, which is supposedly not recommended because the services would not be decoupled

What is the best practice here? Should I go with option one and run the Flask server and the Kafka listener in the same container?

Thanks!

Best Answer

I would suggest something along the following lines.

  • Scraper: scrapes the data and publishes it to Kafka
  • Formatter/Persistence: reads from Kafka, formats the data, and sends it to the storage layer
  • Storage: one "real" (primary) database where you perform writes. Replicate this database to as many read-only copies as you need.
  • API: accesses only the read-only replicas to serve the data.
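
As a rough illustration of the Formatter/Persistence step, here is a minimal sketch. To keep it runnable without any infrastructure it uses stand-ins: `queue.Queue` plays the role of the Kafka topic and an in-memory `sqlite3` database plays the role of the MySQL primary. The `items` table, the `title`/`price` fields, and the function names are all hypothetical; your real message shape will differ.

```python
import json
import queue
import sqlite3


def format_record(raw: bytes) -> dict:
    """Hypothetical formatting step: decode JSON and normalise the fields."""
    data = json.loads(raw)
    return {"title": data["title"].strip(), "price": float(data["price"])}


def make_primary() -> sqlite3.Connection:
    """Stand-in for the single writable database (MySQL in the question)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT, price REAL)")
    return conn


def persist_all(messages: queue.Queue, conn: sqlite3.Connection) -> int:
    """Drain the queue (stand-in for a Kafka consumer loop), format each
    message, and write it to the primary. Returns the number of rows written."""
    written = 0
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            break  # nothing left to consume in this sketch
        record = format_record(raw)
        conn.execute(
            "INSERT INTO items (title, price) VALUES (?, ?)",
            (record["title"], record["price"]),
        )
        written += 1
    conn.commit()
    return written
```

The point of the sketch is that only this one service ever writes to the database; the API never touches the primary, so each container still runs a single process with a single job.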

The concept of eventual consistency comes into play here. You can spin up as many replicas and API containers as you need to meet demand, at the cost of them sometimes returning different (older) data. At some point the replica databases get refreshed and the API starts serving the newest data. This way, writing new data doesn't bottleneck the response times of your reads.
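
To make the stale-read behaviour concrete, here is a toy model, not how MySQL replication is actually implemented. The `Primary` and `Replica` classes are hypothetical, and the explicit `refresh()` call is a simplification standing in for replication lag (a real replica applies changes continuously and asynchronously).

```python
class Primary:
    """Toy stand-in for the single writable database."""

    def __init__(self):
        self.rows = []

    def write(self, row):
        self.rows.append(row)


class Replica:
    """Toy read-only copy that only sees data as of its last refresh."""

    def __init__(self, primary):
        self._primary = primary
        self._snapshot = []

    def refresh(self):
        # Simplification: copy everything at once to model "catching up".
        self._snapshot = list(self._primary.rows)

    def read_all(self):
        return list(self._snapshot)


primary = Primary()
replica = Replica(primary)

primary.write({"title": "Widget"})
stale = replica.read_all()   # replica has not caught up yet
replica.refresh()
fresh = replica.read_all()   # now the new row is visible
```

Between the write and the refresh, an API container reading from this replica would still return the old (here, empty) result, which is the trade-off you accept in exchange for reads that never wait on writes.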
