Design – How to fit a rules engine in a microservice architecture when it requires lots of input data

Tags: architecture, business-rules, design, enterprise-architecture, microservices

Current situation

We are implementing (and now maintaining) an online shopping web application in a microservice architecture.

One of the requirements is that the business must be able to apply rules on what our customers add to their cart, in order to customize their experience and the eventual order. Quite obviously, a business rules engine had to be put in place, and we implemented a specific "microservice" for this (if we could still call it so).

Over the course of a year, this rules engine has become more and more complex, requiring more and more data (e.g. the contents of the cart, but also user information such as their role, their existing services, some billing information, etc.) to be able to compute those rules.

For the moment, our shopping-cart microservice is gathering all this data from other microservices. Even though part of this data is used by shopping-cart itself, most of it is fetched only to feed the rules engine.

New requirements

Now other applications/microservices need to reuse the rules engine for similar requirements. In the current situation, they would thus have to transmit the same kind of data, call the same microservices, and build (almost) the same resources to be able to call the rules engine.

Continuing as is, we will face several issues:

  • everyone (calling the rules engine) has to reimplement the fetching of the data, even if they don't need it for themselves;
  • the requests to the rules engine are complex;
  • continuing in this direction, we will have to transport this data all around the network for many requests (think of μs A calling μs B calling the rules engine, but A already has some of the data the rules engine needs);
  • shopping-cart has become huge due to all the data fetching;
  • I'm probably forgetting many more…

What can we do to avoid these troubles?

Ideally, we would avoid adding more complexity to the rules engine. We must also make sure that it does not become a bottleneck. For example, some data is rather slow to fetch (10 seconds or even more), so we implemented pre-fetching in shopping-cart so that the data is more likely to already be there when we call the rules engine, which keeps the user experience acceptable.
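
To give an idea of what that pre-fetching amounts to, here is a minimal sketch; the service URL, types, and TTL are made-up placeholders, not our actual code:

```typescript
// Minimal sketch of the pre-fetching idea: warm a short-lived cache for the slow
// upstream call so the data is (hopefully) already there when the rules engine runs.
// Names (BillingProfile, fetchBillingProfile, the URL) are hypothetical.
interface BillingProfile {
  customerId: string;
  outstandingBalance: number;
}

const cache = new Map<string, { value: BillingProfile; expiresAt: number }>();
const TTL_MS = 5 * 60 * 1000; // assumed 5-minute freshness window

async function fetchBillingProfile(customerId: string): Promise<BillingProfile> {
  // Stand-in for the slow (10s+) upstream call.
  const res = await fetch(`https://billing.internal/profiles/${customerId}`);
  return (await res.json()) as BillingProfile;
}

// Fired as soon as the customer opens their cart, long before checkout rules run.
export async function prefetchBillingProfile(customerId: string): Promise<void> {
  const value = await fetchBillingProfile(customerId);
  cache.set(customerId, { value, expiresAt: Date.now() + TTL_MS });
}

// Used when the rules engine actually needs the data: cache hit if the prefetch won the race.
export async function getBillingProfile(customerId: string): Promise<BillingProfile> {
  const hit = cache.get(customerId);
  if (hit && hit.expiresAt > Date.now()) return hit.value;
  return fetchBillingProfile(customerId);
}
```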

Some ideas

  1. Let the rules engine fetch the data it needs. This would add even more complexity to it, violating the single responsibility principle (even more…);
  2. Implement a proxy μs before the rules engine to fetch the data;
  3. Implement a "data fetcher" μs that the rules engine calls to fetch all the data it needs at once (composite inquiry); a rough sketch of this option follows below.
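
To make idea 3 more concrete, here is a rough sketch of what such a composite "data fetcher" could look like; the upstream service URLs and field names are invented placeholders:

```typescript
// Sketch of idea 3: a "data fetcher" service that assembles everything the rules
// engine needs into one composite document. All URLs and field names are hypothetical.
interface RuleFacts {
  cart: unknown;      // contents of the shopping cart
  customer: unknown;  // profile, role, subscribed services
  billing: unknown;   // relevant billing information
}

async function getJson(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`${url} -> ${res.status}`);
  return res.json();
}

// Fetches all inputs in parallel so callers only ever make one request.
export async function buildRuleFacts(customerId: string, cartId: string): Promise<RuleFacts> {
  const [cart, customer, billing] = await Promise.all([
    getJson(`https://cart.internal/carts/${cartId}`),
    getJson(`https://customers.internal/customers/${customerId}`),
    getJson(`https://billing.internal/accounts/${customerId}`),
  ]);
  return { cart, customer, billing };
}
```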

Best Answer

Let's take a step back for a second and assess our starting place before writing out this likely-to-be-novel-length answer. You have:

  • A large monolith (the rules engine)
  • A large quantity of non-modularized data that gets sent around in bulk
  • Difficulty getting data to and from the rules engine
  • A rules engine you cannot remove

OK, this is not that great for microservices. An immediately glaring problem is that you seem to be misunderstanding what microservices are.

everyone (calling the rules engine) has to reimplement the fetching of the data, even if they don't need it for themselves;

You need to define some sort of API or communication method that your microservices use and have it be common. This might be a library all of them can import. It might be defining a message protocol. It might be using an existing tool (look for microservice message buses as a good starting place).
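
As a rough illustration of the "shared library" option (all names below are invented, not an existing package), the common piece can be as small as one request/response contract plus one client that every service imports:

```typescript
// Sketch of a shared contract library (published as e.g. @shop/rules-contract)
// that every microservice imports instead of hand-rolling its own request shape.
// Everything here is illustrative, not an existing package or API.
export interface RulesRequest {
  schemaVersion: 1;               // bump when the contract changes
  correlationId: string;          // lets you trace a request across services
  customerId: string;
  cartId: string;
  facts: Record<string, unknown>; // whatever inputs the rules need, keyed by name
}

export interface RulesResponse {
  schemaVersion: 1;
  decisions: Array<{ rule: string; outcome: "allow" | "deny" | "adjust"; detail?: string }>;
}

// One shared client, so no caller reimplements transport, headers, or serialization.
export async function evaluateRules(request: RulesRequest): Promise<RulesResponse> {
  const res = await fetch("https://rules.internal/v1/evaluate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`rules engine returned ${res.status}`);
  return (await res.json()) as RulesResponse;
}
```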

The question of interservice communication is not a "solved" problem per se, but it's also not a "roll your own" problem at this point. A lot of existing tooling and strategies can make your life a ton easier.

Regardless of what you do, pick a single system and try to adapt your communication APIs to use it. Without a defined way for your services to interact, you are going to have all of the disadvantages of microservices and monolithic services and none of the advantages of either.

Most of your issues stem from this.

the requests to the rules engine are complex;

Make them less complex.

Find ways to make them less complex. Seriously: common data models, splitting your single rules engine into smaller ones, or something else. Make your rules engine work better. Don't take the "jam everything into the query and keep making it more complicated" approach; seriously look at what you are doing and why.

Define some sort of protocol for your data. My guess is that you have no defined API plan (as per the above) and have started writing ad hoc REST calls whenever needed. This gets increasingly complex, because you now have to touch every microservice every time something gets updated.

Better yet, you aren't exactly the first company to ever implement an online shopping tool. Go research other companies.

Now what...

After this, you will at least have triaged some of the biggest issues.

The next issue is this question of your rules engine. I hope it is reasonably stateless, so that you can scale it. If it is, then while the situation is suboptimal, you at least aren't going to die in a blaze of glory or have to build insane workarounds.

You want your rules engine to be stateless: it should process the data it is given and nothing else. If it becomes a bottleneck, run several instances behind a proxy/load balancer. Not ideal, but still workable.
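
In sketch form (the rule and fact shapes below are invented), stateless simply means evaluation is a pure function of the request, so any replica behind the load balancer can serve any call:

```typescript
// Sketch of stateless rules evaluation: everything the engine needs arrives in the
// request, nothing is read from or written to engine-local state, so N identical
// replicas behind a load balancer are interchangeable. Shapes are illustrative only.
interface Facts {
  cartTotal: number;
  customerRole: "standard" | "premium";
}

interface Rule {
  name: string;
  applies: (facts: Facts) => boolean;
  outcome: string;
}

const rules: Rule[] = [
  { name: "free-shipping", applies: f => f.cartTotal >= 50, outcome: "add free shipping" },
  { name: "premium-discount", applies: f => f.customerRole === "premium", outcome: "apply 10% discount" },
];

// Pure function: same facts in, same decisions out, on any instance.
export function evaluate(facts: Facts): string[] {
  return rules.filter(r => r.applies(facts)).map(r => r.outcome);
}
```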

Spend some time considering whether any of your microservices really should be put into your rules engine. If you are increasing your system overhead this significantly just to achieve a "microservices architecture", you need to spend more time planning it out.

Alternatively, can your rules engine be split into pieces? You may get gains just by turning pieces of your rules engine into specific services.
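
For example (again with invented service names), a thin dispatcher could fan a request out to per-domain rule services instead of one engine that has to know about everything:

```typescript
// Sketch of splitting the engine by domain: each rule family lives in its own
// smaller service, and a dispatcher calls only the ones a caller asks for.
// Service URLs and category names are hypothetical.
const ruleServices: Record<string, string> = {
  pricing: "https://pricing-rules.internal/v1/evaluate",
  eligibility: "https://eligibility-rules.internal/v1/evaluate",
  cart: "https://cart-rules.internal/v1/evaluate",
};

export async function evaluateCategories(
  categories: string[],
  facts: Record<string, unknown>,
): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    categories.map(async category => {
      const url = ruleServices[category];
      if (!url) throw new Error(`unknown rule category: ${category}`);
      const res = await fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(facts),
      });
      return [category, await res.json()] as const;
    }),
  );
  return Object.fromEntries(entries);
}
```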

We must also make sure that it does not become a bottleneck. For example, some data is rather slow to fetch (10 seconds or even more)

Assuming this problem still exists after solving the issues above, you need to seriously investigate why it is happening. You have a nightmare unfolding, but instead of figuring out why (10 seconds? To send shopping portal data around? Call me cynical, but this seems a bit absurd), you seem to be patching the symptoms rather than the problem causing them in the first place.

You've used the phrase "data fetching" over and over. Is this data in a database? If not, consider putting it there; if you are spending this much time "manually" fetching data, a real database seems like a good idea.

You may be able to end up with a design built around a database for the data you fetch (depending on what that data is; you've mentioned it many times), a few rules engines, and your client(s).

One last note: make sure you use proper versioning of your APIs and services. A minor release should not break backwards compatibility. If you find yourself releasing all your services at the same time for them to work, you don't have a microservice architecture, you have a distributed monolith.
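
Concretely, backwards-compatible versioning can look something like the following sketch (field names invented): a new schema version only adds optional fields, so old callers keep working while they migrate:

```typescript
// Sketch of backwards-compatible contract evolution: v2 only *adds* optional fields,
// so existing v1 callers keep working and services can be released independently.
// Field names are illustrative.
export interface RulesRequestV1 {
  schemaVersion: 1;
  customerId: string;
  cartId: string;
}

// v2 extends v1 with an optional field; nothing a v1 client sends becomes invalid.
export interface RulesRequestV2 extends Omit<RulesRequestV1, "schemaVersion"> {
  schemaVersion: 2;
  channel?: "web" | "mobile" | "store"; // new, optional
}

// The server accepts both versions until every consumer has migrated.
export type AnyRulesRequest = RulesRequestV1 | RulesRequestV2;

export function channelOf(request: AnyRulesRequest): string {
  return request.schemaVersion === 2 && request.channel ? request.channel : "web";
}
```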

And ultimately, microservices aren't a one-size-fits-all solution. Please, for the sake of all that is holy, don't just do it because it's the new hip thing.
