Repeat after me:
REST and asynchronous events are not alternatives. They're completely orthogonal.
You can have one, or the other, or both, or neither. They're entirely different tools for entirely different problem domains. In fact, general purpose request-response communication is absolutely capable of being asynchronous, event-driven, and fault tolerant.
As a trivial example, the AMQP protocol sends messages over a TCP connection. In TCP, every packet must be acknowledged by the receiver. If the sender of a packet doesn't receive an ACK for that packet, it keeps resending the packet until it's ACK'd or until the application layer "gives up" and abandons the connection. This is clearly a non-fault-tolerant request-response model: every "packet send" request must have an accompanying "packet acknowledge" response, and failure to respond results in the entire connection failing. Yet AMQP, a standardized and widely adopted protocol for asynchronous, fault-tolerant messaging, is communicated over TCP! What gives?
The core concept at play here is that scalable loosely-coupled fault-tolerant messaging is defined by what messages you send, not how you send them. In other words, loose coupling is defined at the application layer.
Let's look at two parties communicating either directly with RESTful HTTP or indirectly with an AMQP message broker. Suppose Party A wishes to upload a JPEG image to Party B who will sharpen, compress, or otherwise enhance the image. Party A doesn't need the processed image immediately, but does require a reference to it for future use and retrieval. Here's one way that might go in REST:
- Party A sends an HTTP POST request message to Party B with Content-Type: image/jpeg
- Party B processes the image (for a long time if it's large) while Party A waits, possibly doing other things
- Party B sends an HTTP 201 Created response message to Party A with a Content-Location: <url> header which links to the processed image
- Party A considers its work done since it now has a reference to the processed image
- Sometime in the future when Party A needs the processed image, it GETs it using the link from the earlier Content-Location header
The 201 Created response code tells a client that not only was their request successful, it also created a new resource. In a 201 response, the Content-Location header is a link to the created resource. This is specified in RFC 7231, Sections 6.3.2 and 3.1.4.2.
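To make that synchronous flow concrete, here is a minimal sketch of Party A's side in Python using the requests library. The host, the /images path, and the file name are assumptions made purely for illustration; the status code and header handling follow the flow described above.

```python
import requests  # assumed HTTP client library

UPLOAD_URL = "https://party-b.example.com/images"  # hypothetical Party B endpoint

# Party A uploads the raw JPEG to Party B and waits for processing to finish.
with open("photo.jpg", "rb") as f:
    resp = requests.post(UPLOAD_URL, data=f,
                         headers={"Content-Type": "image/jpeg"})

# 201 Created: Content-Location links to the newly created (processed) image.
assert resp.status_code == 201
processed_image_url = resp.headers["Content-Location"]
# (A relative Content-Location would need to be resolved against UPLOAD_URL.)

# ... sometime in the future, when Party A actually needs the image ...
image_bytes = requests.get(processed_image_url).content
```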
Now, let's see how this interaction works over a hypothetical RPC protocol on top of AMQP:
- Party A sends an AMQP message broker (call it Messenger) a message containing the image and instructions to route it to Party B for processing, then respond to Party A with an address of some sort for the image
- Party A waits, possibly doing other things
- Messenger sends Party A's original message to Party B
- Party B processes the message
- Party B sends Messenger a message containing an address for the processed image and instructions to route that message to Party A
- Messenger sends Party A the message from Party B containing the processed image address
- Party A considers its work done since it now has a reference to the processed image
- Sometime in the future when Party A needs the image, it retrieves the image using the address (possibly by sending messages to some other party)
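Before picking this flow apart, here is roughly what Party A's side could look like in Python with the pika AMQP client. The broker host, the party-b.image-processing queue name, and the idea that the reply body is simply an address string are all assumptions made for illustration; this is the common reply-queue/correlation-id RPC pattern, not a prescribed protocol.

```python
import uuid
import pika  # assumed AMQP 0-9-1 client library

connection = pika.BlockingConnection(pika.ConnectionParameters("broker.example.com"))
channel = connection.channel()

# An exclusive, auto-named queue on which Party A will receive Party B's reply.
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue
corr_id = str(uuid.uuid4())

with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

# Party A -> Messenger: route this image to Party B, and route B's reply back to me.
channel.basic_publish(
    exchange="",
    routing_key="party-b.image-processing",  # hypothetical queue Party B consumes from
    body=image_bytes,
    properties=pika.BasicProperties(
        content_type="image/jpeg",
        reply_to=reply_queue,
        correlation_id=corr_id,
    ),
)

# Party A waits (it could do other work here) until Messenger delivers B's reply.
for method, props, body in channel.consume(reply_queue, auto_ack=True):
    if props.correlation_id == corr_id:
        processed_image_address = body.decode()  # the "address" of the processed image
        break

channel.cancel()
connection.close()
```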
Do you see the problem here? In both cases, Party A can't get an image address until after Party B processes the image. Yet Party A doesn't need the image right away and, by all rights, couldn't care less if processing is finished yet!
We can fix this pretty easily in the AMQP case by having Party B tell A that B accepted the image for processing, giving A an address for where the image will be after processing completes. Then Party B can send A a message sometime in the future indicating the image processing is finished. AMQP messaging to the rescue!
Except guess what: you can achieve the same thing with REST. In the AMQP example we changed a "here's the processed image" message to a "the image is processing, you can get it later" message. To do that in RESTful HTTP, we'll use the 202 Accepted code and Content-Location again:
- Party A sends an HTTP POST message to Party B with Content-Type: image/jpeg
- Party B immediately sends back a 202 Accepted response containing some sort of "asynchronous operation" content which describes whether processing is finished and where the image will be available once it's done. Also included is a Content-Location: <link> header which, in a 202 Accepted response, is a link to the resource represented by the response body. In this case, that means it's a link to our asynchronous operation!
- Party A considers its work done since it now has a reference to the asynchronous operation and, through it, to the eventual processed image
- Sometime in the future when Party A needs the processed image, it first GETs the async operation resource linked in the Content-Location header to determine whether processing is finished. If so, Party A then uses the link in the async operation itself to GET the processed image.
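Here is a minimal sketch of that asynchronous flow from Party A's side, again in Python with requests. The shape of the operation document (a status field and an imageUrl link) is an assumption made for illustration; the important part is that the 202 response's Content-Location points at the operation resource, which Party A only polls when it actually needs the image.

```python
import time
import requests  # assumed HTTP client library

UPLOAD_URL = "https://party-b.example.com/images"  # hypothetical Party B endpoint

with open("photo.jpg", "rb") as f:
    resp = requests.post(UPLOAD_URL, data=f,
                         headers={"Content-Type": "image/jpeg"})

# 202 Accepted: Content-Location links to the "asynchronous operation" resource.
assert resp.status_code == 202
operation_url = resp.headers["Content-Location"]

# ... much later, only when Party A actually needs the processed image ...
operation = requests.get(operation_url).json()
while operation.get("status") != "done":        # hypothetical operation document shape
    time.sleep(5)                               # simple polling; back off as appropriate
    operation = requests.get(operation_url).json()

image_bytes = requests.get(operation["imageUrl"]).content  # link from the operation itself
```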
The only difference here is that in the AMQP model, Party B tells Party A when the image processing is done, while in the REST model, Party A checks whether processing is done just before it actually needs the image. These approaches are equivalently scalable: as the system gets larger, the number of messages sent in the async AMQP and async REST strategies grows with equivalent asymptotic complexity. The only real difference is that the extra message is sent by the client instead of the server.
But the REST approach has a few more tricks up its sleeve: dynamic discovery and protocol negotiation. Consider how both the sync and async REST interactions started. Party A sent the exact same request to Party B, with the only difference being the particular kind of success message that Party B responded with. What if Party A wanted to choose whether image processing was synchronous or asynchronous? What if Party A doesn't know if Party B is even capable of async processing?
Well, HTTP actually has a standardized protocol for this already! It's called HTTP Preferences, specifically the respond-async preference from RFC 7240, Section 4.1. If Party A desires an asynchronous response, it includes a Prefer: respond-async header with its initial POST request. If Party B decides to honor this request, it sends back a 202 Accepted response that includes a Preference-Applied: respond-async header. Otherwise, Party B simply ignores the Prefer header and sends back 201 Created as it normally would.
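Here is a sketch of how Party A might drive that negotiation with requests, branching on whichever success response comes back. The endpoint is hypothetical, and a robust client would also handle error statuses.

```python
import requests  # assumed HTTP client library

with open("photo.jpg", "rb") as f:
    resp = requests.post(
        "https://party-b.example.com/images",    # hypothetical Party B endpoint
        data=f,
        headers={
            "Content-Type": "image/jpeg",
            "Prefer": "respond-async",           # RFC 7240, Section 4.1
        },
    )

if resp.status_code == 202 and "respond-async" in resp.headers.get("Preference-Applied", ""):
    # Party B honored the preference: Content-Location links to the async operation.
    operation_url = resp.headers["Content-Location"]
elif resp.status_code == 201:
    # Party B ignored the preference and processed the image synchronously.
    processed_image_url = resp.headers["Content-Location"]
```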
This allows Party A to negotiate with the server, dynamically adapting to whatever image processing implementation it happens to be talking to. Furthermore, the use of explicit links means Party A doesn't have to know about any parties other than B: no AMQP message broker, no mysterious Party C that knows how to actually turn the image address into image data, no second B-Async party if both synchronous and asynchronous requests need to be made, etc. It simply describes what it needs, what it would optionally like, and then reacts to status codes, response content, and links. Add in Cache-Control headers for explicit instructions on when to keep local copies of data, and now servers can negotiate with clients which resources clients may keep local (or even offline!) copies of. This is how you build loosely-coupled, fault-tolerant microservices in REST.
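As a toy illustration of the Cache-Control idea, here is a client-side helper that keeps a local copy of a resource for as long as the server's max-age directive allows. A real client would also honor no-store, ETags, Vary, and so on; this only shows the negotiation in its simplest form.

```python
import re
import time
import requests  # assumed HTTP client library

_cache = {}  # url -> (expires_at, body); a deliberately simplistic in-memory cache


def get_with_cache(url: str) -> bytes:
    cached = _cache.get(url)
    if cached and cached[0] > time.time():
        return cached[1]                      # still fresh per the server's instructions

    resp = requests.get(url)
    match = re.search(r"max-age=(\d+)", resp.headers.get("Cache-Control", ""))
    if match:
        # The server said this copy may be kept locally for max-age seconds.
        _cache[url] = (time.time() + int(match.group(1)), resp.content)
    return resp.content
```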
Let's take a step back for a second and assess our starting place before writing out this likely-to-be-novel-length answer. You have:
- A large monolith (the rules engine)
- A large quantity of non-modularized data that gets sent around in bulk
- It's hard to get data to and from the rules engine
- You can't remove the rules engine
Ok, this is not that great for microservices. An immediately glaring problem is that you seem to be misunderstanding what microservices are.
> everyone (calling the rules engine) has to reimplement the fetching of the data, even if they don't need it for themselves;
You need to define some sort of API or communication method that your microservices use and have it be common. This might be a library all of them can import. It might be defining a message protocol. It might be using an existing tool (look for microservice message buses as a good starting place).
The question of interservice communication is not a "solved" problem per se, but it's also not a "roll your own" problem at this point. A lot of existing tooling and strategies can make your life a ton easier.
Regardless of what you do, pick a single system and try to adapt your communication APIs to use it. Without a defined way for your services to interact, you are going to have all of the disadvantages of microservices and monolithic services and none of the advantages of either.
Most of your issues stem from this.
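To make the "a library all of them can import" option concrete, here is a minimal sketch of what such a shared client package could look like in Python. The class name, endpoint path, and request fields are purely hypothetical; the point is that the way services talk to the rules engine is defined in exactly one place.

```python
# shared_client.py -- hypothetical internal package that every microservice imports,
# so "how we call the rules engine" is defined once instead of re-implemented everywhere.
from dataclasses import dataclass, asdict

import requests  # assumed HTTP client library


@dataclass
class RulesRequest:
    customer_id: str
    cart_items: list  # illustrative fields only


class RulesEngineClient:
    def __init__(self, base_url: str, timeout: float = 5.0):
        self.base_url = base_url
        self.timeout = timeout

    def evaluate(self, request: RulesRequest) -> dict:
        # One well-defined call shared by every consumer of the rules engine.
        resp = requests.post(
            f"{self.base_url}/v1/evaluate",  # hypothetical endpoint
            json=asdict(request),
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()
```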
> the requests to the rules engine are complex;
Make them less complex.
Find ways to make them less complex. Seriously. Common data models, splitting your single rules engine into smaller ones, or something. Make your rules engine work better. Don't take the "jam everything into the query and just keep making it more complicated" approach -- seriously look at what you are doing and why.
Define some sort of protocol for your data. My guess is you guys have no defined API plan (as per the above) and have started writing REST calls ad hoc whenever needed. This gets increasingly complex as you now have to maintain every microservice every time something gets updated.
Better yet, you aren't exactly the first company to ever implement an online shopping tool. Go research other companies.
Now what...
After this, you'll at least have triaged some of the biggest issues.
The next issue is this question of your rules engine. I hope that it is reasonably stateless, such that you can scale it. If it is, then while the situation is suboptimal, you at least aren't going to die in a blaze of glory or have to build insane workarounds.
You want your rules engine to be stateless. Make it such that it processes data only. If you find it as a bottleneck, make it so you can run several behind a proxy/load balancer. Not ideal, but still workable.
Spend some time considering whether any of your microservices really should be put into your rules engine. If you are increasing your system overhead so significantly just to achieve a "microservices architecture" you need to spend more time planning this out.
Alternatively, can your rules engine be split into pieces? You may get gains just by turning pieces of your rules engine into specific services.
> We must also make sure that it does not become a bottleneck – for example some data is rather slow to fetch (10s or even more)
Assuming this problem still exists after solving the above issues, you need to seriously investigate why this is happening. You have a nightmare unfolding, but instead of figuring out why (10 seconds? For sending shopping portal data around? Call me cynical, but this seems a bit absurd) you seem to be patching the symptoms rather than looking at the problem causing them in the first place.
You've used the phrase "data fetching" over and over. Is this data in a database? If not, consider putting it in one - if you are spending so much time "manually" fetching data, it seems like using a real database would be a good idea.
You may be able to have a design with a database for the data you fetch (depending on what this is, you've mentioned it many times), a few rules engines, and your client(s).
One last note: make sure you use proper versioning of your APIs and services. A minor release should not break backwards compatibility. If you find yourself releasing all your services at the same time just for them to keep working, you don't have a microservice architecture, you have a distributed monolithic architecture.
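As a small illustration of what backwards-compatible versioning can look like at the message level, here is a hypothetical consumer that tolerates a v2 schema which only adds an optional field; every field name and version number here is made up for the example.

```python
import json


def parse_order_event(raw: bytes) -> dict:
    """Parse a hypothetical order event in a version-tolerant way."""
    event = json.loads(raw)
    version = event.get("schemaVersion", 1)  # a missing field means the original v1 schema

    order = {
        "orderId": event["orderId"],  # present in every version
        "total": event["total"],
    }
    if version >= 2:
        # v2 adds an optional field; v1 consumers simply never look for it,
        # so a minor release of the producer breaks nobody.
        order["loyaltyTier"] = event.get("loyaltyTier")
    return order
```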
And ultimately, microservices aren't a one-size-fits-all solution. Please, for the sake of all that is holy, don't just do it because it's the new hip thing.
Best Answer
Internal networks often use 1 Gbps connections, or faster. Optical fiber connections or bonding allow much higher bandwidths between the servers. Now imagine the average size of a JSON response from an API. How much of such responses can be transmitted over a 1 Gbps connection in one second?
Let's actually do the math. 1 Gbps is 131 072 KB per second. If an average JSON response is 5 KB (which is quite a lot!), you can send 26 214 responses per second through the wire with just one pair of machines. Not bad, is it?
This is why the network connection is usually not the bottleneck.
Another aspect of microservices is that you can scale easily. Imagine two servers, one hosting the API, another one consuming it. If ever the connection becomes the bottleneck, just add two other servers and you can double the performance.
This is when our earlier 26 214 responses per second becomes too small for the scale of the app. You add another nine pairs, and you are now able to serve 262 140 responses.
But let's get back to our pair of servers and do some comparisons.
If an average non-cached query to a database takes 10 ms, you're limited to 100 sequential queries per second. 100 queries. 26 214 responses. Achieving 26 214 responses per second requires a great amount of caching and optimization (if the response actually needs to do something useful, like querying a database; "Hello World"-style responses don't qualify).
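The back-of-the-envelope figures above are easy to reproduce; note that they treat 1 Gbps as 2^30 bits per second, so with strictly decimal units the numbers come out a few percent lower.

```python
# Reproducing the back-of-the-envelope math from the text.
link_bits_per_second = 2**30           # the text's reading of "1 Gbps"
avg_response_kb = 5                    # a generous average JSON response size

responses_per_second = link_bits_per_second / 8 / 1024 / avg_response_kb
print(responses_per_second)            # ~26 214 responses/s over one pair of machines
print(responses_per_second * 10)       # ~262 140 responses/s with ten pairs

db_query_seconds = 0.010               # a 10 ms non-cached database query
print(1 / db_query_seconds)            # only 100 sequential queries per second
```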
On my computer, right now, DOMContentLoaded for Google's home page happened 394 ms after the request was sent. That's less than 3 requests per second. For the Programmers.SE home page, it happened 603 ms after the request was sent. That's not even 2 requests per second. By the way, I have a 100 Mbps internet connection and a fast computer: many users will wait longer.
If the bottleneck were the network speed between the servers, those two sites could literally make thousands of calls to different APIs while serving the page.
Those two cases show that the network probably won't be your bottleneck in theory (in practice, you should do actual benchmarks and profiling to determine the exact location of the bottleneck of your particular system on your particular hardware). The time spent doing the actual work (whether that's SQL queries, compression, or whatever) and sending the result to the end user is much more important.
Think about databases
Usually, databases are hosted separately from the web application using them. This can raise a concern: what about the connection speed between the server hosting the application and the server hosting the database?
There are indeed cases where the connection speed becomes problematic, namely when you store huge amounts of data that don't need to be processed by the database itself and should be available right away (that is, large binary files). But such situations are rare: in most cases, the transfer time is small compared to the time spent processing the query itself.
Where the transfer speed actually matters is when a company hosts large data sets on a NAS, and the NAS is accessed by multiple clients at the same time. This is where a SAN can be a solution. That being said, it is not the only one. Cat 6 cables can support speeds up to 10 Gbps; bonding can also be used to increase the speed without changing the cables or network adapters. Other solutions exist, involving data replication across multiple NAS devices.
Forget about speed; think about scalability
An important point of a web app is being able to scale. While actual performance matters (because nobody wants to pay for more powerful servers), scalability is much more important, because it lets you throw additional hardware at the problem when needed.
If you have a not particularly fast app, you'll lose money because you will need more powerful servers.
If you have a fast app which can't scale, you'll lose customers because you won't be able to respond to an increasing demand.
In the same way, a decade ago virtual machines were perceived as a huge performance issue. Indeed, hosting an application directly on a server vs. hosting it in a virtual machine had a significant performance impact. While the gap is much smaller today, it still exists.
Despite this performance loss, virtual environments became very popular because of the flexibility they give.
As with the network speed, you may find that the VM is the actual bottleneck and that, given your actual scale, you will save billions of dollars by hosting your app directly, without the VMs. But this is not what happens for 99.9% of apps: their bottleneck is somewhere else, and the drawback of losing a few microseconds to the VM is easily compensated by the benefits of hardware abstraction and scalability.