REST – Why is REST commonly used instead of RPC-like mechanisms in web applications?

rest, rpc, web-applications

I started very recently at a company that uses a rather unusual custom framework for their web applications, at least compared to the typical web application frameworks I know. Instead of a RESTful web service, an RPC mechanism is used to communicate with the server.

Communicating with the server looks like a simple function call, but the function is executed on the server, not the client. On the server side there is a way to define which functions the client can call. The details of how this is translated into HTTP requests are abstracted away completely.

I've only used this a short time now, but it seems pretty convenient. But I'm wondering what drawbacks of this approach I'm missing. Everyone else seems to be doing it differently, which usually is a sign for me that I might be doing something stupid or brilliant, with much higher odds on the former.

Best Answer

REST was designed for the web, and the web was designed for REST. The two just fit well together. Roy Fielding's 2000 PhD thesis Architectural Styles and the Design of Network-based Software Architectures defined and introduced the term REST, and there is significant interplay between the web and REST: Roy Fielding worked on HTTP/1.1, of which he is the primary author, and he used what he learned there to describe REST in his dissertation.

So, the simple reason why the web and REST go so well together is that the definition of REST was extracted from how the web works, and the web is an implementation of REST.

That's why REST is a good fit for web services and web apps: because you simply do the same things that have already been proven to work in the "human" web, and apply them to the "machine" web.

The big problem with RPC (depending on the exact implementation) lies basically in the Fallacies of Distributed Computing, which are explained in more detail in this whitepaper by Arnon Rotem-Gal-Oz:

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

These are all assumptions that newcomers typically make when they start to create distributed systems. Of course, all of them are false. And you need to take all of them into account when creating distributed systems.

The problem with many RPC implementations is that they try to make remote calls look like local calls. But they are nothing alike:

  • a local call never fails; the subroutine that you called may fail, but the call itself never does – a remote call may get lost on the network
  • a local call is instantaneous; the subroutine that you called may run for a long time (or even forever if it gets stuck in an infinite loop), but the call itself takes no time at all (well, a handful of CPU instructions at most, less if the call is inlined, but it's very fast) – a remote call may get stuck on the network for a long time
  • if the subroutine returns normally, the result always comes back – with a remote call, the result may get lost on the network
  • returns are instantaneous – remote results can travel on the network for a long time
  • if I call a subroutine once, it will run exactly once – a remote call may get lost on the network, or duplicated, so the remote routine may run anywhere from 0 to any number of times
  • I get back exactly one result – a remote result may get lost or duplicated, so you may get the result 0 or more times
  • if I call a subroutine twice, I get two results and I get the result of the first call before the result of the second call – you can probably guess it by now: with RPC, you may get back no results, or only the first, or only the second, or the second before the first, or the first may be lost and you get the second twice, or the other way around, and so on …
  • if I call a and then b, I get back the result of a and then the result of b – this is just a more general version of the previous point: with RPC, you may get either of the two answers 0 or more times, in any order
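
The "runs anywhere from 0 to any number of times" point can be made concrete with a small simulation. This is only a sketch, not a real network stack: `Server`, `FlakyNetwork`, and the scripted delivery counts are hypothetical names invented for illustration. It shows a non-idempotent handler running too often when a request is duplicated, and one common mitigation, a client-supplied request ID that the server uses to deduplicate.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server holding a counter that clients can increment."""
    counter: int = 0
    seen: dict = field(default_factory=dict)  # request_id -> cached result

    def increment(self):
        # Non-idempotent: every delivery changes state.
        self.counter += 1
        return self.counter

    def increment_once(self, request_id):
        # Idempotent variant: repeat deliveries of the same request ID
        # return the cached result instead of running again.
        if request_id not in self.seen:
            self.seen[request_id] = self.increment()
        return self.seen[request_id]


class FlakyNetwork:
    """Delivers each request 0, 1, or more times according to a script."""

    def __init__(self, deliveries):
        self.deliveries = list(deliveries)

    def send(self, handler, *args):
        count = self.deliveries.pop(0)
        # Run the handler once per delivery; the client gets one reply per delivery.
        return [handler(*args) for _ in range(count)]


# Same scripted faults for both runs:
# first request duplicated, second lost, third duplicated.
naive = Server()
net = FlakyNetwork([2, 0, 2])
naive_replies = [net.send(naive.increment) for _ in range(3)]

careful = Server()
net = FlakyNetwork([2, 0, 2])
careful_replies = [net.send(careful.increment_once, f"req-{i}") for i in range(3)]
```

With the same faults, the naive server ends at 4 after three client calls, while the deduplicating server ends at 2. The lost request still has to be noticed (typically by a timeout) and retried by the caller, and the request ID is what makes that retry safe.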

You will have to deal with all of the above for a remote call. But if your framework makes remote calls indistinguishable from local calls, then you can't, because you don't know which ones are the remote calls. The framework may try to handle all of those for you, but the problem is: the framework doesn't know as much about your system as you do. It doesn't know whether there are calls where it actually doesn't matter if one gets lost once in a while. So, the framework has to be very defensive, and that is expensive in terms of latency and bandwidth.
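
One way to keep remote calls visible is to give them a different shape from local calls, so that the caller has to pick a failure policy at each call site. The following is a hedged sketch of that idea; `remote_call`, `Lost`, and the fake `transport` are hypothetical names and are not taken from any particular RPC framework.

```python
class Lost(Exception):
    """Raised by the (hypothetical) transport when no reply arrives in time."""


def remote_call(transport, request, *, attempts, on_failure=None):
    """Call `transport(request)` up to `attempts` times.

    The caller chooses the policy: a single attempt plus a default value for
    calls where an occasional loss is harmless, several attempts for calls
    that matter. Retrying is only safe if the remote operation is idempotent.
    """
    for _ in range(attempts):
        try:
            return transport(request)
        except Lost:
            continue
    if on_failure is not None:
        return on_failure
    raise Lost(f"no reply to {request!r} after {attempts} attempts")


def make_transport(failures):
    """Fake transport that loses the first `failures` requests, then echoes."""
    state = {"left": failures}

    def transport(request):
        if state["left"] > 0:
            state["left"] -= 1
            raise Lost(request)
        return f"ok:{request}"

    return transport


# A metrics ping may be dropped; a payment must not silently vanish.
ping = remote_call(make_transport(1), "ping", attempts=1, on_failure="skipped")
payment = remote_call(make_transport(2), "pay", attempts=3)
```

Here the application, not the framework, decides that the ping is allowed to be skipped while the payment is retried and fails loudly if it never gets through.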

Especially since the framework cannot actually shield you. The CAP Theorem says that a distributed system cannot be Consistent, Available, and Partition-Tolerant at the same time. More precisely, it says that once a Partition occurs, the system cannot continue to be both Consistent and Available; it has to choose one. (Contrary to popular belief, the theorem does not say that you can never have all three: when the system is running normally, you can. But once you have a Partition, you have to choose one of the other two.) The PACELC Theorem extends the CAP Theorem by showing that even when the system is running normally, you have to trade off Latency against Consistency.

These are important trade-offs that the framework pretty much cannot shield you from, since they are domain-specific and important to the core design.
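
To illustrate why this choice is domain-specific, here is a deliberately oversimplified sketch of two replicas during a partition. `Replica`, `write_to_all`, and the `"CP"`/`"AP"` mode flag are hypothetical; real systems use quorums, versioning, and much more, so treat this only as a picture of the trade-off.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale reply."""


class Replica:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when cut off from the other replica

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than risk staleness.
            raise Unavailable("cannot confirm latest value during partition")
        # AP (or a healthy network): answer with whatever we have locally.
        return self.value


def write_to_all(replicas, value):
    """Apply a write to every replica that is still reachable."""
    for r in replicas:
        if not r.partitioned:
            r.value = value


def stale_read_scenario(mode):
    a, b = Replica(mode), Replica(mode)
    write_to_all([a, b], "v1")
    b.partitioned = True        # b loses contact with a
    write_to_all([a, b], "v2")  # only a sees the new value
    try:
        return b.read()
    except Unavailable:
        return "unavailable"


ap_result = stale_read_scenario("AP")  # answers, but with the old value
cp_result = stale_read_scenario("CP")  # refuses to answer
```

Whether a stale "v1" or an error is the better answer depends entirely on what the value means in your application, which is exactly the knowledge a generic framework lacks.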

Contrast this with an approach like Erlang's, which does work: in Erlang, all message sends are treated as remote, even if they are local. This means you are always prepared to deal with all of the above problems (and many more). For local processes, this does impose a little bit of overhead, though. To help with this, there is a great deal of tools, frameworks, libraries, patterns, and idioms for dealing with error handling and supervision.
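
The "treat every send as remote" style can be imitated in other languages. The sketch below is in Python, not Erlang, and is only loosely modelled on Erlang's `receive ... after` idiom; `worker`, `call`, and the `"drop"` payload are hypothetical. The point is that the caller handles silence as a normal outcome even though the other "process" is just a local thread.

```python
import queue
import threading


def worker(inbox):
    """Hypothetical echo process: replies to each (reply_to, payload) message."""
    while True:
        msg = inbox.get()
        if msg is None:        # shutdown signal
            return
        reply_to, payload = msg
        if payload != "drop":  # simulate a message that never gets answered
            reply_to.put(("reply", payload))


def call(inbox, payload, timeout):
    """Send a message and wait for a reply, treating silence as a normal outcome."""
    reply_box = queue.Queue()
    inbox.put((reply_box, payload))
    try:
        return reply_box.get(timeout=timeout)
    except queue.Empty:
        return ("timeout", payload)


inbox = queue.Queue()
t = threading.Thread(target=worker, args=(inbox,), daemon=True)
t.start()

answered = call(inbox, "hello", timeout=1.0)
unanswered = call(inbox, "drop", timeout=0.1)

inbox.put(None)  # ask the worker to stop
t.join(timeout=1.0)
```

Because every call site already has a timeout branch, moving the worker to another machine later would not introduce a new kind of failure that the caller has never had to think about.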

You haven't described how your RPC framework works in particular, or what language or libraries you are using, but I have a strong suspicion that it belongs to the former "pretend the network doesn't exist" type. Those just don't work. It is okay to remove the distinction between local and remote calls by treating everything as a remote call. Doing it the other way around abstracts too much: the network is part of your system, and if you abstract it away, you abstract away something that you actually need to know about.

Now, whether you have to specifically use REST or not is an entirely different question. As I explained above, the web was designed for REST and REST was designed for the web, so the two do make sense together, but you can use other architectural styles if you want to. At least part of your question was about "why not RPC", and I laid out the reasons above; more precisely, I explained why the type of RPC I suspect you are using may land you in trouble.
