Performance – Benchmarking Asynchronous Code in Node.js

asynchronous-programming · node.js · performance

Asynchronous programming seems to be getting quite popular these days. One of the most frequently cited advantages is the performance gained by removing operations that block threads. But I have also seen people say that the advantage is not that great, and that a properly configured thread pool can have the same effect as fully asynchronous code.

My question is: are there any real benchmarks that compare blocking vs. asynchronous code? It can be any language, but I think C# and Java would be most representative. I'm not sure how meaningful such a benchmark would be with Node.js or something similar.

Edit: My attempt at a general question combined with unclear terminology seems to have failed. By "asynchronous code" I mean what some answers described as event or callback programming: operations that would block a thread are instead delegated to some callback system so that threads can be utilized better.
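
To make that terminology concrete, here is a minimal Node.js sketch of the distinction I mean (the file name is just an illustration): the first call holds the thread until the read completes, while the second hands the operation to the runtime and registers a callback.

```ts
import { readFileSync } from "node:fs";
import { readFile } from "node:fs/promises";

// Blocking: the calling thread does nothing else until the read finishes.
const blocking = readFileSync("data.json", "utf8");
console.log(blocking.length);

// Asynchronous: the read is delegated to the runtime; the thread is free to
// serve other work, and this callback runs once the data is available.
readFile("data.json", "utf8").then((contents) => {
  console.log(contents.length);
});
```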

And if I wanted to ask a specific question: are there any benchmarks that compare the throughput/latency gains of async/await server code in .NET? Or any other similar comparison?

Best Answer

Based on your comments, it seems that you're really interested in "non-blocking IO." This differs from my definition of "asynchronous programming," which is an approach to decomposing work exemplified by Erlang processes or Goroutines.

And, if that is your definition, then yes, there have been benchmarks. But, like all benchmarks, they shouldn't be accepted blindly. Instead, you need to think about what goes on behind the scenes.

  • A thread is the unit of OS scheduling. Platforms such as Erlang and Go build their own schedulers on top of the OS scheduler, allowing multiple units of execution to share the same thread. This is great, as long as your units of execution are lightweight, because it avoids the overheads associated with threads.* However, IO operations require a trip to the kernel, which means that you need a real thread to do them. And if you're implementing a sub-thread scheduler, you need to be smart about not scheduling sub-thread tasks on a thread that's blocked in a kernel operation.
  • All IO operations have the potential to block.** When you make a read or write request, the kernel looks to see if there is data available or (for write) room in a buffer. If not, the kernel suspends the thread until the operation can complete. This makes thread-per-connection servers really simple to implement, but worries people who think about thread overheads.
  • Operating systems provide a way to block on multiple IO channels simultaneously. The select call on POSIX is one of these: you provide it with a list of channels (file descriptors / sockets) that you care about, and it will tell you when one of them is ready to read or write (read is what most people care about). You still have to make a kernel call, and you'll still end up blocking a thread if nothing's available, but that's only one thread. This is how Node.js works: when data is available, the proper event handler is called (I don't know the internals of Node, but I'd hope it also verifies that write buffers have room before calling write). A minimal sketch of this event-handler style follows this list.
  • When you max out the CPU, you're done. It doesn't matter whether you use select or a thread-per-connection approach; you still need to spend CPU to do whatever your server is meant to do. With the thread-per-connection approach, you don't really pay attention to that: the scheduler will assign threads to cores, and you'll degrade gracefully. With select, you will need to hand connections off to threads when they're ready for processing, or you'll be limited by the performance of a single core (Node.js gets around that by letting you spawn multiple server processes; see the cluster sketch below).
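
To illustrate the third point, here is a minimal sketch (port and echo behaviour are arbitrary) of the event-handler style Node.js exposes on top of that kernel-level multiplexing: one thread registers handlers for many connections and is only invoked when a socket is actually ready.

```ts
import net from "node:net";

// One thread, many connections: the runtime multiplexes the underlying
// sockets (epoll/kqueue/IOCP under the hood) and calls our handlers only
// when a socket is ready, so nothing here blocks waiting on one client.
const server = net.createServer((socket) => {
  socket.on("data", (chunk) => {
    // Fires only when the kernel reports data available on this socket.
    if (!socket.write(chunk)) {
      // write() returned false: the outgoing buffer is full. Pause reading
      // and resume once the kernel has drained it ('drain' event).
      socket.pause();
      socket.once("drain", () => socket.resume());
    }
  });
  socket.on("error", (err) => console.error("connection error:", err.message));
});

server.listen(9000, () => console.log("echo server listening on :9000"));
```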

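And for the fourth point, a sketch (the port and per-core worker count are arbitrary; cluster.isPrimary assumes a reasonably recent Node version) of how Node.js spawns multiple server processes so that a CPU-bound workload isn't limited to a single core:

```ts
import cluster from "node:cluster";
import http from "node:http";
import os from "node:os";

if (cluster.isPrimary) {
  // The primary process only forks workers; each worker runs its own event
  // loop on its own core, and they all share the same listening socket.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  http
    .createServer((req, res) => {
      // Any CPU-bound work now runs in whichever worker took the connection.
      res.end(`handled by pid ${process.pid}\n`);
    })
    .listen(9000);
}
```
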
As I said, benchmarks shouldn't be accepted blindly; they're only valid as long as they model the real-world problem that you're trying to solve. The author of the linked benchmark works for (worked for?) Mailinator, which, if you haven't used it, is a poste restante service for ad hoc email addresses. That means it's going to be getting short-lived, high-activity connections from a relatively small number of clients. This is a perfect use case for thread-per-connection scheduling. As noted in the comments, a chat server (long-lived, low activity) might be different.

In my mind, the question of blocking versus non-blocking IO is rather boring: most real-world servers don't have that many concurrent connections. More interesting to me is the programming model: a worker-based model like Erlang's or Go's means that you can focus on your business logic and not care about how connections are being managed.
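
As a rough Node.js analogue of that worker-based split (the order-totalling logic, port, and message shape are all made up for illustration, and the single-file layout assumes an ES-module setup), the business logic can live in a worker thread that never sees a socket, while the serving side only shuttles messages:

```ts
import http from "node:http";
import { Worker, isMainThread, parentPort } from "node:worker_threads";

if (isMainThread) {
  // The serving side owns the connections and knows nothing about the work.
  const worker = new Worker(new URL(import.meta.url));
  let nextId = 0;
  const pending = new Map<number, http.ServerResponse>();

  worker.on("message", ({ id, total }: { id: number; total: number }) => {
    pending.get(id)?.end(JSON.stringify({ total }));
    pending.delete(id);
  });

  http
    .createServer((req, res) => {
      let body = "";
      req.on("data", (chunk) => (body += chunk));
      req.on("end", () => {
        const id = nextId++;
        pending.set(id, res);
        worker.postMessage({ id, items: JSON.parse(body).items });
      });
    })
    .listen(8080);
} else {
  // The worker side is pure business logic; it never touches a socket.
  parentPort?.on("message", ({ id, items }: { id: number; items: number[] }) => {
    const total = items.reduce((sum, price) => sum + price, 0);
    parentPort?.postMessage({ id, total });
  });
}
```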

* These overheads include kernel scheduling structures, and perhaps most important, a multi-megabyte thread stack, most of which goes unused. While 2MB doesn't seem like much, it adds up quickly if you have 100k processes ... which most applications don't have.

** Not 100% true, but I don't want to get too deep into the weeds here.