What are the challenges in building a scalable real-time web app?

node.js, real-time, scalability

I'm looking into real-time web applications using WebSockets and node.js, and I'd like to know what technical challenges arise when scaling such a setup.

One problem I've heard about is that each socket requires a Unix file descriptor, and that epoll/select takes time linear in the number of open file descriptors.
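To make the file-descriptor concern concrete, here is a minimal Python sketch (the function name `ready_readers` and its parameters are made up for illustration). It registers a batch of sockets with one selector and asks which are readable. As far as I know, the linear cost applies to select() and poll(), which scan the whole descriptor set on every call, while epoll and kqueue hand back only the descriptors that are ready. Each socket still consumes one descriptor either way, so the per-process limit (`ulimit -n`) matters.

```python
import selectors
import socket


def ready_readers(num_pairs=50, active=(3, 17), timeout=0.5):
    """Register num_pairs sockets and return the indexes reported readable."""
    # DefaultSelector picks epoll on Linux, kqueue on BSD/macOS, select otherwise.
    sel = selectors.DefaultSelector()
    # Each socketpair uses two file descriptors, like two ends of a connection.
    pairs = [socket.socketpair() for _ in range(num_pairs)]
    try:
        for index, (reader, _writer) in enumerate(pairs):
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ, data=index)
        # Only a few "clients" send data; the rest stay idle.
        for index in active:
            pairs[index][1].send(b"ping")
        # One call waits on every registered socket at once.
        events = sel.select(timeout=timeout)
        return sorted(key.data for key, _mask in events)
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
```

Calling `ready_readers(50, (3, 17))` reports only the two sockets that actually received data, even though fifty are registered.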

Does anyone have other insights into scaling?

Best Answer

First things first:

There are two broad approaches to HTTP servers: thread/process per request, or asynchronous and persistent.

Twitter is an example of what to avoid, in my opinion - fail whale and all. They seem to be getting better now, but the front-end is still not persistent. You can read about its evolution here: http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster

The first thing you need to acknowledge is that you aren't going to accomplish scalable real-time anything without TRUE persistence, as in a long-running process built around a main loop or event loop. With any sizable amount of code, PHP, Perl, etc. (in the usual per-request setup) must re-execute and reload all of their data, variables, etc. on every request. That is fine for displaying your WordPress blog, but it is not going to work for the type of application you are describing.
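As a rough illustration of what persistence buys you, here is a small sketch using Python's standard `asyncio` module. All names in it (`load_application_state`, `handle_client`, `ask`, `demo`) and the line-based protocol are invented for the example. The state is loaded once when the process starts and then stays in memory, so every request served by the event loop sees and updates the same data.

```python
import asyncio
import functools


def load_application_state():
    """Stand-in for expensive startup work (config, caches, templates)."""
    return {"greeting": "hello", "hits": 0}


async def handle_client(state, reader, writer):
    """Answer one line-based request using state already held in memory."""
    name = (await reader.readline()).decode().strip()
    state["hits"] += 1  # survives between requests; nothing is reloaded
    writer.write(f"{state['greeting']} {name} (request {state['hits']})\n".encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(port, name):
    """Tiny test client: send a name, return the server's one-line reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"{name}\n".encode())
    await writer.drain()
    reply = (await reader.readline()).decode().strip()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    state = load_application_state()  # once per process, not once per request
    server = await asyncio.start_server(
        functools.partial(handle_client, state), "127.0.0.1", 0)  # port 0 = any free port
    port = server.sockets[0].getsockname()[1]
    async with server:
        return [await ask(port, "alice"), await ask(port, "bob")]
```

Running `asyncio.run(demo())` returns two replies whose request counter increases, which is exactly what a re-execute-per-request setup cannot do without an external store.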

If you have enough servers and enough money, you can use apache2/httpd to serve all the real-time content you want. But if you are like the rest of us, the asynchronous approach is probably going to work best for you.
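To see why the asynchronous approach is cheap on hardware, consider this hedged sketch (the names `idle_client` and `serve_many` are hypothetical, and `asyncio.sleep` merely stands in for a connection waiting on network I/O). A thousand mostly idle "clients" are multiplexed on a single thread, where a thread-per-connection server would need a thousand threads.

```python
import asyncio
import threading


async def idle_client(client_id, delay):
    """Stand-in for a connection that spends most of its time waiting."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking
    return client_id, threading.get_ident()


async def serve_many(num_clients=1000, delay=0.05):
    """Run all clients concurrently; return (clients served, threads used)."""
    results = await asyncio.gather(
        *(idle_client(i, delay) for i in range(num_clients)))
    thread_ids = {thread_id for _client_id, thread_id in results}
    return len(results), len(thread_ids)
```

`asyncio.run(serve_many())` reports how many clients completed and how many distinct threads handled them; the second number stays at one regardless of the client count.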

Existing Technologies:

Tornado - Facebook open-sourced the Tornado web server, which came out of FriendFeed, where it handled real-time feed updates. (I'm not advocating it, just giving you an example of how to handle the real-time problem.)

libevent

If you want to write the kind of HTTP servers that amazon.com is engineering, then you might look at libevent: http://monkey.org/~provos/libevent/

twisted

If you have a little less time on your hands, I would suggest Python's Twisted framework. We use Twisted at my work for a great many things.

http://justin.tv runs on Twisted - that's about as realtime web as it gets.

See here: http://twistedmatrix.com/trac/wiki/SuccessStories
