How to deploy Node.js in cloud for high availability using multi-core, reverse-proxy, and SSL

deployment node.js

I have posted this to ServerFault, but the Node.js community seems tiny there, so I'm hoping this brings it more exposure.

I have a Node.js (0.4.9) application and am researching how to best deploy and maintain it. I want to run it in the cloud (EC2 or RackSpace) with high availability. The app should run on HTTPS. I'll worry about East/West/EU full-failover later.

I have done a lot of reading about keep-alive (Upstart, Forever), multi-core utilities (Fugue, multi-node, Cluster), and proxy/load balancers (node-http-proxy, nginx, Varnish, and Pound). However, I am unsure how to combine the various utilities available to me.

I have this setup in mind and need to iron out some questions and get feedback.

  1. Cluster is the most actively developed and seemingly popular multi-core utility for Node.js, so use that to run 1 node "cluster" per app server on non-privileged port (say 3000). Q1: Should Forever be used to keep the cluster alive or is that just redundant?
  2. Use 1 nginx per app server running on port 80, simply reverse proxying to node on port 3000. Q2: Would node-http-proxy be more suitable for this task even though it doesn't gzip or serve static files quickly?
  3. Have a minimum of two servers set up as described above, with an independent server acting as a load balancer across these boxes. Use Pound listening on 443 to terminate HTTPS and pass HTTP to Varnish, which would round-robin load balance across the IPs of the servers above. Q3: Should nginx be used to do both instead? Q4: Should the AWS or RackSpace load balancer be considered instead (the latter doesn't terminate HTTPS)?
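For concreteness, the per-app-server nginx from step 2 might look something like the sketch below (the domain, paths, and upstream port are placeholders, not anything from a real deployment):

```nginx
# Hypothetical /etc/nginx/sites-available/app -- reverse proxy from
# port 80 to the node process on 3000, with gzip and static file
# serving handled by nginx rather than node.
server {
    listen 80;
    server_name example.com;  # placeholder

    gzip on;
    gzip_types text/plain application/json application/javascript text/css;

    # Serve static assets straight from disk.
    location /static/ {
        root /var/www/app;
    }

    # Everything else goes to the node cluster.
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```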

General Questions:

  1. Do you see a need for (2) above at all?
  2. Where is the best place to terminate HTTPS?
  3. If WebSockets are needed in the future, what nginx substitutions would you make?
  4. How do you deal with a single point of failure at the outer load balancer?

I'd really like to hear how people are setting up current production environments and which combination of tools they prefer. Much appreciated.

Best Answer

Six long years and nobody has ventured a response. Well, I've got a bit of hindsight to complement experience, so I'll proffer one.

Q1. Maybe. If you don't mind adding the complexity of cluster to your app, and you're careful to avoid anything that might throw in the master process, then cluster works great. Otherwise, you certainly want something to handle supervising your node process and restarting it when your app crashes. Your OS might provide alternatives such as daemon or systemd.

Q2. No. At best, on a good day with the wind at its back, node-http-proxy is almost as good as nginx or haproxy, and that's before considering SSL, where both haproxy and nginx are much better. It'd be awfully hard to build a case for it being more suitable.

Q3. Yes, or haproxy, at least until you actually need to introduce varnish. When you get to that point, you won't have to wonder whether you should be using varnish. (Or you'll use a CDN.)

Q4. Your call. Haproxy is my default tool of choice for TLS termination and proxying. I don't hate myself enough to put something as critical as a load balancer on someone else's server where I can't run tcpdump or other troubleshooting tools.
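A TLS-terminating haproxy frontend might look roughly like this (a sketch: the cert path, backend IPs, and health-check URL are assumptions, and `bind ... ssl` needs haproxy 1.5 or later):

```haproxy
# Hypothetical /etc/haproxy/haproxy.cfg fragment: terminate TLS at the
# load balancer, round-robin plain HTTP to the app servers behind it.
frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
    default_backend app_servers

backend app_servers
    balance roundrobin
    option httpchk GET /healthz   # health-check path is an assumption
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check
```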

  1. Yes. If you know nginx well, then use it to handle the HTTPS termination and proxy the requests to your app servers. If you aren't heavily into nginx already, consider haproxy instead. With a name like haproxy, you'd expect it to be really, really good at HA and proxying, and it doesn't disappoint.

  2. haproxy / nginx. Always. Better certificate management, ready-made cipher configurations at cipherli.st, etc. There's also less impact on your app when you have to upgrade the proxy after OpenSSL vulnerabilities are released.

  3. haproxy. (nginx supports proxying WebSockets now, so this question is past its expiration date.)
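For the record, the nginx WebSocket proxying mentioned above (available since nginx 1.3.13) is just a matter of forwarding the upgrade handshake; a sketch, with the path and upstream port as placeholders:

```nginx
# Hypothetical location block: pass WebSocket upgrade requests through
# to the node backend as HTTP/1.1 connections.
location /socket/ {
    proxy_pass http://127.0.0.1:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```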

  4. Multiple sites and BGP. Introducing tools like keepalived or other peer-to-peer TCP failover mechanisms to your stack is just as likely to be the cause of an outage as to prevent one. The use of such tools is typically rare so the human with site knowledge of it is out of practice when the necessity arises. Keep the stack simpler and depend on the skills of your network team. They are well practiced at routing around problems.