Linux – Deploying Python web (Tornado) applications to multiple servers

Tags: deployment, linux, python

We have 4 application servers and a load balancer running one of our Python applications. Each of the application servers has 32 hyper-threaded cores, so, following the Tornado deployment guides, we run 64 threads on each. We also use supervisord to manage all the threads. This works fine; the issue is deploying updates. The current process for deploying a new version of the application is a shell script which does the following:

  • Check out the /deploy branch of our Git repo
  • (Some unrelated stuff to do with CDNs)
  • SCP the files to each of the 4 servers
  • Restart supervisord (So the applications load the new code)

This is terribly inefficient and takes about 20 seconds in total. Restarting an individual Tornado thread takes about a second, but if we make any big changes, the load balancer switches between the old and the new application depending on which stage of the restart the thread it picks is at (in total there are 256 possible instances the load balancer can connect to). As a result, we have to take the site down for 30 seconds, sometimes longer, to get every instance onto the correct version of the application.

Are there any better ways of doing it? I've heard about Fabric and some other tools that can be used, but are they any more effective than the way we're doing it at the moment? Ideally we need to restart all the threads onto the new version within 5 seconds, even if it involves taking the site down temporarily.

Additional information, in case it's useful: all the servers run RHEL 5.5, and the load balancer is a Barracuda 640.

Best Answer

The following sequence should do what you want if you can use the load balancer's API in your deploy script:

  1. Remove some portion of your threads from the load balancer
  2. Upgrade those threads
  3. Remove the active threads from the load balancer
  4. Add the upgraded threads back into the load balancer
  5. Upgrade remaining threads
  6. Add remaining threads back into the load balancer

That way you only have one version of the code live at any moment, and downtime should be limited to a second or two while the pool changes take effect.
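As a rough illustration, the six steps could be sketched in Python along the following lines. This is only a sketch under stated assumptions: `InMemoryPool` is a hypothetical stand-in for whatever API the load balancer exposes (it is not the Barracuda API), the `host:port` backend names are made up, and `upgrade` is a placeholder for whatever actually restarts one instance on the new code (for example, a wrapper around a per-process supervisord restart).

```python
from typing import Callable, Iterable, List, Set


class InMemoryPool:
    """Hypothetical stand-in for the load balancer's pool API (not Barracuda's)."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.active: Set[str] = set(backends)

    def remove(self, backends: Iterable[str]) -> None:
        # Stop routing traffic to these backends.
        self.active -= set(backends)

    def add(self, backends: Iterable[str]) -> None:
        # Start routing traffic to these backends again.
        self.active |= set(backends)


def rolling_swap(
    pool: InMemoryPool,
    backends: List[str],
    upgrade: Callable[[str], None],
    first_batch: int,
) -> None:
    """Upgrade all backends so that only one code version is ever live."""
    batch_a = backends[:first_batch]
    batch_b = backends[first_batch:]

    pool.remove(batch_a)      # 1. take some instances out of rotation
    for backend in batch_a:   # 2. upgrade them while they receive no traffic
        upgrade(backend)
    pool.remove(batch_b)      # 3. pull the old instances (brief outage starts)
    pool.add(batch_a)         # 4. put the upgraded instances in (outage ends)
    for backend in batch_b:   # 5. upgrade the rest while they are out of rotation
        upgrade(backend)
    pool.add(batch_b)         # 6. restore full capacity
```

With the 4 servers and 64 instances each from the question, `backends` would hold 256 entries and `first_batch` might be 128. The site is unreachable only between steps 3 and 4, and during step 5 it runs on whatever capacity `first_batch` provides, so the first batch needs to be large enough to carry the load on its own.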

Disclaimer: This assumes that the Barracuda load balancer has a decent API. I couldn't find the documentation with a quick Google. The pattern should work. I've done it in a similar situation with a Cisco load balancer.
