Distributed Computing – Building a Redundant Application

distributed computing, distributed-development

This is more of a "point me in the right direction" question.

My team of three and I have built a hosted web app that queues and routes customer chat requests to available customer service agents (It does other things as well, but this is enough background to illustrate the issue).

The basic dev architecture today is:

  • a single page ajax web UI (ASP.NET MVC) with floating chat windows (think Gmail)
  • a backend Windows service to queue and route the chat requests
    • this service also logs the chats, calculates service levels, etc
  • a Comet server product that routes data between the web frontend and the backend Windows service
    • this also helps us detect which Agents are still connected (online)

And our hardware architecture today is:

  • 2 servers to host the web UI portion of the application
  • a load balancer to route requests to the 2 different web app servers
  • a third server to host the SQL Server DB and the backend Windows service responsible for queuing / delivering chats


So as it stands today, one of the web app servers could go down and we would be OK. However, if something happened to the SQL Server / Windows Service server, we would be boned.

My question: how can I spread this backend Windows service logic across multiple machines (distributed) for redundancy and uptime purposes? The Windows service is written to accept requests from the Comet server, check for available Agents, and route the chat to those agents. Will I need to rewrite it with distributed computing in mind?

I should also note that I am hosting all of this on Rackspace Cloud instances – so maybe it is something I should be less concerned about?

Thanks in advance for any help!

Best Answer

  1. Make sure the Windows Service is loosely coupled to the instance of the DB server. This will allow you to move to an N:1 ratio of Windows Service : DB server. There is a whole host of techniques that can be used to make your DB server more robust, but that's not really what you're getting at in your question.

  2. Isolate the following information:

    • What data is required for the Windows Service to function
    • How would the Windows Service act if it didn't have an immediate feed of that information
    • How would the Windows Service share that information with other instances of itself

  3. Identify what is dependent upon the Windows Service and why. How would those elements react if a load balancer was inserted between them and the Windows Service? What needs to change in order for those elements to play nicely with a load balancer?

  4. Start analyzing what might happen to existing chats and incoming chat requests should the Windows Service go down. Ideally, all you would lose on an existing chat is the logging information. Incoming chats would be routed to a different Windows Service instance.
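As a rough illustration of points 2 to 4, here is a minimal Python sketch (not your actual service, which is a .NET Windows service). It assumes agent availability lives in state that every instance can reach, so that an incoming chat can be retried on another instance when one is down. All names here (`SharedAgentState`, `RoutingInstance`, `route_with_failover`) are hypothetical, and the in-memory list stands in for whatever shared store you would really use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class InstanceDown(Exception):
    """Raised when a routing-service instance cannot be reached."""


@dataclass
class SharedAgentState:
    """Stand-in for state all instances can read (e.g. a shared DB table)."""
    available: List[str] = field(default_factory=list)

    def claim_agent(self) -> Optional[str]:
        # Hand out the first available agent, or None if nobody is free.
        return self.available.pop(0) if self.available else None


@dataclass
class RoutingInstance:
    """One copy of the backend routing service (hypothetical)."""
    name: str
    state: SharedAgentState
    healthy: bool = True

    def route(self, chat_id: str) -> str:
        if not self.healthy:
            raise InstanceDown(self.name)
        agent = self.state.claim_agent()
        if agent is None:
            return f"{chat_id}: queued by {self.name}"
        return f"{chat_id}: {agent} via {self.name}"


def route_with_failover(instances: List[RoutingInstance], chat_id: str) -> str:
    """Try each instance in order, roughly what a load balancer would do."""
    for instance in instances:
        try:
            return instance.route(chat_id)
        except InstanceDown:
            continue  # this instance is down; try the next one
    raise RuntimeError("no routing instance available")


if __name__ == "__main__":
    state = SharedAgentState(available=["alice", "bob"])
    svc_a = RoutingInstance("svc-a", state)
    svc_b = RoutingInstance("svc-b", state)
    print(route_with_failover([svc_a, svc_b], "chat-1"))
    svc_a.healthy = False  # simulate svc-a going down
    print(route_with_failover([svc_a, svc_b], "chat-2"))
```

Because neither instance keeps the agent list to itself, losing `svc-a` only changes which instance handles the next request. A real version would also need the claim step to be atomic across machines, which this sketch does not attempt.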

Ultimately, the answer to your question is identifying the assumptions and requirements that are binding your layers together. Loosen those requirements / make the layers more independent and you'll be on your way to scaling and distributing your application.

I'm assuming for your mainline path that only the Windows Service interacts directly with the DB server. If that's not true, you'll need to consider whether you want to continue with that model or change it.
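One way to picture the loose coupling from point 1, under that same assumption, is to have the service depend only on a small storage interface rather than on a particular DB server. The sketch below is Python for brevity and everything in it (`ChatStore`, `InMemoryChatStore`, `RoutingService`) is a made-up name; the in-memory store is a placeholder for a SQL Server-backed implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ChatStore(ABC):
    """Everything the routing service needs from persistence (hypothetical)."""

    @abstractmethod
    def log_message(self, chat_id: str, text: str) -> None:
        ...

    @abstractmethod
    def messages(self, chat_id: str) -> List[str]:
        ...


class InMemoryChatStore(ChatStore):
    """Placeholder for an implementation backed by the real database."""

    def __init__(self) -> None:
        self._log: Dict[str, List[str]] = {}

    def log_message(self, chat_id: str, text: str) -> None:
        self._log.setdefault(chat_id, []).append(text)

    def messages(self, chat_id: str) -> List[str]:
        return list(self._log.get(chat_id, []))


class RoutingService:
    """Knows only the ChatStore interface, not where the database lives."""

    def __init__(self, name: str, store: ChatStore) -> None:
        self.name = name
        self.store = store

    def handle(self, chat_id: str, text: str) -> None:
        # Tag each entry with the instance name so logs show who handled it.
        self.store.log_message(chat_id, f"[{self.name}] {text}")


if __name__ == "__main__":
    store = InMemoryChatStore()  # one store shared by N services (N:1)
    svc_a = RoutingService("svc-a", store)
    svc_b = RoutingService("svc-b", store)
    svc_a.handle("chat-1", "hello")
    svc_b.handle("chat-1", "hi there")
    print(store.messages("chat-1"))
```

With this shape, adding another service instance or pointing all instances at a different DB server is a wiring change rather than a rewrite of the routing logic.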
