Web Application Design – Considerations for Deployment Under a Load Balancer

Tags: architecture, design, java

I am currently maintaining a Java web application that was originally designed to run only as a single instance (not in a cluster/farm). Now the client is planning to upgrade their infrastructure, and part of that plan is to place our web application behind a load balancer.

I have already identified several serious design problems:

  1. Users can upload files, and our application stores those files on the local filesystem.
  2. There are scheduled jobs that might cause problems when executed concurrently (e.g. generating files).
  3. Session variables are heavily used in most modules.

I need opinions on my proposed solutions below.

  1. I can solve item 1 by storing all files in external/shared storage (a SAN, etc.).
  2. For item 2, I can create a locking mechanism in the database: when the scheduled jobs run, each web app first checks a lock table, and only the instance that updates it first runs the job.
  3. Actually, I'm still not sure whether item 3 can cause problems. If a user logs in to our application and the load balancer directs him to Server 1, is it possible that the load balancer will point him to Server 2 the next time he clicks a link? I have no idea yet how the load balancer works at this level.

Also, what other things should I consider when designing a horizontally scalable web application [from scratch]?

Best Answer

Users can upload files, and our application stores those files on the local filesystem.
I can solve item 1 by storing all files in external/shared storage (a SAN, etc.).

Storing the common static resources on an external system that is accessible to all instances is certainly one way to approach the issue of users uploading files that you then have to save. This eliminates many problems and completely bypasses the ugly business of trying to synchronize local file systems (don't go down that route; it only leads to madness).
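As a sketch of that approach (the mount point and the `SharedUploadStore` class are assumptions for illustration, not your existing code), the application writes every upload under the shared mount instead of a server-local directory:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SharedUploadStore {
    // Base directory on the shared storage (SAN/NFS mount); in practice
    // this would come from configuration rather than being hard-coded.
    private final Path baseDir;

    public SharedUploadStore(Path baseDir) throws IOException {
        this.baseDir = baseDir;
        Files.createDirectories(baseDir);
    }

    /** Writes the upload under the shared mount so every instance sees it. */
    public Path saveUpload(String fileName, InputStream data) throws IOException {
        Path target = baseDir.resolve(fileName);
        Files.copy(data, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

Because every instance behind the balancer resolves the same `baseDir`, a file uploaded through Server 1 is immediately visible to Server 2.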


There are scheduled jobs that might cause problems when executed concurrently (e.g. generating files).
For item 2, I can create a locking mechanism in the database: when the scheduled jobs run, each web app first checks a lock table, and only the instance that updates it first runs the job.

When dealing with scheduled jobs that write 'static' data, you get into some other ugly situations.

The first thing to consider is what they are generating. If they are generating other smallish files, why not store the generated data in the database instead? This also solves the 'locking' problem to a degree (in that it offloads it to the database). The application would then read the data out of the database instead of the file system.

If you go "ewww" at the thought of a heavyweight Oracle database doing this, you might also consider a lighter NoSQL database. Using the file system for locking has other problems that make it a bit less desirable. Using two different systems (database and file system) doesn't have quite the same race conditions (there are other ones to consider), but it also means the two need to stay properly in sync. Ah, the joys of cleaning up stale database locks after the application terminates improperly.
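The "only the first one to update wins" idea from the question boils down to an atomic check-and-set. A minimal sketch, with the real SQL noted in a comment and an in-memory map standing in for the `job_lock` table (table and column names are assumptions) so the logic is self-contained:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class JobLock {
    // Stand-in for a job_lock table. Against a real database this would be:
    //   UPDATE job_lock SET owner = ? WHERE job_name = ? AND owner IS NULL
    // and checking that executeUpdate() returned 1 row.
    private final ConcurrentMap<String, String> lockTable = new ConcurrentHashMap<>();

    /** Returns true only for the single instance that wins the lock. */
    public boolean tryAcquire(String jobName, String instanceId) {
        return lockTable.putIfAbsent(jobName, instanceId) == null;
    }

    /** Releases the lock so a later run can acquire it. */
    public void release(String jobName, String instanceId) {
        lockTable.remove(jobName, instanceId);
    }
}
```

In a real lock table you would also want an acquired-at timestamp so a recovery sweep can expire locks left behind by a crashed instance.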

Store the generated content in the database and you should be good. It makes several things much easier. For instance, give each record a timestamp, and then you just select the most recent record that is marked done.
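That timestamp idea can be sketched like so: each job run inserts a row, marks it done when the write finishes, and readers pick the newest completed row (the table/column names in the SQL comment are assumptions):

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class GeneratedContent {
    /** One row per generation run; 'done' is set only after the write finishes. */
    public static final class Row {
        final Instant createdAt;
        final boolean done;
        final String payload;

        public Row(Instant createdAt, boolean done, String payload) {
            this.createdAt = createdAt;
            this.done = done;
            this.payload = payload;
        }
    }

    // SQL equivalent:
    //   SELECT payload FROM generated_content
    //   WHERE done = TRUE ORDER BY created_at DESC LIMIT 1
    public static Optional<String> latestCompleted(List<Row> rows) {
        return rows.stream()
                   .filter(r -> r.done)
                   .max(Comparator.comparing((Row r) -> r.createdAt))
                   .map(r -> r.payload);
    }
}
```

A half-written run never has `done` set, so concurrent readers simply keep seeing the previous completed version.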

There are other things to think of here, such as having a service on the application servers that accepts a file from a specific client (so your scheduled job pushes the file to an application server, which writes it to the common file system), or having a messaging system that notifies the application servers of new content in a certain location, triggering them to pull the data to a local file system.

There are lots of different ways to approach this for single updates that get pushed to the content area. This could be the subject of a whole new question itself depending on the specifics of the problem and the constraints of the system.


Session variables are heavily used in most modules.
Actually, I'm still not sure whether item 3 can cause problems. If a user logs in to our application and the load balancer directs him to Server 1, is it possible that the load balancer will point him to Server 2 the next time he clicks a link? I have no idea yet how the load balancer works at this level.

The key concept you are looking for here is the 'sticky session'. It falls under the area of load balancer persistence.

With this, the load balancer is also aware of the sessions that get created, and when a request comes in for an existing session, it keeps that session 'stuck' to the application server that created it.

This isn't without its own set of issues: if one server goes down, its sessions switch to the other one (you've got failover, right?), and when it comes back up, all the existing load is still stuck to the other server.

Realize that the exact nature of this support depends on the load balancer being used. Some inject their own cookie, some use JSESSIONID and the like, and others use the client IP (and if you've got a firewall/proxy of some sort in the path, all the requests may look like they come from the same IP).
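IP-based persistence, for instance, usually amounts to hashing the client address onto a fixed backend list, roughly like this sketch (a simplification of what real balancers do):

```java
public class IpHashBalancer {
    private final String[] servers;

    public IpHashBalancer(String... servers) {
        this.servers = servers;
    }

    /** The same client IP always maps to the same backend (while the list is stable). */
    public String pick(String clientIp) {
        return servers[Math.floorMod(clientIp.hashCode(), servers.length)];
    }
}
```

This also makes the proxy problem obvious: if every client appears to arrive from the proxy's single IP, every request hashes to the same backend and the "balancing" disappears.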

You may also want to look at an external cache for sessions, so they are shared between both machines rather than pinned by the load balancer. Some frameworks have this built in, or you might look at other caching solutions to go down this route. Again, this is possibly a topic for a new question given more information on the environment and its constraints, as this answer is getting reasonably long too.
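One practical prerequisite for any shared or replicated session store, whichever product you pick: everything you put in the session must be serializable. A sketch of verifying that a session attribute survives the round trip a replicator or external cache would put it through (the `UserSession` class is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class UserSession implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String userId;

    public UserSession(String userId) { this.userId = userId; }
    public String userId() { return userId; }

    /** Round-trips an object the way a session replicator or cache would. */
    public static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }
}
```

Auditing modules that stash non-serializable objects (open connections, file handles) in the session is worth doing before any shared-session scheme goes live.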
