Although this algorithm was designed to provide maximum throughput, that does not hold in every situation. What if the system keeps receiving lots of small processes? That will lead to starvation of the large ones. The turnaround time of small processes will be low, whereas that of large processes will be high compared to other scheduling algorithms.
Wikipedia says:
Since turnaround time is based on waiting time plus processing time,
longer processes are significantly affected by this. Overall waiting
time is smaller than FIFO, however, since no process has to wait for
the termination of the longest process.
If you scheduled larger jobs first, lots of small processes would have to wait for a large process to terminate (as this is non-preemptive). And since turnaround time is the total time between submission of a process and its completion, the turnaround time of those small processes would definitely be high. But in SJF, only a few large processes wait behind some small processes, so only the turnaround time of those large processes is affected.
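To make the turnaround-time comparison concrete, here is a small sketch of my own (assuming all processes arrive at time 0 and run non-preemptively; the burst times are made up):

```python
def avg_turnaround(burst_times):
    """Average turnaround time when all processes arrive at time 0
    and run non-preemptively in the given order."""
    clock = 0
    total = 0
    for burst in burst_times:
        clock += burst   # this process completes now
        total += clock   # turnaround = completion time - arrival (0)
    return total / len(burst_times)

jobs = [8, 1, 2, 1]  # hypothetical CPU burst times

print(avg_turnaround(jobs))          # FIFO order: 10.0
print(avg_turnaround(sorted(jobs)))  # SJF order:  4.75
```

Both orders run exactly the same jobs; SJF lowers the average because only the longest job's turnaround grows, while every small job finishes sooner.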
Check out this link for a comparison of various techniques.
Users can upload files, and our application stores those files in the local filesystem.
I can solve item 1 by storing all files in external/shared storage (a SAN, etc.).
Storing the common static resources on an external system that is accessible to all is certainly one way to approach the issue of users uploading files that you then have to save. This cuts down many problems and completely bypasses the ugly issue of trying to do synchronization of local file systems (don't go down that route, it only leads to madness).
There are scheduled jobs that might cause problems when executed concurrently (e.g. generating files).
With item 2, I can create a locking mechanism in the database: when the scheduled jobs run, the web apps will first check the table, and only the first one to successfully update it will run the job.
When dealing with scheduled jobs that write 'static' data, you get into some other ugly situations.
The first thing to consider is: what are they generating? If they are generating other smallish files, why not store the generated data in the database instead? This also solves the 'locking' problem to a degree (in that it offloads it to the database). The application would then need to read the data out of the database instead of the file system.
If you go "ewww" at the thought of a heavy Oracle database doing this, you might also consider a lighter NoSQL database. Using the file system for locking has other problems that make it less desirable. Using multiple different systems (database and file system) doesn't have quite the same race conditions (there are other ones to consider), but it also means the two need to stay properly in sync. Consider the joys of removing stale database locks when the application terminates improperly.
Store the generated content in the database, and you should be good; it makes several things much easier. For example, write records with timestamps and a completion flag, and then just select the most recent record that is marked done.
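A rough sketch of that timestamp idea (the table and column names are my own, using SQLite for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE generated_content (
    id INTEGER PRIMARY KEY,
    body TEXT,
    done INTEGER NOT NULL DEFAULT 0,  -- completion flag
    created_at TEXT NOT NULL          -- ISO-8601 timestamp
)""")

# The scheduled job inserts a row, then marks it done when finished.
rows = [("old report", 1, "2024-01-01T00:00:00"),
        ("new report", 1, "2024-01-02T00:00:00"),
        ("half-written", 0, "2024-01-03T00:00:00")]
conn.executemany(
    "INSERT INTO generated_content (body, done, created_at) VALUES (?, ?, ?)",
    rows)

# Any application server simply reads the latest completed row,
# ignoring anything still being generated.
latest = conn.execute(
    "SELECT body FROM generated_content WHERE done = 1 "
    "ORDER BY created_at DESC LIMIT 1").fetchone()
print(latest[0])  # new report
```

Because readers only ever see rows flagged as done, a half-finished generation run never leaks out, which is the locking problem handled by the database.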
There are other things to think of here, such as having a service on the application servers that writes a file that comes in from a specific client (so your scheduled job pushes the file to the application server, which writes it to the common file system), or having some messaging system that notifies the application servers of new content in a certain location, triggering an application to pull the data to a local file system.
There are lots of different ways to approach this for single updates that get pushed to the content area. This could be the subject of a whole new question itself depending on the specifics of the problem and the constraints of the system.
Session variables are heavily used in most modules.
Actually, I'm still not sure if item 3 can cause problems. If a user logs in to our application and the load balancer directs him to Server 1, is it possible that the load balancer will point him to Server 2 the next time he clicks a link? I have no idea yet how the load balancer works at this level.
The key concept you are looking for here is the 'sticky session'. It falls under the area of load-balancer persistence.
With sticky sessions, the load balancer is aware of the sessions that have been created, and when a request for an existing session comes through, it keeps that request 'stuck' to the application server that created the session.
This isn't without its own set of issues: if one server goes down, its sessions switch to the other one (you've got failover, right?); when it comes back up, all the load is now stuck to the other server.
Realize that the exact nature of this support depends on the load balancer being used. Some have their own cookie, some use JSESSIONID and the like, and others use the client IP (and if you've got a firewall or proxy of some sort in front, it may look like all the requests come from the same IP).
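As one concrete illustration (not from any particular setup discussed here), nginx's ip_hash directive gives IP-based stickiness; the hostnames are hypothetical:

```nginx
upstream app_servers {
    ip_hash;                       # same client IP -> same backend
    server app1.example.com:8080;
    server app2.example.com:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_servers;
    }
}
```

Note that IP-based stickiness is exactly the variant that misbehaves behind a proxy, since every request appears to come from the proxy's address.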
You may also want to look at an external cache for sessions so that they are shared between both machines rather than pinned to one of them by the load balancer. Some frameworks have this built in, or you might look at other caching solutions to go down this route. Again, this is possibly a topic for a new question if there is more information on the environment and the constraints, as this answer is getting reasonably long too.
Best Answer
Do you have a shared database? I've done this using a database as the arbiter in the past.
Basically, each "job" is represented as a row in the database. You schedule a job by adding a row with the time you want it to run. Then each server does:
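The query is a sketch of mine, against an assumed jobs table with id, scheduled_time, and claimed columns:

```sql
SELECT id, scheduled_time
FROM jobs
WHERE claimed = 0
ORDER BY scheduled_time
LIMIT 1;
```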
That way, they'll all pick the job that is scheduled to run next. They all sleep so that they wake up when the job is actually supposed to run. Then, they all do this:
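As a sketch with an assumed jobs table that carries a claimed flag, the atomic claim might look like:

```sql
UPDATE jobs
SET claimed = 1
WHERE id = :id
  AND claimed = 0;
```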
Where :id is the identifier of the job you got in the step above. Because the update is atomic, only one of the servers will actually update the row; you can check the database's "number of rows updated" count to determine whether you were the server that actually updated the row, and therefore whether you are the server that gets to run the job.

If you didn't "win" and you're not running the job, just go back to step 1 immediately. If you did "win", schedule the job to execute in another thread, then wait a couple of seconds before going back to step 1. That way, servers that didn't get the job this time are more likely to pick up a job that's scheduled to run immediately.
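Putting the steps together, here is a minimal end-to-end sketch (SQLite in Python; the jobs schema and column names are my assumptions, not the answer's original code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    scheduled_time TEXT NOT NULL,
    claimed INTEGER NOT NULL DEFAULT 0
)""")
conn.execute("INSERT INTO jobs (scheduled_time) VALUES ('2024-01-01T00:00:00')")

def try_claim_next(db):
    """Step 1: read the next unclaimed job; step 2: atomically claim it.
    Returns the job id if this server 'won' the race, else None."""
    row = db.execute("SELECT id FROM jobs WHERE claimed = 0 "
                     "ORDER BY scheduled_time LIMIT 1").fetchone()
    if row is None:
        return None
    cur = db.execute("UPDATE jobs SET claimed = 1 "
                     "WHERE id = :id AND claimed = 0", {"id": row[0]})
    db.commit()
    # rowcount is the 'number of rows updated': 1 means we won the race
    return row[0] if cur.rowcount == 1 else None

print(try_claim_next(conn))  # 1    -> the first caller claims the job
print(try_claim_next(conn))  # None -> later callers lose and loop again
```

The same pattern works against a shared server database; the only requirement is that the UPDATE with the claimed guard executes atomically, which any transactional database provides.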