Java – I need help designing a long-running job processing server

java

I'm about to propose a new architecture for several pieces of the system I'm working on. It entails moving a lot of number-crunching and data-translation jobs off of Tomcat and onto a more fitting platform, and I need a little advice.

First, the way the system is currently set up: there is an AS400 and a PostgreSQL database, and some of their tables need to be synchronized at various points in the year. An administration application running on Tomcat 6 has a "Synchronize" button, and its Struts action class spawns a new thread to run the job. The job can take anywhere from 15 seconds to 4 days (the maximum we have experienced so far).

I do not like this design. Tomcat is a web server, not a job processing server.

So, the solution I have in mind is to take one of our unused servers and put JBoss 7 on it, then have a simple interface on that job-processing server (e.g. web, JMS) that listens for triggers. When someone clicks the "Synchronize" button on the web page, the job server is sent a signal telling it which job to fire and what parameters are included, and it starts processing the job.
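To make the idea concrete, here is a minimal sketch of that trigger-driven job server. The names (`JobTrigger`, `JobServer`) and the in-JVM `BlockingQueue` are my assumptions for a self-contained example; in a real deployment the queue would be a JMS destination and `submit` would be a `MessageListener` callback. The key point is that the listener thread only dispatches, so a 4-day job never blocks the intake of new triggers.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// A job trigger as it might arrive from the web app: a job name plus parameters.
record JobTrigger(String jobName, Map<String, String> params) {}

// Minimal job server: a listener thread takes triggers off a queue and hands
// each one to a worker pool, so long-running jobs never block the listener.
class JobServer {
    private final BlockingQueue<JobTrigger> inbox = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    final List<String> completed = new CopyOnWriteArrayList<>(); // visible for demonstration

    // In a real deployment this would be driven by a JMS MessageListener;
    // here the web app (or a test) calls submit() directly.
    void submit(JobTrigger trigger) { inbox.add(trigger); }

    void start() {
        Thread listener = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    JobTrigger t = inbox.take();
                    workers.submit(() -> runJob(t)); // dispatch, don't run inline
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        listener.setDaemon(true);
        listener.start();
    }

    private void runJob(JobTrigger t) {
        // The actual AS400 <-> PostgreSQL sync would go here, keyed off t.jobName().
        completed.add(t.jobName());
    }
}
```

Swapping the `BlockingQueue` for a JMS queue keeps the same shape while adding persistence, so a trigger submitted while the server is down is not lost.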

Years ago I designed a similar job-processing concept, where a JMS queue listened for jobs to come in; when a task was added to the queue, the system would fire off and run it. But on that system the jobs never took more than 5 or 6 seconds, not several days, so I'm not sure JMS queues are the best fit here. I also looked at Quartz, but that's more of a recurring-task scheduling system. Although I know I could code around that and make it work, I'm still wondering if it's the best tech for the problem.

Do you have any suggestions?

Best Answer

At a previous job, we used what I jokingly called a "redundant array of inexpensive computers" {1} to distribute a number of jobs. Some took seconds to run; others took more than a day (quarterly reports). All were designed to be capable of being restarted if they crashed. Some jobs ran on a schedule, some only on demand. The scheduler was sophisticated enough to handle things as simple as "run daily" or "run monthly", or as complicated as "run monthly, except during the last week of the quarter, when it runs daily". Any of the regularly scheduled tasks could also be run "now" (on demand).
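The restartability point above is worth sketching, since it is what makes day-long jobs tolerable: the job checkpoints its progress after every unit of work, so rerunning it after a crash resumes rather than starts over. The class name and the `AtomicInteger` standing in for a persisted checkpoint (a database row, in the setup described here) are my assumptions for a runnable example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Sketch of a restartable batch job: progress is checkpointed after every
// row, so a crashed job can be rerun and pick up where it left off.
class RestartableJob {
    private final AtomicInteger checkpoint; // index of the next row to process
    private final int totalRows;

    RestartableJob(AtomicInteger checkpoint, int totalRows) {
        this.checkpoint = checkpoint;
        this.totalRows = totalRows;
    }

    // Runs (or resumes) the job, invoking work.accept(row) for each row.
    void run(IntConsumer work) {
        for (int row = checkpoint.get(); row < totalRows; row++) {
            work.accept(row);        // the actual crunching for one row
            checkpoint.set(row + 1); // persist progress before moving on
        }
    }
}
```

In practice the checkpoint write should be part of the same transaction as the work itself, otherwise a crash between the two can process a row twice.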

Because the array {2} lacked displays, the "scoreboard" of who was running what job was read from a database table. To determine whether a job had crashed or hung, I borrowed a concept from embedded computing called a "watchdog". In this case, a running job would periodically reset a status column to 0. The monitoring software would periodically increment that column, and if the number climbed above a threshold, it would send a message telling a person to basically reboot machine X. The idea was to put this "heartbeat" at the end of a loop, but not the innermost loop. I planned to have the pulse go off about every minute, with the checking circuit checking every 5 or 10 minutes.

Notes:
1 - Folks might now call this a "grid" or a "cluster", but they were a pile of Pentium 75s and 100s that hadn't been depreciated yet, so I could use them.
2 - Well, a one-dimensional array: they stacked reasonably well and were worthless enough that if they fell over and broke, no one would miss them. They hadn't been fully depreciated yet, so technically they could not be discarded, but they were too slow to be useful as desktop computers.
