Java – What is the best way to run a long-running operation in Java Spring?

I am working on a price comparator. One of the application's modules crawls a list of websites for each product stored in the database and updates that product's price in the application database.

The crawl is slow and can take around 3 to 4 hours to complete each day. To test it locally, I have been running the service method from a JUnit test.

What is the best way to run this service once the application is hosted in production?

Best Answer

You shouldn't be launching long-running processes of any sort from a web app. Doing so makes it difficult to do things like fail over to another node in the cluster.

Instead, dispatch the "run this" request to something outside of the webapp (and thus outside of Spring) so that the webapp can carry on with its own life(cycle).

Rather than having a process inside the web container do the collecting, move that work outside the web container: a free-standing process that receives a message from the webapp and then starts the crawl.
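A minimal sketch of such a free-standing listener, assuming the message is just a batch id string. `CrawlListener` and the `crawlJob` callback are hypothetical names, and a `java.util.concurrent.BlockingQueue` stands in for a real message broker so the example runs on its own; in production the listener would live in a separate JVM and consume from the broker instead.

```java
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/**
 * Free-standing listener meant to run outside the webapp (e.g. in its own JVM).
 * The BlockingQueue is a stand-in for a real message broker.
 */
class CrawlListener {
    private final BlockingQueue<String> queue;   // messages are batch ids (hypothetical format)
    private final Consumer<String> crawlJob;     // hypothetical: runs the multi-hour price crawl
    private volatile Thread thread;

    CrawlListener(BlockingQueue<String> queue, Consumer<String> crawlJob) {
        this.queue = queue;
        this.crawlJob = crawlJob;
    }

    void start() {
        thread = new Thread(this::listen, "crawl-listener");
        thread.start();
    }

    void stop() {
        thread.interrupt();
    }

    private void listen() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String batchId = queue.take(); // blocks until the webapp sends a message
                try {
                    crawlJob.accept(batchId);
                } catch (RuntimeException e) {
                    // A failed crawl should not kill the listener; log it and wait for the next message.
                    System.err.println("Crawl " + batchId + " failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop() was called: leave the loop
        }
    }
}
```

On the webapp side, the request handler then only has to put a message on the queue (here `queue.offer(batchId)`, with a real broker its client's send call) and return immediately, so the HTTP request never waits on the crawl.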

This "send a message" step could be done with a message queue, or even just by inserting a row into a database table. The other process then either listens for messages on the queue (and starts a worker thread when one arrives) or periodically polls the database (and starts a worker thread when it finds a new row).
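The polling variant might look roughly like this. It is a sketch under stated assumptions: `JobTable`, `InMemoryJobTable` and `CrawlPoller` are hypothetical names, and the in-memory table only stands in for a real database table so the code runs without one. The poller uses `ScheduledExecutorService.scheduleWithFixedDelay` to check for new rows and hands each claimed job to a worker thread.

```java
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical view of a "crawl request" table: the webapp inserts rows, the worker claims them. */
interface JobTable {
    void insertPending(String batchId);

    // With a real database, claiming should be atomic (e.g. an UPDATE that flips the row's
    // status from PENDING to RUNNING) so that two pollers never pick up the same row.
    Optional<String> claimNextPending();
}

/** In-memory stand-in so the sketch runs without a database. */
class InMemoryJobTable implements JobTable {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    @Override public void insertPending(String batchId) { pending.add(batchId); }

    @Override public Optional<String> claimNextPending() { return Optional.ofNullable(pending.poll()); }
}

/** Free-standing poller: checks the table periodically and hands each job to a worker thread. */
class CrawlPoller {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workers = Executors.newSingleThreadExecutor();
    private final JobTable table;
    private final Consumer<String> crawlJob; // hypothetical: runs the price crawl for one batch

    CrawlPoller(JobTable table, Consumer<String> crawlJob) {
        this.table = table;
        this.crawlJob = crawlJob;
    }

    void start(long pollInterval, TimeUnit unit) {
        // Fixed delay (not fixed rate): the next poll starts only after the previous one returns.
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, pollInterval, unit);
    }

    void stop() {
        scheduler.shutdownNow();
        workers.shutdownNow();
    }

    private void pollOnce() {
        try {
            Optional<String> job;
            while ((job = table.claimNextPending()).isPresent()) {
                String batchId = job.get();
                workers.execute(() -> crawlJob.accept(batchId));
            }
        } catch (RuntimeException e) {
            // An exception escaping a scheduled task suppresses all its later runs, so catch it here.
            System.err.println("Poll failed: " + e);
        }
    }
}
```

The single-threaded worker executor is a deliberate choice for a job like this daily crawl: if the webapp inserts a second request while a crawl is still running, the second one simply waits its turn instead of running two multi-hour crawls in parallel.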

The worker thread then processes the page. One thing the worker thread could do is push newly discovered URLs back into the message queue or database for the dispatcher to pick up and hand to new threads. You would likely want to use a thread pool for this to avoid overloading your own system (there are N workers in the pool; if none is available, the task waits until one is). This would also be the place to make sure you are properly throttling requests to any single server (again, so that you aren't overloading their system).
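A rough sketch of the bounded pool plus per-server throttling, assuming a hypothetical `fetchPage` callback that fetches a page, updates the price, and returns the links it found. `HostThrottle` and `BoundedCrawler` are illustrative names. For brevity it re-submits new URLs to the pool's own queue rather than to the external message queue or database the answer describes; `Executors.newFixedThreadPool(n)` provides the N workers, and a `java.util.concurrent.Phaser` tracks in-flight pages so the crawl knows when it is finished.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Enforces a minimum gap between requests to the same host (a simple politeness throttle). */
class HostThrottle {
    private final long minGapNanos;
    private final ConcurrentHashMap<String, Long> nextSlot = new ConcurrentHashMap<>();

    HostThrottle(long minGap, TimeUnit unit) { this.minGapNanos = unit.toNanos(minGap); }

    /** Atomically reserves the next free slot for this host, then sleeps until that slot. */
    void acquire(String host) throws InterruptedException {
        long now = System.nanoTime();
        long slot = nextSlot.merge(host, now, (prev, ignored) -> Math.max(prev + minGapNanos, now));
        long wait = slot - System.nanoTime();
        if (wait > 0) TimeUnit.NANOSECONDS.sleep(wait);
    }
}

/** Single-use crawler that runs page fetches on a fixed pool of N worker threads. */
class BoundedCrawler {
    private final ExecutorService pool;
    private final HostThrottle throttle;
    private final Function<String, List<String>> fetchPage; // hypothetical: fetch, update price, return links
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Phaser inFlight = new Phaser(1); // party 1 is the caller of crawl()

    BoundedCrawler(int workers, HostThrottle throttle, Function<String, List<String>> fetchPage) {
        this.pool = Executors.newFixedThreadPool(workers);
        this.throttle = throttle;
        this.fetchPage = fetchPage;
    }

    /** Crawls from the seeds until no work is left, shuts the pool down, and returns every queued URL. */
    Set<String> crawl(List<String> seeds) {
        seeds.forEach(this::submit);
        inFlight.arriveAndAwaitAdvance(); // returns once every submitted page has finished
        pool.shutdown();
        return seen;
    }

    private void submit(String url) {
        if (!seen.add(url)) return; // already queued or crawled
        inFlight.register();
        pool.execute(() -> {
            try {
                throttle.acquire(URI.create(url).getHost());
                // New URLs go back into the same pool; if all N workers are busy they wait in its queue.
                fetchPage.apply(url).forEach(this::submit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                System.err.println("Failed to crawl " + url + ": " + e);
            } finally {
                inFlight.arriveAndDeregister();
            }
        });
    }
}
```

Note that a throttled task sleeps while holding a pool thread, which is acceptable for a sketch but wastes workers when many URLs share one host. Also, a `Phaser` supports at most 65,535 registered parties, so a very large frontier would need a different in-flight counter.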