Design Pattern for Multi-Threaded URL Fetcher in Java

design-patterns · java · multithreading

I'm looking for hints and suggestions on a design for a multi-threaded URL fetcher in Java. The specific requirements are:

  • Fetch each of around 1,000 URLs periodically
  • The interval between fetches will be URL-specific
  • Intervals are likely to range from 2 minutes to 1 hour

I'm imagining I will need a bunch of fetchers, each running in its own thread, that get pushed the next URL to fetch whenever they are in a "ready" state.

I will also need to handle errors, e.g. stop querying a specific URL if it repeatedly times out or returns 404s.
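Roughly, I'm picturing each URL carrying its own state, something like this (just a rough sketch, names made up):

```java
// Rough sketch of the per-URL state I have in mind (illustrative only).
class FetchTask {
    final String url;
    final long intervalMillis;      // URL-specific interval, 2 min .. 1 hour
    long nextDueMillis;             // when this URL should next be fetched
    int consecutiveFailures;        // to stop querying after repeated timeouts/404s

    FetchTask(String url, long intervalMillis) {
        this.url = url;
        this.intervalMillis = intervalMillis;
        this.nextDueMillis = System.currentTimeMillis();
    }
}
```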

Any ideas much appreciated.

Thanks

Best Answer

1,000 threads (some of them active only once an hour) is a big no-no, and so is starting a new thread for each job that may finish a few seconds later. Instead, make one "scheduler" thread that selects URLs for retrieval, and a number of worker threads that report their state back to it. On each pass, the scheduler:

  • performs worker thread-pool management: if no threads are free, it spawns some new ones; if more than X threads (say, 3) are idle, it ends the extra ones,
  • selects the next URL due for retrieval (or skips this step if nothing is due),
  • finds the first free thread and assigns it the job,
  • collects results from threads that have finished (if any).

It then sleeps and repeats the loop. Essentially, you have a semi-realtime parent thread that does all the "fast" work, and worker threads that spend most of their time waiting for their next job.
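Here is a minimal sketch of that loop, assuming a cached thread pool stands in for the hand-rolled pool management (it spawns a worker when none are free and ends workers that have been idle for 60 seconds); `FetchJob`, `fetch()` and the failure cut-off of 5 are illustrative, not prescribed:

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the scheduler/worker design described above.
public class FetchScheduler implements Runnable {

    // Illustrative per-URL job: interval, next due time, failure count.
    static final class FetchJob {
        final String url;
        final long intervalMillis;
        long nextDueMillis = System.currentTimeMillis();
        final AtomicInteger failures = new AtomicInteger();

        FetchJob(String url, long intervalMillis) {
            this.url = url;
            this.intervalMillis = intervalMillis;
        }
    }

    private final List<FetchJob> jobs;
    // Cached pool: spawns a thread when none are free, reaps threads idle for 60 s.
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final CompletionService<FetchJob> finished =
            new ExecutorCompletionService<>(workers);

    public FetchScheduler(List<FetchJob> jobs) {
        this.jobs = jobs;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long now = System.currentTimeMillis();

            // Select URLs that are due and hand each one to a free worker.
            for (FetchJob job : jobs) {
                if (job.nextDueMillis <= now && job.failures.get() < 5) {
                    job.nextDueMillis = now + job.intervalMillis;   // reschedule
                    finished.submit(() -> {
                        try {
                            fetch(job.url);                 // the actual HTTP call
                            job.failures.set(0);
                        } catch (Exception e) {
                            job.failures.incrementAndGet(); // repeated failures drop the URL
                        }
                        return job;
                    });
                }
            }

            // Collect results from workers that finished (if any).
            Future<FetchJob> done;
            while ((done = finished.poll()) != null) {
                // Process fetched content here; this sketch just drains the queue.
            }

            // Then sleep and repeat the loop.
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        workers.shutdown();
    }

    private void fetch(String url) throws Exception {
        // Placeholder for the HTTP GET (handle timeouts and 404s here).
    }
}
```

If you'd rather not write the "select what's due and sleep" bookkeeping yourself, a ScheduledThreadPoolExecutor with a small pool can schedule each URL's recurring fetch directly on a handful of threads.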

Of course, the URL distribution can be done through the Observer pattern, modified so that the message is "consumed" once a client accepts it (i.e. takes the URL to retrieve). The list of worker threads can be kept as a linked list and traversed recursively.
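A rough sketch of that consuming-Observer variant (the interface and class names here are made up, and the list is walked iteratively rather than recursively):

```java
import java.util.LinkedList;
import java.util.List;

// Illustrative "consuming" Observer: the scheduler offers a URL to each
// registered worker in turn, and the first idle worker accepts it, which
// consumes the message and stops further notification.
interface UrlObserver {
    boolean offer(String url);      // return true to accept (consume) the URL
}

final class UrlDispatcher {
    private final List<UrlObserver> workers = new LinkedList<>();

    void register(UrlObserver worker) {
        workers.add(worker);
    }

    /** Hands the URL to the first worker that accepts it; false if all are busy. */
    boolean dispatch(String url) {
        for (UrlObserver worker : workers) {
            if (worker.offer(url)) {
                return true;        // consumed; do not notify the rest
            }
        }
        return false;               // no free worker right now, try again later
    }
}
```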
