Design pattern for streaming data

concurrencydesign-patterns

I have a use case where I'm asked to read an XML document that is a list of data, break it up into data sub-elements, and transform those sub-elements (in order) into another document format (like a flat file or JSON array).

I could solve the problem using a typical synchronous flow, first processing the entire XML document into relevant (Java) objects, and then process all the objects. This would ensure the order of the output is the same as the order of the input.

However I've been told there is a design pattern which fits this use case. My guess is that it's one of the Concurrency patterns, and so my feeling is that it could be implemented with a Queue instead of a Collection. The XML parser would take each set of data, parse it, and push it to the Queue, while another thread (or pool of threads) would pop elements off of the queue and process them to the output file.

I've not implemented this before so I have a number of questions, but first I'd like to know if I'm on the right track?

An additional use case is that the design should be able to handle multiple (shorter) XML as input from a web service. Each XML will contain one set of data, and there is no requirement around what order the documents need to be in the output, as long as the sub-elements from each set of data are in the right order.

(Edit) I'm not asking how to chose a design pattern in general. I'm asking which design pattern applies to this very specific use case.

Best Answer

What you've described is called is called fork-join queue or fork-join model.

From wikipedia:

The fork–join model is a way of setting up and executing parallel programs, such that execution branches off in parallel at designated points in the program, to "join" (merge) at a subsequent point and resume sequential execution.

You can implement it either with explicit queue or without. An explicit queue has the advantage that you may use a persistent queue and you can distribute the load over different machines.

In java (on one jvm without explicit queue) you can use either Java Fork-Join framework or since Java8 (parallel) Streams for that.

Related Topic