Fork/Join Framework vs Java 8 Streams API

Today I found an article about Java 8's Fork/Join framework and its use in the parallel streams implementation. While I understand the article, I'm not entirely sure what to make of it.

Basically, what it says is that F/J in conjunction with streams is next to useless, especially in the context of JEE applications. Quite a few specific arguments are listed, such as:

  • it needs a massive volume of easily separable data (aggregate),
  • creates copious threads without regard for others,
  • has a high potential for stack overflows,
  • has a high potential for massive memory usage,
  • has a very, very narrow performance window,
  • is only designed for one request at a time.

Moreover, it gives these arguments against F/J's recursive decomposition approach:

Recursive decomposition has an even narrower performance window. In addition to the above dynamic decomposition, recursive decomposition optimized for dyadic recursive division only works well:

  • on balanced tree structures (Directed Acyclic Graphs)
  • where there are no cyclic dependencies
  • where the computation duration is neither too short nor too long
  • where there is no blocking.
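To see what that "dyadic recursive division" looks like in practice, here is a minimal, hypothetical sketch of the pattern F/J is optimized for: a RecursiveTask that keeps halving an index range until it is small enough to sum directly. The THRESHOLD value is purely illustrative, not a tuned cutoff.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// The dyadic pattern F/J favors: halve the problem until it is small,
// solve the halves in parallel, combine the results on the way back up.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // illustrative cutoff, not a tuned value
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {               // base case: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;                // split the range exactly in half
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                              // schedule the left half asynchronously
        return right.compute() + left.join();     // compute right half here, then join left
    }
}
// Usage: long sum = new ForkJoinPool().invoke(new SumTask(array, 0, array.length));
```

Note how this shape assumes cheap, exact halving of a random-access structure, which is precisely the constraint the quoted criticism points at.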

Since this is the only source I could find that complains about F/J, I'm not sure whether it can be taken seriously. Are the points cited above, or other similar ones, a valid concern?

More specifically, does Oracle have an official position regarding the limitations of the F/J Framework as applied to the parallelization of streams processing? If so, does it have plans to do something about them?

Best Answer

This is the gist of the problems with the application of F/J in the implementation of the Streams API:

  1. F/J is good for the parallelization of in-memory, random-access structures: it wants to be able to divide the full problem top-down, by recursively halving it into two subproblems of equal size;
  2. the stream paradigm is primarily about the processing of lazily materialized, sequential data sources, which can only be divided into a sequence of chunks, and the number of chunks is usually not known in advance.
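The contrast between these two points can be shown with a short sketch. An ArrayList's spliterator knows its exact size and halves its index range cheaply, while Files.lines() cannot know its element count up front and can only be divided into chunks read from the front ("input.txt" below is a placeholder path):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SplittingContrast {
    public static void main(String[] args) throws IOException {
        // Point 1: in-memory, random-access. The list's spliterator reports an
        // exact size and splits by halving its index range, which is the
        // top-down division F/J wants.
        List<Integer> list =
                IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        long evens = list.parallelStream().filter(i -> i % 2 == 0).count();

        // Point 2: lazily materialized, sequential. The line count is unknown
        // in advance, so the stream can only be divided into a sequence of
        // chunks peeled off the front.
        try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
            long longLines = lines.parallel().filter(line -> line.length() > 80).count();
            System.out.println(evens + " " + longLines);
        }
    }
}
```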

While F/J can be bent somewhat to support sequential chunking, the framework perceives this as "anomalous" and "lopsided", eventually giving rise to insurmountable issues when combined with the unpredictable I/O latency of reading those chunks.¹
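As an illustration of what that bending looks like in code: turning a plain Iterator into a parallel stream goes through Spliterators.spliteratorUnknownSize, whose trySplit() can only copy a batch of elements off the front into an array, producing exactly the lopsided split tree described above. The resultIterator below is a stand-in for any sequential source, such as rows pulled from a JDBC ResultSet:

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SequentialChunking {
    public static void main(String[] args) {
        // Stand-in for a sequential, lazily read source of unknown size.
        Iterator<String> resultIterator =
                java.util.Arrays.asList("a", "bb", "ccc").iterator();

        // The resulting spliterator cannot halve itself; each trySplit() must
        // first materialize a chunk of elements from the front of the source.
        Spliterator<String> split =
                Spliterators.spliteratorUnknownSize(resultIterator, Spliterator.ORDERED);
        Stream<String> parallel = StreamSupport.stream(split, true);

        System.out.println(parallel.mapToInt(String::length).sum());
    }
}
```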

The Streams API excels at the parallelization of in-memory structures and is usually helpful with the processing of lazy, I/O-backed streams, but it fails when you try to combine those two features in a single use case.

If you have a loop in your code that introduces a CPU-bound bottleneck, it is quite likely iterating over the contents of a file, a network response, or the rows of an SQL result set. None of these targets for parallelization gets support from the Streams API.
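If you do hit such a bottleneck, a common workaround (not part of the Streams API, just a hand-rolled pattern sketched here) is to keep the I/O sequential and submit each element's CPU-bound work to an ExecutorService. Both expensiveScore() and "data.txt" are hypothetical placeholders, and the per-line task granularity is for illustration only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class ManualParallelLines {
    public static void main(String[] args)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Future<Long>> results = new ArrayList<>();

        // Read the file sequentially; only the CPU-bound work is parallelized.
        try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
            lines.forEach(line -> results.add(pool.submit(() -> expensiveScore(line))));
        }

        long total = 0;
        for (Future<Long> f : results) total += f.get(); // collect results in order
        pool.shutdown();
        System.out.println(total);
    }

    // Hypothetical stand-in for real CPU-bound work on a line.
    static long expensiveScore(String line) {
        return line.chars().asLongStream().map(c -> c * c).sum();
    }
}
```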

The official position is that this use case is not supported because the Streams API has a different, equally legitimate focus. For lazy parallel streams, that focus is on sources calculated from data already in working memory, with the additional constraint that the sources be unordered: each member can be calculated independently, without first calculating any other. A range of integers is an example of such a stream; a stream of random numbers from an LCG (linear congruential generator) already falls outside the API's focus, because each of its numbers can only be derived from the previous one, sequentially.
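A short sketch of that distinction, using IntStream.range as the unordered, independently computable source and Stream.iterate as an LCG-style source whose element n depends on element n-1 (the multiplier and increment below are Knuth's MMIX LCG constants):

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OrderedVsUnorderedSources {
    public static void main(String[] args) {
        // Every element of a range is computable from its index alone, so the
        // source splits perfectly and parallelizes well.
        long multiplesOfThree = IntStream.range(0, 10_000_000)
                .parallel()
                .filter(i -> i % 3 == 0)
                .count();

        // In an LCG, each value is derived from the previous one, so the
        // source itself can only be generated sequentially, regardless of how
        // the downstream pipeline is parallelized.
        long sample = Stream.iterate(42L,
                        seed -> seed * 6364136223846793005L + 1442695040888963407L)
                .limit(5)
                .reduce(0L, Long::sum);

        System.out.println(multiplesOfThree + " " + sample);
    }
}
```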


¹ Keep in mind that this is the official statement. I have personally not yet hit this issue, having instead successfully parallelized the processing of my I/O sources.