Java 8 – When to Use java.util.stream.Stream Instead of java.util.Collection

functional-programming, java, stream-processing

I started studying functional programming with JavaScript. After that, I started to study it with Java 8 (streams, lambdas and method references), and I realised that I tend to use streams as much as possible, avoiding collections.

I have heard that it makes no sense to use streams when there are no events in the context. So, it makes me wonder:

What is the original problem that is intended to be resolved with streams?

I think that if I have the answer to this question, I could choose adequately when to use one or another.

Best Answer

Streams and collections are not alternatives; they serve completely different purposes.

java.util.Collection is an interface for representing a "collection of things". Its primary purpose is to keep things, like an array in JavaScript.

java.util.stream.Stream is an interface for processing things in multiple stages.

In most cases you will actually have both: a Collection containing things, on which you call .stream() to do some processing.

My guess is that the alternative to streams you actually mean is the "enhanced for loop":

for(String s: collectionInstance){
    doSomethingWith(s);
}

which is an alternative to

collectionInstance.stream().forEach(this::doSomethingWith);

And here the advantages of streams are:

The code can be cleaner, because multiple operations are separated and fewer intermediate variables are needed. A stream pipeline is also an expression, so you can assign or return the collected result directly rather than having to create a target collection beforehand.
It can reduce memory usage by making it easier to avoid keeping full collections of intermediate results in memory.
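As a rough illustration of the first point, here is a minimal sketch that does the same filtering and transformation twice: once with a loop that fills a pre-created target list, and once as a single stream expression. The class name, method names and sample data are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LoopVersusStream {
    // Loop version: the target list has to exist before the loop starts.
    static List<String> shortNamesWithLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() <= 4) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Stream version: one expression whose value can be returned directly.
    static List<String> shortNamesWithStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 4) // stage 1: keep short names
                .map(String::toUpperCase)           // stage 2: transform each one
                .collect(Collectors.toList());      // terminal step: build the result
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the source list is not modified by either version.
        List<String> names = Arrays.asList("Ada", "Grace", "Alan", "Barbara");
        System.out.println(shortNamesWithLoop(names));   // [ADA, ALAN]
        System.out.println(shortNamesWithStream(names)); // [ADA, ALAN]
    }
}
```

Both versions read from a Collection; only the processing style differs.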

But streams can also be used directly, not based on collections. You don't need to keep all elements in memory at all then. You can even have infinite streams that are generated on the fly which you stop processing at some point; try doing that with a collection!
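A small sketch of such a generated stream, assuming powers of two as the example sequence (the class and method names are invented here): Stream.iterate describes a conceptually infinite sequence, and limit cuts processing off after a fixed number of elements.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamSketch {
    // Elements are produced lazily, one at a time; nothing is stored up front.
    static List<Integer> firstPowersOfTwo(int count) {
        return Stream.iterate(1, n -> n * 2) // 1, 2, 4, 8, ... with no end
                .limit(count)                // stop after 'count' elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // [1, 2, 4, 8, 16]
    }
}
```

Without the limit call, collecting this stream would never finish, which is exactly why the stopping point has to be part of the pipeline.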

The advantage of using the for loop is mainly that it's easier to debug, since there is less magic happening. It's also still more familiar to most developers.

Related Solutions

Functional Programming – Methods on Collections in Scala

You can roughly emulate map/filter by using foldRight (or foldLeft) with cons as the given reduction function. For example foldRight(f, L, initial) where L = [x,y,z] might be expanded to

f(x, f(y, f(z, initial)))

This means that you can't process x until you've processed f(z, initial) and then f(y, ...). You're creating a dependency that doesn't exist with map/filter.
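To make the expansion concrete, here is a hedged sketch in Java rather than Scala. Java has no built-in foldRight, so the helper below is hand-written for illustration, and "cons" is approximated by adding to the front of a LinkedList.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

class FoldRightSketch {
    // foldRight(f, [x, y, z], initial) computes f(x, f(y, f(z, initial))).
    // The loop walks from the right, so x is combined last.
    static <A, B> B foldRight(BiFunction<A, B, B> f, List<A> list, B initial) {
        B acc = initial;
        for (int i = list.size() - 1; i >= 0; i--) {
            acc = f.apply(list.get(i), acc);
        }
        return acc;
    }

    // map emulated with foldRight: transform each element, then "cons" it
    // onto the front of the accumulated list (mutating it for simplicity).
    static <A, B> List<B> mapViaFold(Function<A, B> g, List<A> list) {
        return FoldRightSketch.<A, LinkedList<B>>foldRight(
                (x, acc) -> {
                    acc.addFirst(g.apply(x));
                    return acc;
                },
                list,
                new LinkedList<B>());
    }

    public static void main(String[] args) {
        List<Integer> doubled = mapViaFold(n -> n * 2, Arrays.asList(1, 2, 3));
        System.out.println(doubled); // [2, 4, 6]
    }
}
```

Each step needs the accumulator produced by the step to its right, which is the artificial dependency described above; a plain map has no such ordering constraint.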

As for map ordering...

(A) collection.map(a => a.map(b => ...))

(A) takes a collection and applies the outer map function to each element, which implies that each element is itself a "mappable" collection. The inner map returns a processed list for each element, so each element of collection remains a collection. This is how you would map a function onto each element of a list of lists, and the result is again a list of lists.

(B) collection.map(a => ...).map(b => ...)

(B) processes each element of a list and forms those results into a new list. This list is then processed again with a second map function giving yet another list.

(A) is for processing the inner elements of a list ("sub-elements" if you like). (B) is for processing a list multiple times. If we write (B) concretely as

collection.map(a => f(a)).map(b => g(b))

we can see it is equivalent to

collection.map(a => g(f(a)))

It might help you to write them out as for loops. (A) will use nested loops, whereas (B) will use two sequential loops.
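Written out as loops in Java, the two shapes might look like the sketch below. The concrete operations are placeholders chosen for this example: doubling for the inner function in (A), and f = "add one", g = "double" in (B).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapAsLoops {
    // (A) collection.map(a => a.map(b => ...)): one loop nested inside another.
    // The list-of-lists shape of the input is preserved in the output.
    static List<List<Integer>> nestedMap(List<List<Integer>> collection) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> a : collection) {
            List<Integer> inner = new ArrayList<>();
            for (Integer b : a) {
                inner.add(b * 2);
            }
            result.add(inner);
        }
        return result;
    }

    // (B) collection.map(a => f(a)).map(b => g(b)): two loops, one after the other.
    static List<Integer> chainedMap(List<Integer> collection) {
        List<Integer> first = new ArrayList<>();
        for (Integer a : collection) {
            first.add(a + 1);   // f
        }
        List<Integer> second = new ArrayList<>();
        for (Integer b : first) {
            second.add(b * 2);  // g
        }
        return second;
    }

    public static void main(String[] args) {
        System.out.println(nestedMap(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)))); // [[2, 4], [6]]
        System.out.println(chainedMap(Arrays.asList(1, 2, 3)));                               // [4, 6, 8]
    }
}
```

In chainedMap the two loops could be fused into a single loop that computes g(f(a)), which mirrors the equivalence shown above.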

This is not the difference between fold and unfold. Neither (A) nor (B) is a fold as the list structure present, however deeply nested, is preserved. Fold creates a scalar from a list, while unfold (not too familiar with it) takes a scalar and produces a list according to some rule.

EDIT: @Giorgio was right to suggest flatMap. It's an interesting variation on map. Let's say we're mapping a list of Xs to a list of Ys, so we pass map a function f:X->Y. Suppose we have another calculation g that takes an X but returns multiple Ys g:X->[Y]. In this case we would use flatMap. map takes the results of f and puts them into a list. flatMap takes the results of g and concatenates them together.

Ex. Say we have a list of lists L = [[1,2,3],[4,5,6]]

L.map(a => a.map(b => b * 2))

gives [[2,4,6],[8,10,12]]. But say we want each number doubled in one list, no sub-lists. Then we would call

L.flatMap(a => a.map(b => b * 2))

which gives [2,4,6,8,10,12]. Note that our inner function a => a.map(b => b * 2) takes and returns a list.
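For comparison, the same example can be sketched with Java 8 streams. One difference from the Scala version worth noting: the function passed to Stream.flatMap must return a Stream rather than a list. The class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapSketch {
    // map + inner map: the result is still a list of lists.
    static List<List<Integer>> doubledNested(List<List<Integer>> lists) {
        return lists.stream()
                .map(a -> a.stream().map(b -> b * 2).collect(Collectors.toList()))
                .collect(Collectors.toList());
    }

    // flatMap + inner map: the inner results are concatenated into one list.
    static List<Integer> doubledFlat(List<List<Integer>> lists) {
        return lists.stream()
                .flatMap(a -> a.stream().map(b -> b * 2))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> l = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6));
        System.out.println(doubledNested(l)); // [[2, 4, 6], [8, 10, 12]]
        System.out.println(doubledFlat(l));   // [2, 4, 6, 8, 10, 12]
    }
}
```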

Uses of Persistent Data Structures in Non-Functional Languages

Persistent/immutable data structures don't solve concurrency problems on their own, but they make solving them much easier.

Consider a thread T1 that passes a set S to another thread T2. If S is mutable, T1 has a problem: it loses control of what happens to S. Thread T2 can modify it, so T1 can't rely on the content of S at all. And vice versa: T2 can't be sure that T1 doesn't modify S while T2 operates on it.

One solution is to add some kind of a contract to the communication of T1 and T2 so that only one of the threads is allowed to modify S. This is error prone and burdens both the design and implementation.

Another solution is that T1 or T2 clone the data structure (or both of them, if they aren't coordinated). However, if S isn't persistent, this is an expensive O(n) operation.

If you have a persistent data structure, you're free of this burden. You can pass a structure to another thread and you don't have to care what it does with it. Both threads have access to the original version and can do arbitrary operations on it - it doesn't influence what the other thread sees.
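A minimal sketch of the idea, using a stack instead of the set S because it is the simplest persistent structure to write by hand; the PersistentStack class is invented for this example and is not a standard library type. Every "modification" returns a new version that shares its tail with the old one, so no O(n) copy is needed and existing versions never change.

```java
final class PersistentStack<T> {
    private final T head;                  // top element (null only in the empty stack)
    private final PersistentStack<T> tail; // rest of the stack, shared between versions

    private PersistentStack(T head, PersistentStack<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <T> PersistentStack<T> empty() {
        return new PersistentStack<>(null, null);
    }

    boolean isEmpty() {
        return tail == null;
    }

    // O(1): creates one new node that points at the existing, unchanged stack.
    PersistentStack<T> push(T value) {
        return new PersistentStack<>(value, this);
    }

    // O(1): the previous version is simply the tail; nothing is mutated.
    PersistentStack<T> pop() {
        if (isEmpty()) {
            throw new IllegalStateException("empty stack");
        }
        return tail;
    }

    T peek() {
        if (isEmpty()) {
            throw new IllegalStateException("empty stack");
        }
        return head;
    }

    int size() {
        int n = 0;
        for (PersistentStack<T> s = this; !s.isEmpty(); s = s.tail) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        PersistentStack<Integer> shared = PersistentStack.<Integer>empty().push(1).push(2);
        PersistentStack<Integer> seenByT2 = shared.push(3); // what another thread might derive
        System.out.println(shared.size());   // 2: the original version is untouched
        System.out.println(seenByT2.size()); // 3
    }
}
```

Because all fields are final and no method mutates existing nodes, a version handed to another thread can be read there without either side affecting what the other sees.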
