1. Overview
In this tutorial, we’ll dive into how different uses of the Java Stream API affect the order in which a stream generates, processes, and collects data.
We’ll also look at how ordering influences performance.
2. Encounter Order
Simply put, encounter order is the order in which a Stream encounters data.
2.1. Encounter Order of Collection Sources
The Collection we choose as our source affects the encounter order of the Stream.
To test this, let’s simply create two streams.
Our first is created from a List, which has an intrinsic ordering.
Our second is created from a TreeSet which doesn’t.
We then collect the output of each Stream into an Array to compare the results.
@Test
public void givenTwoCollections_whenStreamedSequentially_thenCheckOutputDifferent() {
List<String> list = Arrays.asList("B", "A", "C", "D", "F");
Set<String> set = new TreeSet<>(list);
Object[] listOutput = list.stream().toArray();
Object[] setOutput = set.stream().toArray();
assertEquals("[B, A, C, D, F]", Arrays.toString(listOutput));
assertEquals("[A, B, C, D, F]", Arrays.toString(setOutput));
}
As we can tell from our example, the TreeSet hasn’t kept the order of our input sequence, therefore, scrambling the encounter order of the Stream.
If our Stream is ordered, it doesn’t matter whether our data is being processed sequentially or in parallel; the implementation will maintain the encounter order of the Stream.
When we repeat our test using parallel streams, we get the same result:
@Test
public void givenTwoCollections_whenStreamedInParallel_thenCheckOutputDifferent() {
List<String> list = Arrays.asList("B", "A", "C", "D", "F");
Set<String> set = new TreeSet<>(list);
Object[] listOutput = list.stream().parallel().toArray();
Object[] setOutput = set.stream().parallel().toArray();
assertEquals("[B, A, C, D, F]", Arrays.toString(listOutput));
assertEquals("[A, B, C, D, F]", Arrays.toString(setOutput));
}
2.2. Removing Order
At any point, we can explicitly remove the order constraint with the unordered method.
For example, let’s declare a TreeSet:
Set<Integer> set = new TreeSet<>(
Arrays.asList(-9, -5, -4, -2, 1, 2, 4, 5, 7, 9, 12, 13, 16, 29, 23, 34, 57, 102, 230));
And if we stream without calling unordered:
set.stream().parallel().limit(5).toArray();
Then TreeSet‘s natural order is preserved:
[-9, -5, -4, -2, 1]
But, if we explicitly remove ordering:
set.stream().unordered().parallel().limit(5).toArray();
Then the output is different:
[1, 4, 7, 9, 23]
The reason is two-fold: First, since sequential streams process the data one element at a time, unordered has little effect on its own. When we called parallel, too, however, we affected the output.
3. Intermediate Operations
We can also affect stream ordering through intermediate operations.
While most intermediate operations will maintain the order of the Stream, some will, by their nature, change it.
For example, we can affect the stream ordering by sorting:
@Test
public void givenUnsortedStreamInput_whenStreamSorted_thenCheckOrderChanged() {
List<Integer> list = Arrays.asList(-3, 10, -4, 1, 3);
Object[] listOutput = list.stream().toArray();
Object[] listOutputSorted = list.stream().sorted().toArray();
assertEquals("[-3, 10, -4, 1, 3]", Arrays.toString(listOutput));
assertEquals("[-4, -3, 1, 3, 10]", Arrays.toString(listOutputSorted));
}
unordered and empty are two more examples of intermediate operations that will ultimately change the ordering of a Stream.
4. Terminal Operations
Finally, we can affect the order depending on the terminal operation that we use.
4.1. ForEach vs ForEachOrdered
ForEach and ForEachOrdered may seem to provide the same functionality, but they have one key difference: ForEachOrdered guarantees to maintain the order of the Stream.
If we declare a list:
List<String> list = Arrays.asList("B", "A", "C", "D", "F");
And use forEachOrdered after parallelizing:
list.stream().parallel().forEachOrdered(e -> logger.log(Level.INFO, e));
Then the output is ordered:
INFO: B
INFO: A
INFO: C
INFO: D
INFO: F
However, if we use forEach:
list.stream().parallel().forEach(e -> logger.log(Level.INFO, e));
Then the output is unordered:
INFO: C
INFO: F
INFO: B
INFO: D
INFO: A
ForEach logs the elements in the order they arrive from each thread. The second Stream with its ForEachOrdered method waits for each previous thread to complete before calling the log method.
4.2. Collect
When we use the collect method to aggregate the Stream output, it’s important to note that the Collection we choose will impact the order.
For example, inherently unordered Collections such as TreeSet won’t obey the order of the Stream output:
@Test
public void givenSameCollection_whenStreamCollected_checkOutput() {
List<String> list = Arrays.asList("B", "A", "C", "D", "F");
List<String> collectionList = list.stream().parallel().collect(Collectors.toList());
Set<String> collectionSet = list.stream().parallel()
.collect(Collectors.toCollection(TreeSet::new));
assertEquals("[B, A, C, D, F]", collectionList.toString());
assertEquals("[A, B, C, D, F]", collectionSet.toString());
}
When running our code, we see that the order of our Stream changes by collecting into a Set.
*4.3. Specifying Collections*
In the case we collect to an unordered collection using, say, Collectors.toMap, we can still enforce ordering by changing the implementation of our Collectors methods to use the Linked implementation.
First, we’ll initialize our list, along with the usual 2-parameter version of the toMap method:
@Test
public void givenList_whenStreamCollectedToHashMap_thenCheckOrderChanged() {
List<String> list = Arrays.asList("A", "BB", "CCC");
Map<String, Integer> hashMap = list.stream().collect(Collectors
.toMap(Function.identity(), String::length));
Object[] keySet = hashMap.keySet().toArray();
assertEquals("[BB, A, CCC]", Arrays.toString(keySet));
}
As expected, our new H**ashMap hasn’t kept the original ordering of the input list, but let’s change that.
With our second Stream, we’ll use the 4-parameter version of the toMap method to tell our supplier to supply a new LinkedHashMap:
@Test
public void givenList_whenCollectedtoLinkedHashMap_thenCheckOrderMaintained(){
List<String> list = Arrays.asList("A", "BB", "CCC");
Map<String, Integer> linkedHashMap = list.stream().collect(Collectors.toMap(
Function.identity(),
String::length,
(u, v) -> u,
LinkedHashMap::new
));
Object[] keySet = linkedHashMap.keySet().toArray();
assertEquals("[A, BB, CCC]", Arrays.toString(keySet));
}
Hey, that’s much better!
We’ve managed to keep the original order of the list by collecting our data to a LinkedHashMap.
5. Performance
If we’re using sequential streams, the presence or absence of order makes little difference to the performance of our program. Parallel streams, however, can be heavily affected by the presence of an ordered Stream.
The reason for this is that each thread must wait for the computation of the previous element of the Stream.
Let’s try and demonstrate this using the Java Microbenchmark harness, JMH, to measure the performance.
In the following examples, we’ll measure the performance cost of processing ordered and unordered parallel streams with some common intermediate operations.
5.1. Distinct
Let’s set up a test using the distinct function on both ordered and unordered streams.
@Benchmark
public void givenOrderedStreamInput_whenStreamDistinct_thenShowOpsPerMS() {
IntStream.range(1, 1_000_000).parallel().distinct().toArray();
}
@Benchmark
public void givenUnorderedStreamInput_whenStreamDistinct_thenShowOpsPerMS() {
IntStream.range(1, 1_000_000).unordered().parallel().distinct().toArray();
}
When we hit run, we can see the disparity in the time taken per operation:
Benchmark Mode Cnt Score Error Units
TestBenchmark.givenOrdered... avgt 2 222252.283 us/op
TestBenchmark.givenUnordered... avgt 2 78221.357 us/op
5.2. Filter
Next, we’ll use a parallel Stream with a simple filter method to return every 10th integer*:*
@Benchmark
public void givenOrderedStreamInput_whenStreamFiltered_thenShowOpsPerMS() {
IntStream.range(1, 100_000_000).parallel().filter(i -> i % 10 == 0).toArray();
}
@Benchmark
public void givenUnorderedStreamInput_whenStreamFiltered_thenShowOpsPerMS(){
IntStream.range(1,100_000_000).unordered().parallel().filter(i -> i % 10 == 0).toArray();
}
Interestingly, the difference between our two streams is much less than when using the distinct method.
Benchmark Mode Cnt Score Error Units
TestBenchmark.givenOrdered... avgt 2 116333.431 us/op
TestBenchmark.givenUnordered... avgt 2 111471.676 us/op
6. Conclusion
In this article, we looked at the ordering of streams, focusing on the different stages of the Stream process and how each one has its own effect.
Finally, we saw how the order contract placed on a Stream can affect the performance of parallel streams.
As always, check out the full sample set over on GitHub.