Introduction to Java Spliterator | Baeldung

1. Overview

The Spliterator interface, introduced in Java 8, can be used to traverse and partition sequences. It’s a base utility for Streams, especially parallel ones.

In this article, we’ll cover its usage, characteristics, methods and how to create our own custom implementations.

2. Spliterator API

2.1. tryAdvance

This is the main method used for stepping through a sequence. The method takes a Consumer that’s used to consume the elements of the Spliterator one by one, sequentially, and returns false if there are no elements left to traverse.

Here, we’ll look at how to use it to traverse and partition elements.

First, let’s assume that we’ve got an ArrayList with 35000 articles and that the Article class is defined as:

public class Article {
    private List<Author> listOfAuthors;
    private int id;
    private String name;
    
    // standard constructors/getters/setters
}

Now, let’s use Spliterator to process the list of articles and add the suffix “- published by Baeldung” to each article name:

@Test
public void givenAStreamOfArticles_whenProcessedInSequentiallyWithSpliterator_ProducessRightOutput() {
  // ...
}

First, let’s generate the articles:

public void givenAStreamOfArticles_whenProcessedInSequentiallyWithSpliterator_ProducessRightOutput() {
    List<Article> articles = Stream.generate(() -> new Article("Java"))
        .limit(35000)
        .collect(Collectors.toList());

    // ...
}

We have used Stream to generate 35000 articles. Next, let’s create a spliterator from this articles list and use the tryAdvance method to process the articles.

Spliterator<Article> spliterator = articles.spliterator();
while (spliterator.tryAdvance(article -> article.setName(article.getName()
    .concat("- published by Baeldung"))));

Here, our consumer is a simple function that adds a suffix to the article names.

Finally, we can add an assertion to verify that all articles were processed and their names were updated:

articles.forEach(article -> assertThat(article.getName()).isEqualTo("Java- published by Baeldung"));

Notice that this test case will execute successfully. All article names are already updated, and the new name is equal to Java- published by Baeldung.

Another key point is that we used the tryAdvance() method to process the next element.
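To make the return value of tryAdvance() more concrete, here is a minimal sketch on a small list of strings. The class name TryAdvanceDemo and its helper methods are hypothetical and not part of the article's code; the sketch only assumes the standard java.util API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TryAdvanceDemo {

    // Visits every element with tryAdvance() and returns how many were seen.
    static int countWithTryAdvance(List<String> source) {
        Spliterator<String> spliterator = source.spliterator();
        int[] visited = {0};
        // tryAdvance() returns true each time it hands one element to the
        // consumer, and false once nothing is left to traverse.
        while (spliterator.tryAdvance(element -> visited[0]++)) {
            // the work happens in the consumer, so the loop body is empty
        }
        return visited[0];
    }

    // Once the spliterator is exhausted, further calls keep returning false.
    static boolean advanceAfterExhaustion() {
        Spliterator<String> spliterator = Arrays.asList("only").spliterator();
        spliterator.tryAdvance(element -> { });        // consumes "only"
        return spliterator.tryAdvance(element -> { }); // nothing left
    }

    public static void main(String[] args) {
        System.out.println(countWithTryAdvance(Arrays.asList("a", "b", "c"))); // 3
        System.out.println(advanceAfterExhaustion());                          // false
    }
}
```

The empty loop body mirrors the while loop used with the articles above: all of the work is done inside the consumer.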

2.2. trySplit

Next, let’s split Spliterators (hence the name) and process partitions independently.

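The trySplit method tries to partition the Spliterator into two parts. If it succeeds, it returns a new Spliterator covering some of the elements, while the Spliterator it was called on keeps the rest, which allows the two to be processed in parallel. If the elements can’t be split, it returns null.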

We will generate our articles and spliterator as we did previously:

@Test
public void givenAStreamOfArticle_whenProcessedUsingTrySplit_thenSplitIntoEqualHalf() {
    List<Article> articles = Stream.generate(() -> new Article("Java"))
        .limit(35000)
        .collect(Collectors.toList());

    Spliterator<Article> split1 = articles.spliterator();
    
    // ...
}

Then we create our second spliterator by applying the trySplit method on the first one:

Spliterator<Article> split2 = split1.trySplit();

Now let’s use these two spliterators. First, we create two lists that will store the elements processed by each of them:

List<Article> articlesListOne = new ArrayList<>(); 
List<Article> articlesListTwo = new ArrayList<>();

Let’s consume the articles:

split1.forEachRemaining(articlesListOne::add);
split2.forEachRemaining(articlesListTwo::add);

After creating the list, we iterate through split1 and add all the articles in split1 to articlesListOne. Similarly, we perform the same operation for split2, saving each article of split2 into articlesListTwo.

Next, we can assert that these spliterators consumed exactly half of the articles, i.e. 17500:

assertThat(articlesListOne.size()).isEqualTo(17500);
assertThat(articlesListTwo.size()).isEqualTo(17500);

Finally, we can make an assertion to verify that both lists contain distinct elements:

assertThat(articlesListOne).doesNotContainAnyElementsOf(articlesListTwo);

Notice that this test case will execute successfully, as the articles present in articlesListOne are not present in articlesListTwo. This shows that we can process the partitions independently.

The splitting process worked as intended and divided the records equally.
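The same split is easier to follow on a small list. The sketch below uses a hypothetical TrySplitDemo class with ten integers in an ArrayList. The even split shown in the comments is what the ArrayList spliterator in OpenJDK produces; other sources may divide their elements differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitDemo {

    // Collects everything a spliterator still covers into a list.
    static List<Integer> drain(Spliterator<Integer> spliterator) {
        List<Integer> out = new ArrayList<>();
        spliterator.forEachRemaining(out::add);
        return out;
    }

    // Splits a ten-element ArrayList once and returns both parts.
    static List<List<Integer>> splitHalves() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9));
        Spliterator<Integer> rest = numbers.spliterator();
        // For an ordered source, the returned spliterator covers a prefix
        // of the elements and the original keeps the remainder.
        Spliterator<Integer> prefix = rest.trySplit();
        return Arrays.asList(drain(prefix), drain(rest));
    }

    public static void main(String[] args) {
        List<List<Integer>> halves = splitHalves();
        System.out.println(halves.get(0)); // [0, 1, 2, 3, 4]
        System.out.println(halves.get(1)); // [5, 6, 7, 8, 9]
    }
}
```

In general, trySplit() may return null when the source can’t be split any further, so production code should check the result before using it.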

2.3. estimateSize

The estimateSize method gives us an estimate of the number of elements that remain to be traversed:

log.info("Size: " + split1.estimateSize());

This will output:

Size: 17500
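Since estimateSize() reports the remaining elements, its value changes as the spliterator is split and traversed. The following sketch, built around a hypothetical EstimateSizeDemo class and an eight-element ArrayList, records the value at several points. The exact numbers assume the ArrayList spliterator, which is SIZED and splits in half.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class EstimateSizeDemo {

    // Returns estimateSize() readings taken before and after splitting and traversing.
    static long[] sizes() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Spliterator<Integer> rest = numbers.spliterator();

        long beforeSplit = rest.estimateSize();        // all 8 elements
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = prefix.estimateSize();       // first half: 4
        long restAfterSplit = rest.estimateSize();     // second half: 4

        rest.tryAdvance(n -> { });                     // consume one element
        long restAfterAdvance = rest.estimateSize();   // 3 left

        rest.forEachRemaining(n -> { });               // consume the remainder
        long restAfterDrain = rest.estimateSize();     // 0 left

        return new long[] {beforeSplit, prefixSize, restAfterSplit, restAfterAdvance, restAfterDrain};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes())); // [8, 4, 4, 3, 0]
    }
}
```

This also means that calling estimateSize() on a spliterator after forEachRemaining() has run returns 0 rather than the original partition size.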

2.4. hasCharacteristics

The hasCharacteristics(int) method checks whether the Spliterator reports all of the given characteristics and returns a boolean. The related characteristics() method returns the complete set of characteristics as an int bit mask:

log.info("Characteristics: " + split1.characteristics());

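This will output:

Characteristics: 16464

For an ArrayList, this value is the bitwise OR of ORDERED (16), SIZED (64) and SUBSIZED (16384).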
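As a small sketch of how the two methods relate, the hypothetical CharacteristicsDemo class below compares the bit mask of an ArrayList spliterator with the constants defined on the Spliterator interface and then queries individual flags with hasCharacteristics().

```java
import java.util.ArrayList;
import java.util.Spliterator;

class CharacteristicsDemo {

    // The raw bit mask reported by an ArrayList spliterator.
    static int arrayListMask() {
        return new ArrayList<String>().spliterator().characteristics();
    }

    public static void main(String[] args) {
        Spliterator<String> spliterator = new ArrayList<String>().spliterator();
        int mask = spliterator.characteristics();

        // ArrayList documents ORDERED, SIZED and SUBSIZED for its spliterator.
        int expected = Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED;
        System.out.println(mask);             // 16464
        System.out.println(mask == expected); // true

        // hasCharacteristics() answers the same question for single flags.
        System.out.println(spliterator.hasCharacteristics(Spliterator.SIZED));  // true
        System.out.println(spliterator.hasCharacteristics(Spliterator.SORTED)); // false
    }
}
```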

3. Spliterator Characteristics

A Spliterator has eight different characteristics that describe its behaviour. These can be used as hints by external tools:

SIZED – if it’s capable of returning an exact number of elements with the estimateSize() method
SORTED – if it’s iterating through a sorted source
SUBSIZED – if we split the instance using a trySplit() method and obtain Spliterators that are SIZED as well
CONCURRENT – if the source can be safely modified concurrently
DISTINCT – if for each pair of encountered elements x, y, !x.equals(y)
IMMUTABLE – if the element source can’t be structurally modified (no elements can be added, replaced or removed)
NONNULL – if the source guarantees that encountered elements are never null
ORDERED – if iterating over an ordered sequence
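Different sources report different combinations of these flags. The sketch below uses a hypothetical describe() helper in a hypothetical SourceCharacteristicsDemo class to list the flags reported by the spliterators of a few standard collections. It relies only on hasCharacteristics() and the constants of the Spliterator interface.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentLinkedQueue;

class SourceCharacteristicsDemo {

    private static final String[] NAMES = {
        "ORDERED", "DISTINCT", "SORTED", "SIZED",
        "NONNULL", "IMMUTABLE", "CONCURRENT", "SUBSIZED"};

    private static final int[] FLAGS = {
        Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED, Spliterator.SIZED,
        Spliterator.NONNULL, Spliterator.IMMUTABLE, Spliterator.CONCURRENT, Spliterator.SUBSIZED};

    // Returns the names of all characteristics the given spliterator reports.
    static List<String> describe(Spliterator<?> spliterator) {
        List<String> reported = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if (spliterator.hasCharacteristics(FLAGS[i])) {
                reported.add(NAMES[i]);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println("TreeSet: " + describe(new TreeSet<Integer>().spliterator()));
        System.out.println("HashSet: " + describe(new HashSet<Integer>().spliterator()));
        System.out.println("ConcurrentLinkedQueue: "
            + describe(new ConcurrentLinkedQueue<Integer>().spliterator()));
    }
}
```

According to their documentation, a TreeSet spliterator reports SORTED, a HashSet spliterator reports DISTINCT but not ORDERED, and a ConcurrentLinkedQueue spliterator reports CONCURRENT.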

4. A Custom Spliterator

4.1. When to Customize

In this section, we’ll walk through a simple example to show how to write a custom Spliterator. A custom Spliterator lets us traverse all the elements of a source one by one. The source could be an array of custom model objects, an IO channel or a generator function.

4.2. How to Customize

Let’s assume that we would like to compute the sum of all elements in a large list of Integers using a custom Spliterator. To solve that, we need to implement a Spliterator that splits the Integer list into sublists. Here’s the implementation of our custom Spliterator:

import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;

public class CustomSpliterator implements Spliterator<Integer> {
    private final List<Integer> elements;
    private int currentIndex;
    
    public CustomSpliterator(List<Integer> elements) {
        this.elements = elements;
        this.currentIndex = 0;
    }
    
    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (currentIndex < elements.size()) {
            action.accept(elements.get(currentIndex));
            currentIndex++;
            return true;
        }
        return false;
    }
    
    @Override
    public Spliterator<Integer> trySplit() {
        int currentSize = elements.size() - currentIndex;
        if (currentSize < 2) {
            return null;
        }
        
        int splitIndex = currentIndex + currentSize / 2;
        CustomSpliterator newSpliterator = new CustomSpliterator(elements.subList(currentIndex, splitIndex));
        currentIndex = splitIndex;
        return newSpliterator;
    }
    
    @Override
    public long estimateSize() {
        return elements.size() - currentIndex;
    }
    
    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL;
    }
}

Testing the CustomSpliterator by processing the collection sequentially:

@Test
public void givenAStreamOfIntegers_whenProcessedSequentialCustomSpliterator_countProducesRightOutput() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        numbers.add(4);
        numbers.add(5);

        CustomSpliterator customSpliterator = new CustomSpliterator(numbers);

        AtomicInteger sum = new AtomicInteger();

        customSpliterator.forEachRemaining(sum::addAndGet);
        assertThat(sum.get()).isEqualTo(15);
}

Testing the CustomSpliterator with the traversal submitted to a ForkJoinPool:

@Test
public void givenAStreamOfIntegers_whenProcessedInParallelWithCustomSpliterator_countProducesRightOutput() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        numbers.add(4);
        numbers.add(5);

        CustomSpliterator customSpliterator = new CustomSpliterator(numbers);

        // Use the common ForkJoinPool to run the traversal
        ForkJoinPool forkJoinPool = ForkJoinPool.commonPool();

        AtomicInteger sum = new AtomicInteger(0);

        // Run the traversal on a pool thread; forEachRemaining itself
        // still visits the elements one after another
        forkJoinPool.submit(() -> customSpliterator.forEachRemaining(sum::addAndGet)).join();
        assertThat(sum.get()).isEqualTo(15);
}

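Note that submitting forEachRemaining() to the pool only moves the traversal onto a worker thread; the elements are still visited one after another. To have the framework call trySplit() and process the resulting parts concurrently, we can wrap the spliterator in a parallel stream with StreamSupport.stream(customSpliterator, true). This can potentially improve performance for large datasets or computationally intensive tasks.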
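One way to let the Stream framework drive the splitting is sketched below. The ParallelSumDemo class and its parallelSum() method are hypothetical; the nested class is a condensed copy of the CustomSpliterator shown earlier, repeated only so that the snippet compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

class ParallelSumDemo {

    // Condensed copy of the article's CustomSpliterator.
    static class CustomSpliterator implements Spliterator<Integer> {
        private final List<Integer> elements;
        private int currentIndex;

        CustomSpliterator(List<Integer> elements) {
            this.elements = elements;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (currentIndex >= elements.size()) {
                return false;
            }
            action.accept(elements.get(currentIndex++));
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            int remaining = elements.size() - currentIndex;
            if (remaining < 2) {
                return null;
            }
            int splitIndex = currentIndex + remaining / 2;
            // Hand the first half to a new spliterator and keep the second half.
            Spliterator<Integer> prefix =
                new CustomSpliterator(elements.subList(currentIndex, splitIndex));
            currentIndex = splitIndex;
            return prefix;
        }

        @Override
        public long estimateSize() {
            return elements.size() - currentIndex;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | NONNULL;
        }
    }

    // The second argument 'true' asks for a parallel stream, so the framework
    // may call trySplit() and sum the parts on different threads.
    static int parallelSum(List<Integer> numbers) {
        return StreamSupport.stream(new CustomSpliterator(numbers), true)
            .mapToInt(Integer::intValue)
            .sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(Arrays.asList(1, 2, 3, 4, 5))); // 15
    }
}
```

Using sum() as the terminal operation avoids shared mutable state, so no AtomicInteger is needed here. Whether the stream actually splits a five-element list depends on the runtime, but the result is the same either way.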

Also, the custom Spliterator is created from a list of Integers and traverses through it by holding the current position.

Let’s discuss the implementation of each method in more detail:

The CustomSpliterator takes a list of integers in its constructor and tracks the current index being processed.
The tryAdvance() method consumes the next available element, advances the current index and returns true. If there are no more elements, it returns false.
The trySplit() method splits the remaining elements into two parts. It creates a new CustomSpliterator over the sublist from the current index to the split index, and the original keeps the elements from the split index onward. If the remaining size is too small to split, trySplit() returns null.
The estimateSize() method returns an estimate of the remaining number of elements to be processed.
The characteristics() method specifies the characteristics of the Spliterator. In this case, the ORDERED, SIZED, SUBSIZED, and NONNULL characteristics are set.

5. Support for Primitive Values

The Spliterator API supports primitive values including double, int and long.

The only difference between using a generic and a primitive dedicated Spliterator is the given Consumer and the type of the Spliterator.

For example, when we need it for an int value, we need to pass an IntConsumer. Furthermore, here’s a list of primitive dedicated Spliterators:

OfPrimitive<T, T_CONS, T_SPLITR extends Spliterator.OfPrimitive<T, T_CONS, T_SPLITR>>: parent interface for other primitives
OfInt: A Spliterator specialized for int
OfDouble: A Spliterator dedicated for double
OfLong: A Spliterator dedicated for long
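As a brief illustration of the primitive variants, the sketch below sums an int array through a Spliterator.OfInt obtained from Arrays.spliterator(int[]). The class name PrimitiveSpliteratorDemo is hypothetical. The consumer is declared as an IntConsumer variable on purpose: Spliterator.OfInt also inherits an overload taking Consumer<? super Integer>, and an explicitly typed consumer makes it clear which one is called.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.function.IntConsumer;

class PrimitiveSpliteratorDemo {

    // Sums an int array without boxing the values into Integer objects.
    static int sum(int[] values) {
        Spliterator.OfInt spliterator = Arrays.spliterator(values);
        int[] total = {0};
        // An IntConsumer receives each value as a primitive int.
        IntConsumer adder = value -> total[0] += value;
        spliterator.forEachRemaining(adder);
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4, 5})); // 15
    }
}
```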

6. Conclusion

In this article, we covered Java 8 Spliterator usage, methods, characteristics, splitting process, primitive support and how to customize it.

As always, the full implementation of this article can be found over on GitHub.
