Filtering a Java Stream to Get a Single, Unique Element

1. Overview

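In this article, we'll use two Collectors methods from Java 8 to retrieve the unique element of a stream that matches a given predicate.

For each approach, we'll define two methods according to the following criteria:

The get method expects exactly one result. Otherwise, it throws an exception.
The find method accepts that the result may be missing and returns an Optional that contains the value when a unique match exists.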

2. Get a Unique Result Using Reduction

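Collectors.reducing performs a reduction of its input elements. It applies the specified BinaryOperator and returns the result wrapped in an Optional. We can therefore use it to define our find method.

In our case, if more than one element remains after filtering, we simply discard the result: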

public static <T> Optional<T> findUniqueElementMatchingPredicate_WithReduction(Stream<T> elements, Predicate<T> predicate) {
    return elements.filter(predicate)
      // The operator only runs from the second match onward; returning null makes the final Optional empty
      .collect(Collectors.reducing((a, b) -> null));
}
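To make the three possible outcomes concrete, here is a minimal, self-contained sketch. The class name FindWithReductionDemo and the shortened method name findUnique are assumptions made for illustration only; the method body is the same as above.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the name is not part of the article's code base.
class FindWithReductionDemo {

    // Same logic as the find method above, under a shorter (assumed) name.
    static <T> Optional<T> findUnique(Stream<T> elements, Predicate<T> predicate) {
        return elements.filter(predicate)
          .collect(Collectors.reducing((a, b) -> null));
    }

    public static void main(String[] args) {
        // No element matches: the Optional is empty
        System.out.println(findUnique(Stream.of(1, 2, 3), i -> i > 5).isPresent()); // false
        // Exactly one element matches: the Optional holds it
        System.out.println(findUnique(Stream.of(1, 2, 3), i -> i == 2).get());      // 2
        // Several elements match: the result is discarded, so the Optional is empty
        System.out.println(findUnique(Stream.of(1, 2, 3), i -> i > 1).isPresent()); // false
    }
}
```

Note that the caller cannot tell "no match" apart from "too many matches" with this variant; both end up as an empty Optional.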

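To write the get method, we need to make the following changes:

If we detect two elements, we can throw directly instead of returning null.
Finally, we need to get the value out of the Optional: if it's empty, we also want to throw.

Moreover, in this case, we can apply the reduction directly on the stream: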

public static <T> T getUniqueElementMatchingPredicate_WithReduction(Stream<T> elements, Predicate<T> predicate) {
    return elements.filter(predicate)
      .reduce((a, b) -> {
          throw new IllegalStateException("Too many elements match the predicate");
      })
      .orElseThrow(() -> new IllegalStateException("No element matches the predicate"));
}
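The following sketch shows how a caller might observe the two failure modes. The class name GetWithReductionDemo, the shortened name getUnique, and the describe helper are hypothetical and exist only for this illustration.

```java
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical demo class; the name is not part of the article's code base.
class GetWithReductionDemo {

    // Same logic as the get method above, under a shorter (assumed) name.
    static <T> T getUnique(Stream<T> elements, Predicate<T> predicate) {
        return elements.filter(predicate)
          .reduce((a, b) -> {
              throw new IllegalStateException("Too many elements match the predicate");
          })
          .orElseThrow(() -> new IllegalStateException("No element matches the predicate"));
    }

    // Illustrative helper: returns either the value found or the exception message.
    static String describe(Stream<Integer> elements, Predicate<Integer> predicate) {
        try {
            return "found " + getUnique(elements, predicate);
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Stream.of(1, 2, 3), i -> i == 2)); // found 2
        System.out.println(describe(Stream.of(1, 2, 3), i -> i > 1));  // Too many elements match the predicate
        System.out.println(describe(Stream.of(1, 2, 3), i -> i > 5));  // No element matches the predicate
    }
}
```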

3. Get a Unique Result Using Collectors.collectingAndThen

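Collectors.collectingAndThen applies a finishing function to the result of a downstream collector. Here, the downstream collector is Collectors.toList(), so the function receives the list of matching elements.

Therefore, to define the find method, we need to take the list and:

If the list has zero elements or more than one, return null.
If the list has exactly one element, return it.

Here's the code for this operation: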

private static <T> T findUniqueElement(List<T> elements) {
    if (elements.size() == 1) {
        return elements.get(0);
    }
    return null;
}

The find method is therefore implemented as follows:

public static <T> Optional<T> findUniqueElementMatchingPredicate_WithCollectingAndThen(Stream<T> elements, Predicate<T> predicate) {
    return elements.filter(predicate)
      .collect(Collectors.collectingAndThen(Collectors.toList(), list -> Optional.ofNullable(findUniqueElement(list))));
}
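As a quick check of this variant, the sketch below runs it against a small stream of strings. The class name FindWithCollectingAndThenDemo and the shortened name findUnique are assumptions for illustration; the helper and the collector chain mirror the code above.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the name is not part of the article's code base.
class FindWithCollectingAndThenDemo {

    // Returns the single element of the list, or null if the list size is not 1.
    private static <T> T findUniqueElement(List<T> elements) {
        if (elements.size() == 1) {
            return elements.get(0);
        }
        return null;
    }

    // Same logic as the find method above, under a shorter (assumed) name.
    static <T> Optional<T> findUnique(Stream<T> elements, Predicate<T> predicate) {
        return elements.filter(predicate)
          .collect(Collectors.collectingAndThen(Collectors.toList(),
            list -> Optional.ofNullable(findUniqueElement(list))));
    }

    public static void main(String[] args) {
        // Exactly one string has length 2
        System.out.println(findUnique(Stream.of("a", "bb", "ccc"), s -> s.length() == 2).get());      // bb
        // Two strings are longer than one character, so the result is empty
        System.out.println(findUnique(Stream.of("a", "bb", "ccc"), s -> s.length() > 1).isPresent()); // false
        // No string has length 5
        System.out.println(findUnique(Stream.of("a", "bb", "ccc"), s -> s.length() == 5).isPresent()); // false
    }
}
```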

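To adapt the private method for the get case, we need to throw an exception whenever the number of retrieved elements is not 1. Let's distinguish precisely between no result and too many results, as we did with the reduction: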

private static <T> T getUniqueElement(List<T> elements) {
    if (elements.size() > 1) {
        throw new IllegalStateException("Too many elements match the predicate");
    } else if (elements.size() == 0) {
        throw new IllegalStateException("No element matches the predicate");
    }
    return elements.get(0);
}

Finally, assuming our class is named FilterUtils, we can write the get method:

public static <T> T getUniqueElementMatchingPredicate_WithCollectingAndThen(Stream<T> elements, Predicate<T> predicate) {
    return elements.filter(predicate)
      .collect(Collectors.collectingAndThen(Collectors.toList(), FilterUtils::getUniqueElement));
}
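The sketch below exercises this get variant on its own. It uses a hypothetical class GetWithCollectingAndThenDemo in place of FilterUtils, a shortened method name getUnique, and an illustrative describe helper; none of these names come from the article.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for FilterUtils, used only for this illustration.
class GetWithCollectingAndThenDemo {

    // Throws unless the list holds exactly one element.
    private static <T> T getUniqueElement(List<T> elements) {
        if (elements.size() > 1) {
            throw new IllegalStateException("Too many elements match the predicate");
        } else if (elements.size() == 0) {
            throw new IllegalStateException("No element matches the predicate");
        }
        return elements.get(0);
    }

    // Same logic as the get method above, under a shorter (assumed) name.
    static <T> T getUnique(Stream<T> elements, Predicate<T> predicate) {
        return elements.filter(predicate)
          .collect(Collectors.collectingAndThen(Collectors.toList(),
            GetWithCollectingAndThenDemo::getUniqueElement));
    }

    // Illustrative helper: returns either the value found or the exception message.
    static String describe(Stream<Integer> elements, Predicate<Integer> predicate) {
        try {
            return "found " + getUnique(elements, predicate);
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Stream.of(10, 20, 30), i -> i == 20)); // found 20
        System.out.println(describe(Stream.of(10, 20, 30), i -> i > 10));  // Too many elements match the predicate
        System.out.println(describe(Stream.of(10, 20, 30), i -> i > 99));  // No element matches the predicate
    }
}
```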

4. Performance Benchmark

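Let's use JMH for a quick performance comparison between the different methods.

First, let's apply our methods to:

A Stream containing all the Integers from 1 to 999,999 (the upper bound of IntStream.range is exclusive).
A Predicate that checks whether an element equals 751879.

In this case, the Predicate holds for exactly one element of the stream. Let's look at the benchmark definition: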

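@State(Scope.Benchmark)
public static class MyState {
    // Materialized once so that every benchmark invocation can open a fresh stream over the same data
    final List<Integer> INTEGERS = IntStream.range(1, 1000000).boxed().collect(Collectors.toList());

    final Predicate<Integer> PREDICATE = i -> i == 751879;
}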

@Benchmark
public void evaluateFindUniqueElementMatchingPredicate_WithReduction(Blackhole blackhole, MyState state) {
    blackhole.consume(FilterUtils.findUniqueElementMatchingPredicate_WithReduction(state.INTEGERS.stream(), state.PREDICATE));
}

@Benchmark
public void evaluateFindUniqueElementMatchingPredicate_WithCollectingAndThen(Blackhole blackhole, MyState state) {
    blackhole.consume(FilterUtils.findUniqueElementMatchingPredicate_WithCollectingAndThen(state.INTEGERS.stream(), state.PREDICATE));
}

@Benchmark
public void evaluateGetUniqueElementMatchingPredicate_WithReduction(Blackhole blackhole, MyState state) {
    try {
        FilterUtils.getUniqueElementMatchingPredicate_WithReduction(state.INTEGERS.stream(), state.PREDICATE);
    } catch (IllegalStateException exception) {
        blackhole.consume(exception);
    }
}

@Benchmark
public void evaluateGetUniqueElementMatchingPredicate_WithCollectingAndThen(Blackhole blackhole, MyState state) {
    try {
        FilterUtils.getUniqueElementMatchingPredicate_WithCollectingAndThen(state.INTEGERS.stream(), state.PREDICATE);
    } catch (IllegalStateException exception) {
        blackhole.consume(exception);
    }
}

Let's run it. We measure the number of operations per second; the higher the score, the better the performance:

Benchmark                                                                          Mode  Cnt    Score    Error  Units
BenchmarkRunner.evaluateFindUniqueElementMatchingPredicate_WithCollectingAndThen  thrpt   25  140.581 ± 28.793  ops/s
BenchmarkRunner.evaluateFindUniqueElementMatchingPredicate_WithReduction          thrpt   25  100.171 ± 36.796  ops/s
BenchmarkRunner.evaluateGetUniqueElementMatchingPredicate_WithCollectingAndThen   thrpt   25  145.568 ±  5.333  ops/s
BenchmarkRunner.evaluateGetUniqueElementMatchingPredicate_WithReduction           thrpt   25  144.616 ± 12.917  ops/s

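As we can see, in this case, the different approaches perform comparably once the error margins are taken into account.

Now let's change our Predicate to check whether an element equals 0. This condition is false for every element of the list. We can now run the benchmark again: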

Benchmark                                                                          Mode  Cnt    Score    Error  Units
BenchmarkRunner.evaluateFindUniqueElementMatchingPredicate_WithCollectingAndThen  thrpt   25  165.751 ± 19.816  ops/s
BenchmarkRunner.evaluateFindUniqueElementMatchingPredicate_WithReduction          thrpt   25  174.667 ± 20.909  ops/s
BenchmarkRunner.evaluateGetUniqueElementMatchingPredicate_WithCollectingAndThen   thrpt   25  188.293 ± 18.348  ops/s
BenchmarkRunner.evaluateGetUniqueElementMatchingPredicate_WithReduction           thrpt   25  196.689 ±  4.155  ops/s

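Here too, the results are fairly balanced.

Finally, let's see what happens with a Predicate that returns true for values greater than 751879. A large number of elements in the stream match this Predicate. This leads to the following benchmark results: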

Benchmark                                                                          Mode  Cnt    Score    Error  Units
BenchmarkRunner.evaluateFindUniqueElementMatchingPredicate_WithCollectingAndThen  thrpt   25   70.879 ±  6.205  ops/s
BenchmarkRunner.evaluateFindUniqueElementMatchingPredicate_WithReduction          thrpt   25  210.142 ± 23.680  ops/s
BenchmarkRunner.evaluateGetUniqueElementMatchingPredicate_WithCollectingAndThen   thrpt   25   83.927 ±  1.812  ops/s
BenchmarkRunner.evaluateGetUniqueElementMatchingPredicate_WithReduction           thrpt   25  252.881 ±  2.710  ops/s

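As we can see, the variants that use reduction are more efficient. Moreover, calling reduce directly on the filtered stream performs best, because the exception is thrown as soon as two matching values are found.

In conclusion, if performance matters:

Reduction should be preferred.
If we expect to find many potential matches, the get method that reduces the stream directly is the faster option.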
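The early exit of reduce can be observed without JMH by counting how often the predicate is evaluated. The sketch below is an illustration under assumed names (EarlyExitDemo and its two methods are hypothetical) and uses a small sequential stream of 1 to 1000 instead of the benchmark data; it measures evaluations, not throughput.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; the name is not part of the article's code base.
class EarlyExitDemo {

    // Counts predicate evaluations when reduce throws on the second match.
    static int evaluationsWithReduce(int upperBound, int threshold) {
        AtomicInteger evaluations = new AtomicInteger();
        Predicate<Integer> counting = i -> {
            evaluations.incrementAndGet();
            return i > threshold;
        };
        try {
            IntStream.rangeClosed(1, upperBound).boxed()
              .filter(counting)
              .reduce((a, b) -> {
                  throw new IllegalStateException("Too many elements match the predicate");
              });
        } catch (IllegalStateException expected) {
            // The sequential pipeline stopped when the second match reached reduce
        }
        return evaluations.get();
    }

    // Counts predicate evaluations when all matches are first collected into a list.
    static int evaluationsWithCollectingAndThen(int upperBound, int threshold) {
        AtomicInteger evaluations = new AtomicInteger();
        Predicate<Integer> counting = i -> {
            evaluations.incrementAndGet();
            return i > threshold;
        };
        Integer matchCount = IntStream.rangeClosed(1, upperBound).boxed()
          .filter(counting)
          .collect(Collectors.collectingAndThen(Collectors.toList(), List::size));
        return evaluations.get();
    }

    public static void main(String[] args) {
        // 750 non-matching elements, then the first and second match: 752 evaluations
        System.out.println(evaluationsWithReduce(1000, 750));            // 752
        // Every element is tested before the finishing function runs
        System.out.println(evaluationsWithCollectingAndThen(1000, 750)); // 1000
    }
}
```

With collectingAndThen, the whole stream is traversed and every match is stored before the list size is checked, which is consistent with the lower scores of that variant when many elements match.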

5. Conclusion

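In this tutorial, we looked at different ways to retrieve a unique result after filtering a stream, and compared their efficiency.

As always, the code is available on GitHub.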
