1. Overview

HashSet is a collection for storing unique elements.

In this tutorial, we’ll discuss the performance of the removeAll() method in the java.util.HashSet class.

2. HashSet.removeAll()

The removeAll method removes all the elements, that are contained in the collection:

Set<Integer> set = new HashSet<Integer>();
set.add(1);
set.add(2);
set.add(3);
set.add(4);

Collection<Integer> collection = new ArrayList<Integer>();
collection.add(1);
collection.add(3);

set.removeAll(collection);

Integer[] actualElements = new Integer[set.size()];
Integer[] expectedElements = new Integer[] { 2, 4 };
assertArrayEquals(expectedElements, set.toArray(actualElements));

As a result, elements 1 and 3 will be removed from the set.

3. Internal Implementation and Time Complexity

The removeAll() method determines which one is smaller – the set or the collection. This is done by invoking the size() method on the set and the collection.

If the collection has fewer elements than the set, then it iterates over the specified collection with the time complexity O(n). It also checks if the element is present in the set with the time complexity O(1). And if the element is present, it’s being removed from the set using the remove() method of the set, which again has a time complexity of O(1). So the overall time complexity is O(n).

If the set has fewer elements than the collection, then it iterates over this set using O(n). Then it checks if each element is present in the collection by invoking its contains() method. And if such an element is present, then the element is removed from the set. So this depends on the time complexity of the contains() method.

Now in this case, if the collection is an ArrayList, the time complexity of the contains() method is O(m). So overall time complexity to remove all elements present in the ArrayList from the set is O(n * m).

If the collection is again HashSet, the time complexity of the contains() method is O(1). So overall time complexity to remove all elements present in the HashSet from the set is O(n).

4. Performance

To see the performance difference between the above 3 cases, let’s write a simple JMH benchmark test.

For the first case, we’ll initialize the set and collection, where we have more elements in the set than the collection. In the second case, we’ll initialize the set and the collection, where we have more elements in the collection than the set. And in the third case, we’ll initialize 2 sets, where we’ll have 2nd set having more number of elements than the 1st one:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
public class HashSetBenchmark {

    @State(Scope.Thread)
    public static class MyState {
        private Set employeeSet1 = new HashSet<>();
        private List employeeList1 = new ArrayList<>();
        private Set employeeSet2 = new HashSet<>();
        private List employeeList2 = new ArrayList<>();
        private Set<Employee> employeeSet3 = new HashSet<>();
        private Set<Employee> employeeSet4 = new HashSet<>();

        private long set1Size = 60000;
        private long list1Size = 50000;
        private long set2Size = 50000;
        private long list2Size = 60000;
        private long set3Size = 50000;
        private long set4Size = 60000;

        @Setup(Level.Trial)
        public void setUp() {
            // populating sets
        }
    }
}

After, we add our benchmark tests:

@Benchmark
public boolean given_SizeOfHashsetGreaterThanSizeOfCollection_whenRemoveAllFromHashSet_thenGoodPerformance(MyState state) {
    return state.employeeSet1.removeAll(state.employeeList1);
}

@Benchmark
public boolean given_SizeOfHashsetSmallerThanSizeOfCollection_whenRemoveAllFromHashSet_thenBadPerformance(MyState state) {
    return state.employeeSet2.removeAll(state.employeeList2);
}

@Benchmark
public boolean given_SizeOfHashsetSmallerThanSizeOfAnotherHashSet_whenRemoveAllFromHashSet_thenGoodPerformance(MyState state) {
    return state.employeeSet3.removeAll(state.employeeSet4);
}

And here are the results:

Benchmark                                              Mode  Cnt            Score            Error  Units
HashSetBenchmark.testHashSetSizeGreaterThanCollection  avgt   20      2700457.099 ±     475673.379  ns/op
HashSetBenchmark.testHashSetSmallerThanCollection      avgt   20  31522676649.950 ± 3556834894.168  ns/op
HashSetBenchmark.testHashSetSmallerThanOtherHashset    avgt   20      2672757.784 ±     224505.866  ns/op

We can see the HashSet.removeAll() performs pretty bad when the HashSet has fewer elements than the Collection, which is passed as an argument to the removeAll() method. But when the other collection is again HashSet, then the performance is good.

5. Conclusion

In this article, we saw the performance of removeAll() in HashSet. When the set has fewer elements than the collection, then the performance of removeAll() depends on the time complexity of the contains() method of the collection.

As usual, the complete code for this article is available over on GitHub.