1. Overview

Set is one of the commonly used collection types in Java. Today, we’ll discuss how to find the difference between two given sets.

2. Introduction to the Problem

Before we take a closer look at the implementations, we first need to understand the problem. As usual, an example helps us grasp the requirement quickly.

Let’s say we have two Set objects, set1 and set2:

set1: {"Kotlin", "Java", "Rust", "Python", "C++"}
set2: {"Kotlin", "Java", "Rust", "Ruby", "C#"}

As we can see, both sets contain some programming language names. The requirement “Finding the difference between two Sets” may have two variants:

  • Asymmetric difference – Finding those elements that are contained by set1 but not contained by set2; in this case, the expected result is {“Python”, “C++”}
  • Symmetric difference – Finding the elements in either of the sets but not in their intersection; if we look at our example, the result should be {“Python”, “C++”, “Ruby”, “C#”}

In this tutorial, we’ll address the solution to both scenarios. First, we’ll focus on finding the asymmetric differences. After that, we’ll explore finding the symmetric difference between the two sets.

Next, let’s see them in action.

3. Asymmetric Difference

3.1. Using the Standard removeAll Method

The Set interface declares a removeAll method, which it inherits from the Collection interface.

The removeAll method accepts a Collection object as the parameter and removes all elements in the parameter from the given Set object. So, if we call set1.removeAll(set2), the elements that remain in the set1 object are the result.

For simplicity, let’s show it as a unit test:

Set<String> set1 = Stream.of("Kotlin", "Java", "Rust", "Python", "C++").collect(Collectors.toSet());
Set<String> set2 = Stream.of("Kotlin", "Java", "Rust", "Ruby", "C#").collect(Collectors.toSet());
Set<String> expectedOnlyInSet1 = Set.of("Python", "C++");

set1.removeAll(set2);

assertThat(set1).isEqualTo(expectedOnlyInSet1);

As the test above shows, we first initialize the two Set objects using Stream. Then, after calling the removeAll method, the set1 object contains only the expected elements.

This approach is pretty straightforward. However, the drawback is obvious: after removing the common elements from set1, the original set1 is modified.

Therefore, we need to back up the original set1 object if we still need it after calling the removeAll method, or we have to create a new mutable set object if set1 is an immutable Set.
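
One way to do this backup is to copy set1 into a new HashSet and call removeAll on the copy. The following is a minimal sketch of that idea; the class name CopyThenRemoveAll and the helper method difference are hypothetical names chosen for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of any library
final class CopyThenRemoveAll {

    // Returns the elements of first that are not in second, leaving both inputs untouched
    static <T> Set<T> difference(Set<T> first, Set<T> second) {
        Set<T> copy = new HashSet<>(first); // mutable copy, so first may even be immutable
        copy.removeAll(second);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> set1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> set2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        Set<String> onlyInSet1 = difference(set1, set2);

        System.out.println(onlyInSet1.equals(Set.of("Python", "C++"))); // true
        System.out.println(set1.size()); // 5, set1 is unchanged
    }
}
```

The copy costs extra memory proportional to the size of set1, which is usually acceptable when the original set must stay intact.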

Next, let’s take a look at another approach to returning the asymmetric difference in a new Set object without modifying the original set.

3.2. Using the Stream.filter Method

The Stream API has been around since Java 8. It allows us to filter elements from a collection using the Stream.filter method.

We can also solve this problem using Stream.filter without modifying the original set1 object. Let’s first initialize the two sets as immutable sets:

Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");
Set<String> expectedOnlyInSet1 = Set.of("Python", "C++");

Java 9 introduced the static Set.of method, which allows us to create an immutable Set object conveniently. If we attempt to modify immutableSet1, for example by calling removeAll on it, an UnsupportedOperationException will be thrown.
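
To make this behavior concrete, here is a small sketch that calls removeAll on an immutable set and on a mutable copy. The class name ImmutableSetDemo and the helper rejectsRemoveAll are hypothetical names used only for this illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of any library
final class ImmutableSetDemo {

    // Returns true if the given set refuses the removeAll call
    static boolean rejectsRemoveAll(Set<String> set, Set<String> toRemove) {
        try {
            set.removeAll(toRemove);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        // A set created by Set.of rejects the modification
        System.out.println(rejectsRemoveAll(immutableSet1, immutableSet2)); // true

        // A HashSet copy accepts it
        System.out.println(rejectsRemoveAll(new HashSet<>(immutableSet1), immutableSet2)); // false
    }
}
```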

Next, let’s write a unit test that uses Stream.filter to find the difference:

Set<String> actualOnlyInSet1 = immutableSet1.stream().filter(e -> !immutableSet2.contains(e)).collect(Collectors.toSet());
assertThat(actualOnlyInSet1).isEqualTo(expectedOnlyInSet1);

As we can see in the test above, the key is the call filter(e -> !immutableSet2.contains(e)). Here, we only take the elements that are in immutableSet1 but not in immutableSet2.

If we execute this test method, it passes without any exception. It means this approach works, and the original sets are not modified.
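
If we need this logic in several places, we could wrap the filter in a generic helper. The sketch below is one possible shape, with the hypothetical names FilterDifference and onlyInFirst. It also swaps Collectors.toSet for Collectors.toCollection(LinkedHashSet::new), since Collectors.toSet makes no guarantee about the type or mutability of the returned Set, whereas a LinkedHashSet is mutable and keeps the encounter order of the stream:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library
final class FilterDifference {

    // Collects the elements of first that second doesn't contain into a new LinkedHashSet
    static <T> Set<T> onlyInFirst(Set<T> first, Set<T> second) {
        return first.stream()
          .filter(e -> !second.contains(e))
          .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        // An ordered input set, so the order of the result is predictable
        Set<String> first = new LinkedHashSet<>(List.of("Kotlin", "Java", "Rust", "Python", "C++"));
        Set<String> second = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        System.out.println(onlyInFirst(first, second)); // [Python, C++]
    }
}
```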

3.3. Using the Guava Library

Guava is a popular Java library that ships with some new collection types and convenient helper methods. Guava has provided a method to find the asymmetric differences between two sets. Therefore, we can use this method to solve our problems easily.

But first, we need to include the library in our classpath. Let’s say we manage the project dependencies by Maven. We may need to add the Guava dependency to the pom.xml:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.3-jre</version>
</dependency>

Once Guava is available in our Java project, we can use its Sets.difference method to get the expected result:

Set<String> actualOnlyInSet1 = Sets.difference(immutableSet1, immutableSet2);
assertThat(actualOnlyInSet1).isEqualTo(expectedOnlyInSet1);

It’s worth mentioning that the Sets.difference method returns an unmodifiable Set view containing the result. It means:

  • We cannot modify the returned set
  • If either original set is mutable, later changes to it may be reflected in our resulting set view

3.4. Using the Apache Commons Library

Apache Commons is another widely used library. The Apache Commons Collections4 library provides many nice collection-related methods as complementary to the standard Collection API.

Before we start using it, let’s add the dependency to our pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-collections4</artifactId>
    <version>4.4</version>
</dependency>

We can find the latest version in the Maven Central repository.

The commons-collections4 library has a CollectionUtils.removeAll method. It’s similar to the standard Collection.removeAll method but returns the result in a new Collection object instead of modifying the first Collection object.

Next, let’s test it with two immutable Set objects:

Set<String> actualOnlyInSet1 = new HashSet<>(CollectionUtils.removeAll(immutableSet1, immutableSet2));
assertThat(actualOnlyInSet1).isEqualTo(expectedOnlyInSet1);

The test will pass if we execute it. However, we should note that the CollectionUtils.removeAll method returns the result as a Collection.

If a more specific type is required – for instance, Set in our case – we’ll need to convert it manually. In the test method above, we’ve initialized a new HashSet object using the returned collection.

4. Symmetric Difference

So far, we’ve learned how to get the asymmetric difference between two sets. Now, let’s take a closer look at the other scenario: finding the symmetric difference between two sets.

We’ll address two approaches to get the symmetric difference from our two immutable set examples.

The expected result is:

Set<String> expectedDiff = Set.of("Python", "C++", "Ruby", "C#");

Next, let’s see how to solve the problem.

4.1. Using HashMap

One idea to solve the problem is first creating a Map<T, Integer> object.

Then, we iterate through the two given sets and put each element into the map as a key. If the key isn’t in the map yet, we add the element with the value 1 as a new entry. If the key already exists in the map, the element is common to both sets, so we overwrite its value with a special marker – for example, Integer.MAX_VALUE.

Finally, we collect the keys whose value is 1 in the map. These keys form the symmetric difference between the two given sets.

Next, let’s implement the idea in Java:

public static <T> Set<T> findSymmetricDiff(Set<T> set1, Set<T> set2) {
    Map<T, Integer> map = new HashMap<>();
    set1.forEach(e -> putKey(map, e));
    set2.forEach(e -> putKey(map, e));
    return map.entrySet().stream()
      .filter(e -> e.getValue() == 1)
      .map(Map.Entry::getKey)
      .collect(Collectors.toSet());
}

private static <T> void putKey(Map<T, Integer> map, T key) {
    if (map.containsKey(key)) {
        map.replace(key, Integer.MAX_VALUE);
    } else {
        map.put(key, 1);
    }
}

Now, let’s test our solution and see if it can give the expected result:

Set<String> actualDiff = SetDiff.findSymmetricDiff(immutableSet1, immutableSet2);
assertThat(actualDiff).isEqualTo(expectedDiff);

The test passes if we run it. That is to say, our implementation works as expected.
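
As a cross-check, the same result can also be derived from the definition, that is, the union of the two sets minus their intersection, using only the standard addAll, retainAll, and removeAll methods. This is a different technique from the map-based one above, shown here as a minimal sketch with the hypothetical names SymmetricDiffSketch and symmetricDiff:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of any library
final class SymmetricDiffSketch {

    // Computes (set1 union set2) minus (set1 intersect set2) without modifying the inputs
    static <T> Set<T> symmetricDiff(Set<T> set1, Set<T> set2) {
        Set<T> union = new HashSet<>(set1);
        union.addAll(set2);

        Set<T> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);

        union.removeAll(intersection);
        return union;
    }

    public static void main(String[] args) {
        Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        Set<String> diff = symmetricDiff(immutableSet1, immutableSet2);

        System.out.println(diff.equals(Set.of("Python", "C++", "Ruby", "C#"))); // true
    }
}
```

This variant allocates two temporary sets, while the map-based approach makes a single pass over each input, so which one fits better depends on the set sizes and on readability preferences.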

4.2. Using the Apache Commons Library

We’ve already introduced the Apache Commons library when finding the asymmetric difference between two sets. Actually, the commons-collections4 library has a handy SetUtils.disjunction method to return the symmetric difference between two sets directly:

Set<String> actualDiff = SetUtils.disjunction(immutableSet1, immutableSet2);
assertThat(actualDiff).isEqualTo(expectedDiff);

As the test above shows, unlike the CollectionUtils.removeAll method, the SetUtils.disjunction method returns a Set object, so we don’t need to convert the result manually.

5. Conclusion

In this article, we’ve explored how to find differences between two Set objects through examples. Further, we’ve discussed two variants of this problem: finding asymmetric differences and symmetric differences.

We’ve addressed solving the two variants using the standard Java API and widely used external libraries, such as Apache Commons-Collections and Guava.

As always, the source code used in this tutorial is available over on GitHub.