1. Overview

In this tutorial, we’re going to learn how to remove duplicate elements in Kotlin collections.

2. The distinct() Function

To remove duplicate elements from any collection (or, more precisely, any Iterable), we can use the distinct() extension function:

val protocols = listOf("tcp", "http", "tcp", "udp", "udp")
val distinct = protocols.distinct()
assertThat(distinct).hasSize(3)
assertThat(distinct).containsExactlyInAnyOrder("tcp", "http", "udp")

As shown above, this function removes all the extra occurrences of the given strings and keeps only one of each in the resulting List.
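One detail the assertions above don’t capture is ordering: the standard library documents that distinct() returns the elements in the same order as in the source, so each value stays at the position of its first occurrence. A minimal sketch of this, reusing the protocols list (the helper name firstOccurrences is hypothetical and exists only for illustration):

```kotlin
// Hypothetical helper, used here only to illustrate the ordering of distinct().
fun firstOccurrences(items: List<String>): List<String> = items.distinct()

fun main() {
    val protocols = listOf("tcp", "http", "tcp", "udp", "udp")

    // Each value is kept at the position where it first appeared.
    println(firstOccurrences(protocols)) // [tcp, http, udp]
}
```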

2.1. Implementation

Under the hood, the distinct() extension function compares the elements using their equals() method. To prove this claim, we can take a look at the function implementation:

public fun <T> Iterable<T>.distinct(): List<T> {
    return this.toMutableSet().toList()
}

As shown above, it converts the receiving Iterable to a Set. Therefore, the equals() method determines the equality of elements.

As of this writing, the underlying Set implementation is LinkedHashSet. Because this is a hash-based set, the equals() implementation must be consistent with the hashCode() implementation: elements that are equal must also return the same hash code. Otherwise, we’ll see unexpected results, such as duplicates surviving the call.
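To illustrate why equals() and hashCode() have to agree, here is a small sketch assuming Kotlin on the JVM. Both classes, BrokenHost and SafeHost, are hypothetical and not part of the original examples. The first one overrides only equals(), so equal instances usually end up with different hash codes and distinct() tends to keep both:

```kotlin
// Hypothetical class: equals() is overridden, but hashCode() is not.
class BrokenHost(val name: String) {
    override fun equals(other: Any?): Boolean = other is BrokenHost && other.name == name
}

// Hypothetical class: equals() and hashCode() both depend on the name property.
class SafeHost(val name: String) {
    override fun equals(other: Any?): Boolean = other is SafeHost && other.name == name
    override fun hashCode(): Int = name.hashCode()
}

fun main() {
    val broken = listOf(BrokenHost("baeldung"), BrokenHost("baeldung")).distinct()
    val safe = listOf(SafeHost("baeldung"), SafeHost("baeldung")).distinct()

    // Equal instances usually get different default hash codes, so both tend to survive.
    println(broken.size)

    // Equal instances share a hash code, so only the first one is kept.
    println(safe.size) // 1
}
```

The size of the broken result isn’t guaranteed, which is exactly the kind of unexpected result to avoid. Data classes generate both methods consistently, so they don’t have this problem.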

3. The distinctBy() Function

Sometimes, however, we may need to remove duplicate elements using custom criteria. For instance, let’s say we’re going to encapsulate URLs as a combination of values:

data class Url(val protocol: String, val host: String, val port: Int, val path: String)

Given this, we can’t simply use the distinct() function to remove duplicate hostnames, because it compares whole Url instances rather than just their host property.

However, there’s another extension function named distinctBy() that accepts its custom criteria through a lambda.

To better understand this, let’s consider a collection of URLs:

val urls = listOf(
  Url("https", "baeldung", 443, "/authors"),
  Url("https", "baeldung", 443, "/authors"),
  Url("http", "baeldung", 80, "/authors"),
  Url("https", "baeldung", 443, "/kotlin/distinct"),
  Url("https", "google", 443, "/"),
  Url("http", "google", 80, "/search"),
  Url("tcp", "docker", 2376, "/"),
)

Now, to remove duplicate hostnames, we can use distinctBy():

val uniqueHosts = urls.distinctBy { it.host }
assertThat(uniqueHosts).hasSize(3)

Here, we’re telling distinctBy() to use the hostname (the it.host part) when comparing Url instances. We only have three distinct hosts in the above example: “baeldung”, “google”, and “docker”.
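For comparison, the following sketch runs both functions on the same list. It repeats the Url class and the urls list so that it can run on its own. With distinct(), only the exact duplicate is dropped, while distinctBy() keeps the first Url seen for each host:

```kotlin
data class Url(val protocol: String, val host: String, val port: Int, val path: String)

val urls = listOf(
    Url("https", "baeldung", 443, "/authors"),
    Url("https", "baeldung", 443, "/authors"),
    Url("http", "baeldung", 80, "/authors"),
    Url("https", "baeldung", 443, "/kotlin/distinct"),
    Url("https", "google", 443, "/"),
    Url("http", "google", 80, "/search"),
    Url("tcp", "docker", 2376, "/")
)

fun main() {
    // Data class equality compares all four properties,
    // so only the second, identical "/authors" entry is removed.
    println(urls.distinct().size) // 6

    // The first element seen for each host is the one that is kept.
    val uniqueHosts = urls.distinctBy { it.host }
    println(uniqueHosts.map { it.protocol }) // [https, https, tcp]
}
```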

As another example, we can build a composite key from several properties:

val uniqueUrls = urls.distinctBy { "${it.protocol}://${it.host}:${it.port}/" }
assertThat(uniqueUrls).hasSize(5)

In the above example, two URLs are duplicates if they share the same protocol, hostname, and port values, regardless of their paths.

3.1. Implementation

Let’s look at the distinctBy() implementation:

public inline fun <T, K> Iterable<T>.distinctBy(selector: (T) -> K): List<T> {
    val set = HashSet<K>()
    val list = ArrayList<T>()
    for (e in this) {
        val key = selector(e)
        if (set.add(key))
            list.add(e)
    }
    return list
}

Basically, it iterates the receiving Iterable once. For each element, it calculates a key using the given lambda selector. If this key is a duplicate, then it won’t add the current element to the final List, so only the first element for each key is kept.

In this tutorial, we only used String keys. However, the lambda can return a key of any type, as long as that type implements equals() and hashCode() consistently, since the keys are stored in a HashSet.
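As a sketch of a non-String key, the protocol, host, and port can be grouped in a Triple from the standard library instead of being formatted into a string. The helper name uniqueEndpoints is hypothetical, and the list is a shortened variant of the earlier one:

```kotlin
data class Url(val protocol: String, val host: String, val port: Int, val path: String)

// Hypothetical helper: two URLs count as duplicates when protocol, host, and port match.
// Triple is a data class, so its equals() and hashCode() are consistent.
fun uniqueEndpoints(urls: List<Url>): List<Url> =
    urls.distinctBy { Triple(it.protocol, it.host, it.port) }

fun main() {
    val urls = listOf(
        Url("https", "baeldung", 443, "/authors"),
        Url("https", "baeldung", 443, "/kotlin/distinct"),
        Url("http", "baeldung", 80, "/authors")
    )

    // The first two entries share the same key, so only the first one is kept.
    println(uniqueEndpoints(urls).size) // 2
}
```

Compared to string interpolation, a structured key avoids accidental collisions that could arise if a separator character also appeared inside one of the values.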

4. Conclusion

In this tutorial, we learned two ways to remove duplicate elements from a collection or array. The distinct() function is useful when we want to compare elements by object equality. On the other hand, for a more customized comparison, we can use the more flexible distinctBy() function.

As usual, all the examples are available over on GitHub.

