1. Overview
In this tutorial, we’re going to learn how to remove duplicate elements in Kotlin collections.
2. The distinct() Function
To remove duplicate elements from any collection (or Iterable, to be more specific), we can use the distinct() extension function:
val protocols = listOf("tcp", "http", "tcp", "udp", "udp")
val distinct = protocols.distinct()
assertThat(distinct).hasSize(3)
assertThat(distinct).containsExactlyInAnyOrder("tcp", "http", "udp")
As shown above, this function removes all the extra occurrences of the given strings and keeps only one of each in the resulting List.
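Beyond removing duplicates, distinct() also keeps the surviving elements in the order of their first occurrence. The short sketch below illustrates this and also calls the same-named extension on an array. The names distinctProtocols and distinctFromArray are chosen here only for illustration:

```kotlin
val protocols = listOf("tcp", "http", "tcp", "udp", "udp")

// distinct() keeps the first occurrence of each element, in encounter order.
val distinctProtocols: List<String> = protocols.distinct()

// Arrays have a distinct() extension as well; it also returns a List.
val distinctFromArray: List<String> = arrayOf("udp", "tcp", "udp").distinct()

fun main() {
    println(distinctProtocols)  // [tcp, http, udp]
    println(distinctFromArray)  // [udp, tcp]
}
```

Because the order is stable, a stricter assertion such as containsExactly("tcp", "http", "udp") would also pass for the list example.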
2.1. Implementation
Under the hood, the distinct() extension function compares the elements using their equals() method. To prove this claim, we can take a look at the function implementation:
public fun <T> Iterable<T>.distinct(): List<T> {
    return this.toMutableSet().toList()
}
As shown above, it converts the receiving Iterable to a Set. Therefore, the equals() method determines the equality of elements.
As of this writing, the underlying Set implementation is LinkedHashSet. This means that the equals() implementation should be compatible with the hashCode() implementation, as well. Otherwise, we’ll see unexpected results.
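To see how much the result depends on equals() and hashCode(), consider the following minimal sketch. PlainHost and Host are hypothetical classes introduced only for this illustration. The regular class inherits identity-based equality, whereas the data class gets a matching equals()/hashCode() pair generated by the compiler:

```kotlin
// Hypothetical classes, used only for this illustration.
class PlainHost(val name: String)      // inherits identity-based equals()/hashCode()
data class Host(val name: String)      // compiler generates value-based equals()/hashCode()

// Two separate instances are never equal by identity, so both are kept.
val plainHosts = listOf(PlainHost("baeldung"), PlainHost("baeldung")).distinct()

// The data class instances are equal by value, so only the first one is kept.
val hosts = listOf(Host("baeldung"), Host("baeldung")).distinct()

fun main() {
    println(plainHosts.size) // 2
    println(hosts.size)      // 1
}
```

So, for our own types, distinct() only behaves as expected when they define value-based equality, which data classes provide out of the box.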
3. The distinctBy() Function
Sometimes, however, we may need to remove duplicate elements using custom criteria. For instance, let’s say we’re going to encapsulate URLs as a combination of values:
data class Url(val protocol: String, val host: String, val port: Int, val path: String)
Since distinct() compares whole Url instances, we can’t simply use it to remove entries that merely share the same hostname.
However, there’s another extension function named distinctBy() that accepts its custom criteria through a lambda.
To better understand this, let’s consider a collection of URLs:
val urls = listOf(
    Url("https", "baeldung", 443, "/authors"),
    Url("https", "baeldung", 443, "/authors"),
    Url("http", "baeldung", 80, "/authors"),
    Url("https", "baeldung", 443, "/kotlin/distinct"),
    Url("https", "google", 443, "/"),
    Url("http", "google", 80, "/search"),
    Url("tcp", "docker", 2376, "/"),
)
Now, to keep only one URL per hostname, we can use distinctBy() as follows:
val uniqueHosts = urls.distinctBy { it.host }
assertThat(uniqueHosts).hasSize(3)
Here, we’re telling distinctBy() to use the hostname (the it.host part) as the key when comparing Url instances. There are only three distinct hosts in the above example: “baeldung”, “google”, and “docker”.
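It is also worth checking which Url survives for each host. According to the Kotlin documentation, distinctBy() keeps the first element for every key, so the sketch below inspects the protocols and paths of the remaining elements. It repeats the Url class and the list from above so that it can run on its own:

```kotlin
data class Url(val protocol: String, val host: String, val port: Int, val path: String)

val urls = listOf(
    Url("https", "baeldung", 443, "/authors"),
    Url("https", "baeldung", 443, "/authors"),
    Url("http", "baeldung", 80, "/authors"),
    Url("https", "baeldung", 443, "/kotlin/distinct"),
    Url("https", "google", 443, "/"),
    Url("http", "google", 80, "/search"),
    Url("tcp", "docker", 2376, "/")
)

// Only the first Url encountered for each host is kept.
val uniqueHosts = urls.distinctBy { it.host }

fun main() {
    println(uniqueHosts.map { it.protocol }) // [https, https, tcp]
    println(uniqueHosts.map { it.path })     // [/authors, /, /]
}
```

If a different representative per host is needed, for example the last one, the list would have to be reordered or reversed first.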
As another example, here, we’re removing URLs that point to the same endpoint, ignoring the path:
val uniqueUrls = urls.distinctBy { "${it.protocol}://${it.host}:${it.port}/" }
assertThat(uniqueUrls).hasSize(5)
In the above example, two URLs are considered duplicates if they share the same protocol, hostname, and port values.
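Building a String is only one way to form a composite key. As an alternative sketch, we could return a Triple from the lambda, since Triple is a data class with value-based equality. For comparison, calling plain distinct() on the same list compares all four properties, including the path. The names uniqueEndpoints and uniqueFullUrls are illustrative only:

```kotlin
data class Url(val protocol: String, val host: String, val port: Int, val path: String)

val urls = listOf(
    Url("https", "baeldung", 443, "/authors"),
    Url("https", "baeldung", 443, "/authors"),
    Url("http", "baeldung", 80, "/authors"),
    Url("https", "baeldung", 443, "/kotlin/distinct"),
    Url("https", "google", 443, "/"),
    Url("http", "google", 80, "/search"),
    Url("tcp", "docker", 2376, "/")
)

// Composite key without string formatting: protocol, host, and port.
val uniqueEndpoints = urls.distinctBy { Triple(it.protocol, it.host, it.port) }

// distinct() uses the data class equals(), so the path is compared too.
val uniqueFullUrls = urls.distinct()

fun main() {
    println(uniqueEndpoints.size) // 5
    println(uniqueFullUrls.size)  // 6
}
```

Only the two identical "/authors" entries collapse under distinct(), which is why it yields six elements instead of five.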
3.1. Implementation
Let’s look at the distinctBy() implementation:
public inline fun <T, K> Iterable<T>.distinctBy(selector: (T) -> K): List<T> {
    val set = HashSet<K>()
    val list = ArrayList<T>()
    for (e in this) {
        val key = selector(e)
        if (set.add(key))
            list.add(e)
    }
    return list
}
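Basically, it iterates the receiving Iterable and computes a key for each element by calling the selector. It then tries to add that key to a HashSet: if the key wasn’t already present, add() returns true and the element is appended to the result list. Consequently, only the first element for each key is kept, and the original order is preserved.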
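To observe this loop from the outside, the following sketch records every key that the selector produces. The seenKeys list and the sample words are assumptions made purely for this demonstration:

```kotlin
// Records each key the selector returns, in call order.
val seenKeys = mutableListOf<Int>()

val words = listOf("tcp", "udp", "http", "ftp", "https")

// The word length serves as the key, so words of equal length count as duplicates.
val byLength = words.distinctBy { word ->
    seenKeys.add(word.length) // side effect only, to trace the selector calls
    word.length
}

fun main() {
    println(seenKeys) // [3, 3, 4, 3, 5]
    println(byLength) // [tcp, http, https]
}
```

The selector runs exactly once per element, and an element is kept only the first time its key shows up.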
4. Conclusion
In this tutorial, we learned two approaches for removing duplicate elements from a collection or array. The distinct() function is useful when we want to compare elements using object equality. On the other hand, for a more customized comparison, we can use the more flexible distinctBy() function.
As usual, all the examples are available over on GitHub.