1. Overview
In this tutorial, we’ll learn how to remove duplicate characters from a String in Scala using the standard library.
2. Iterating Through the String
The most naive approach is to iterate through the characters of the String and append each one to the result only if we haven’t seen it already:
scala> val s = "abcb"
s: String = abcb
scala> val sb = new StringBuilder()
sb: StringBuilder =
scala> s.foreach { char =>
| if (!sb.toString.contains(char)) {
| sb.append(char)
| }
| }
scala> sb.toString
res5: String = abc
In this example, the StringBuilder both accumulates the result and keeps track of the characters we’ve already seen. We could vary the implementation while keeping the same idea. For instance, we could use the String.indexOf() method to check whether the character occurs again later in the String, as we might do when removing duplicate characters in Java.
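A minimal sketch of the indexOf() idea as reusable functions, assuming hypothetical names (IndexOfDedup, keepLastOccurrence, keepFirstOccurrence). Keeping a character only when it does not occur again further ahead retains the last occurrence of each character, so the relative order can change. The second function is a variation not described above: it compares against the index of the first occurrence instead, which keeps the first one.

```scala
object IndexOfDedup {

  // Keeps a character only if it does not occur again later in the string,
  // so the LAST occurrence of each character survives.
  def keepLastOccurrence(s: String): String =
    s.zipWithIndex.collect {
      case (c, i) if s.indexOf(c, i + 1) == -1 => c
    }.mkString

  // Hypothetical variation: keeps a character only at its first occurrence,
  // so the FIRST occurrence of each character survives.
  def keepFirstOccurrence(s: String): String =
    s.zipWithIndex.collect {
      case (c, i) if s.indexOf(c) == i => c
    }.mkString

  def main(args: Array[String]): Unit = {
    println(keepLastOccurrence("abcb"))  // acb
    println(keepFirstOccurrence("abcb")) // abc
  }
}
```

Both versions rescan the String for every character, so, like the StringBuilder check, they take quadratic time in the length of the input.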
3. Using distinct()
Another approach is to use the distinct method, which Scala makes available on String through StringOps:
scala> val s = "abcb"
s: String = abcb
scala> s.distinct
res0: String = abc
Relying on a method from the standard library requires much less code than the manual loop. Like our first approach, distinct keeps the first occurrence of each character in its original position.
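As a quick sanity check, here is a small sketch (the object and method names are hypothetical) showing that distinct keeps the first occurrence of each character in order, and that it compares characters exactly, so upper- and lower-case letters count as different characters.

```scala
object DistinctDedup {

  // Thin wrapper around the distinct method that StringOps provides for String
  def removeDuplicates(s: String): String = s.distinct

  def main(args: Array[String]): Unit = {
    println(removeDuplicates("abcb"))        // abc
    println(removeDuplicates("mississippi")) // misp
    // 'A' and 'a' are different characters, so both are kept
    println(removeDuplicates("AaBbAa"))      // AaBb
  }
}
```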
4. Using a Set
If we don’t care about the order of the characters, we can convert our String into a Set, which by definition contains no duplicates:
scala> val s = "aabbccddeeff"
s: String = aabbccddeeff
scala> s.toSet
res0: scala.collection.immutable.Set[Char] = Set(e, f, a, b, c, d)
scala> s.toSet.mkString
res1: String = efabcd
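Because a plain Set makes no promise about iteration order, we shouldn’t rely on the order of the String that mkString returns. A small sketch (names are hypothetical) makes the point: the Set-based result contains exactly the same characters as distinct, which we can confirm by sorting both.

```scala
object SetDedup {

  // Converts to an immutable Set; the order of the resulting characters is unspecified
  def removeDuplicatesUnordered(s: String): String = s.toSet.mkString

  def main(args: Array[String]): Unit = {
    val s = "aabbccddeeff"
    val unordered = removeDuplicatesUnordered(s)
    println(unordered)
    // Same characters as distinct once both results are sorted
    println(unordered.sorted == s.distinct.sorted) // true
  }
}
```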
However, we can also keep the original order by using an insertion-ordered Set such as LinkedHashSet:
scala> import scala.collection.mutable.LinkedHashSet
import scala.collection.mutable.LinkedHashSet
scala> val orderedSet = LinkedHashSet[Char]()
orderedSet: scala.collection.mutable.LinkedHashSet[Char] = Set()
scala> (orderedSet ++= s.toList).mkString
res0: String = abcdef
Because LinkedHashSet iterates in insertion order, this approach removes the duplicates while keeping the first occurrence of each character in its original position.
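The same LinkedHashSet idea can be packaged as a reusable function. This is a minimal sketch (the object and method names are hypothetical) that builds a fresh LinkedHashSet on every call, so no mutable state leaks out of the function.

```scala
import scala.collection.mutable.LinkedHashSet

object LinkedHashSetDedup {

  // LinkedHashSet drops duplicates and iterates in insertion order,
  // so each character's first occurrence keeps its original position.
  def removeDuplicatesOrdered(s: String): String =
    (LinkedHashSet.empty[Char] ++= s.iterator).mkString

  def main(args: Array[String]): Unit = {
    println(removeDuplicatesOrdered("aabbccddeeff")) // abcdef
    println(removeDuplicatesOrdered("abcb"))         // abc
  }
}
```

For these inputs, the result matches what distinct returns, since both keep the first occurrence of each character.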
5. Conclusion
In this article, we’ve learned how to remove duplicate characters from a Scala String using only the standard library.
We started with the naive approach of iterating through each character and skipping the ones we had already seen. Then, we used the distinct method, and finally, we saw how a Set removes duplicates and how a LinkedHashSet does so while preserving the original order.