1. Introduction
The Kotlin standard library offers a lot of functionality. It contains not only functional primitives for collections but also an extensive set of string utilities and much more.
The best way to get a feel for the standard library is to take a practical task, such as capitalizing every word in a string, and see how we can solve it in different ways using the Kotlin standard library.
2. A Straightforward Solution
Any short string that might need its words capitalized consists of words, that is, sequences of letters delimited by whitespace. Therefore, we can use the String.split() function to produce a list of words out of the input string and iterate through it to capitalize every word.
Then we can put the string back together:
input
    .split(' ')
    .joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }
The joinToString() function combines two steps: it transforms every element of the collection with the lambda and joins the results into a single string.
The only downside of this solution is that it requires roughly three times the memory needed to store the initial string, because the input, the list of words, and the joined result all exist at the same time. Granted, in the vast majority of real cases, this won’t be a problem.
However, if we, for the sake of the experiment, assume that the string might be of arbitrary length, then we might require a more economical solution.
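To make the straightforward solution easy to try, it can be wrapped in a function. This is a minimal sketch: the name capitalizeWords is hypothetical and chosen only for illustration, while the body is the split-and-join approach shown above.

```kotlin
// Hypothetical wrapper name; the body is the split-and-join approach from this section.
fun capitalizeWords(input: String): String =
    input
        .split(' ')
        .joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }

fun main() {
    println(capitalizeWords("the quick brown fox")) // The Quick Brown Fox
    // Words that already start with an uppercase letter are left unchanged.
    println(capitalizeWords("hello Kotlin"))        // Hello Kotlin
}
```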
3. A More Memory-Efficient Solution
Instead of splitting the whole string into a list, we can consume it word by word, capitalize every word, and append it to the resulting string. A sequence {} builder will help us keep the word extraction separate from the capitalization:
sequence {
    var startIndex = 0
    while (startIndex < input.length) {
        // indexOf() returns -1 when no space is left; index 0 is a valid match
        val endIndex = input.indexOf(' ', startIndex).takeIf { it >= 0 } ?: input.length
        yield(input.substring(startIndex, endIndex))
        startIndex = endIndex + 1
    }
}.joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }
The joinToString() extension function is defined for Sequence as well. Because the words are produced lazily, this code requires only about twice the memory needed to store the input string: the input itself and the result. If we need further savings, we need to consider streaming IO.
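The sequence-based approach can be sketched as a pair of functions. The names wordsOf and capitalizeWordsLazily are hypothetical; the sketch also assumes the not-found check is written as it >= 0, so that a space at index 0 is handled like any other.

```kotlin
// Hypothetical helper: lazily yields the space-delimited words of the input.
fun wordsOf(input: String): Sequence<String> = sequence {
    var startIndex = 0
    while (startIndex < input.length) {
        // indexOf() returns -1 when there is no further space, so fall back to the end of the string.
        val endIndex = input.indexOf(' ', startIndex).takeIf { it >= 0 } ?: input.length
        yield(input.substring(startIndex, endIndex))
        startIndex = endIndex + 1
    }
}

// Hypothetical wrapper name for the whole transformation.
fun capitalizeWordsLazily(input: String): String =
    wordsOf(input).joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }

fun main() {
    println(capitalizeWordsLazily("lazy sequences in kotlin")) // Lazy Sequences In Kotlin
    // Only the first word is cut out of the input here; the rest is never visited.
    println(wordsOf("lazy sequences in kotlin").first())       // lazy
}
```

Since the sequence produces one substring at a time, no intermediate list of all words is ever built.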
Unfortunately, the input isn’t always clean and valid. What if some of the whitespaces in our string are doubled?
4. Support Multiple Whitespaces
The naïve approach will change very little. Instead of splitting by a character, we can split by a regular expression. Both overloads are extension functions from the Kotlin standard library. Unlike Java’s String.split(), which always interprets its String argument as a regular expression, Kotlin’s split() treats the argument as a pattern only when we pass a Regex:
input
    .split("\\s+".toRegex()) // \s+ matches runs of whitespace; \W+ would also swallow punctuation and apostrophes
    .joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }
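The difference between the two overloads is easiest to see on input with repeated spaces. The following is a minimal sketch: the function names are hypothetical, and the pattern \s+ (runs of whitespace only) is an assumption of this example.

```kotlin
// Hypothetical name: splits on every single space character.
fun capitalizeBySpace(input: String): String =
    input.split(' ')
        .joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }

// Hypothetical name: splits on runs of whitespace (assumed pattern: \s+).
fun capitalizeByRegex(input: String): String =
    input.split("\\s+".toRegex())
        .joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }

fun main() {
    val input = "hello  big   world"
    // Splitting by a single character produces empty "words" between adjacent spaces,
    // so the extra spaces survive in the result.
    println(capitalizeBySpace(input)) // Hello  Big   World
    // Splitting by a run of whitespace collapses each gap into one space.
    println(capitalizeByRegex(input)) // Hello Big World
}
```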
But for the solution using sequence, we need to do some extra work. First, let’s create a function that would search through the string from a specified position character by character until a condition is met:
fun String.findFirstSince(position: Int, test: (Char) -> Boolean): Int {
    for (i in position until length) {
        if (test(this[i])) return i
    }
    return length
}
Then we can search through the input string for a start and an end of each word and thus yield it in sequence:
sequence {
    var startIndex = 0
    while (startIndex < input.length) {
        val endIndex = input.findFirstSince(startIndex) { it == ' ' }
        yield(input.substring(startIndex, endIndex))
        startIndex = input.findFirstSince(endIndex) { it != ' ' }
    }
}
The rest of the code will be the same as in the previous section.
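Put together, the pieces of this section might look as follows. This is a sketch that assumes the same joinToString() call as in the previous section; the wrapper name capitalizeWordsSkippingSpaces is hypothetical.

```kotlin
// Returns the index of the first character at or after `position` that satisfies `test`,
// or the string length if there is none.
fun String.findFirstSince(position: Int, test: (Char) -> Boolean): Int {
    for (i in position until length) {
        if (test(this[i])) return i
    }
    return length
}

// Hypothetical wrapper name for the assembled solution.
fun capitalizeWordsSkippingSpaces(input: String): String =
    sequence {
        var startIndex = 0
        while (startIndex < input.length) {
            // The word ends at the next space (or at the end of the input).
            val endIndex = input.findFirstSince(startIndex) { it == ' ' }
            yield(input.substring(startIndex, endIndex))
            // The next word starts at the first non-space character after it.
            startIndex = input.findFirstSince(endIndex) { it != ' ' }
        }
    }.joinToString(" ") { it.replaceFirstChar(Char::uppercaseChar) }

fun main() {
    println(capitalizeWordsSkippingSpaces("hello  big   world")) // Hello Big World
}
```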
5. Capitalize Like the News Outlets
In print, most publishers do not capitalize small words in the middle of a sentence. To support this business logic, we need slightly more complicated code. We will start by creating a set of words that we won’t capitalize unless they are the first or last word in a sentence:
val NON_CAPITALIZED_WORDS = setOf(
    "as", "at", "but", "by", "for", // and so on
)
Then we will split the string as usual with the split() function:
val components = input.split("\\s+".toRegex()) // split on runs of whitespace only
Finally, we will assemble the result with buildString {}, another primitive from the Kotlin standard library:
buildString {
    components.forEachIndexed { index, word ->
        when (index) {
            in 1..components.size - 2 -> word.capitalizeMiddleWord() // Some short auxiliary words aren't capitalized
            else -> word.replaceFirstChar(Char::uppercaseChar) // The first and the last words are always capitalized
        }.let { append(it).append(' ') }
    }
    deleteCharAt(length - 1) // Drop the last whitespace
}
The capitalizeMiddleWord() function contains most of the complexity of our solution:
private fun String.capitalizeMiddleWord(): String =
    if (length > 3 || this !in NON_CAPITALIZED_WORDS) replaceFirstChar(Char::uppercaseChar) else this
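Let’s note that we have put the length check first, which is cheaper than checking if the set contains a value. This shortcut relies on every entry in NON_CAPITALIZED_WORDS having at most three letters; a longer entry would always be capitalized.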
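For reference, the fragments of this section can be combined into one runnable sketch. The wrapper name titleCase is hypothetical, the set holds only the five words listed above, and the split pattern \s+ is an assumption of this example.

```kotlin
// Only the five words shown in this section; a real list would be longer.
val NON_CAPITALIZED_WORDS = setOf("as", "at", "but", "by", "for")

// Leaves short auxiliary words in lowercase and capitalizes everything else.
private fun String.capitalizeMiddleWord(): String =
    if (length > 3 || this !in NON_CAPITALIZED_WORDS) replaceFirstChar(Char::uppercaseChar) else this

// Hypothetical wrapper name for the assembled solution.
fun titleCase(input: String): String {
    val components = input.split("\\s+".toRegex()) // assumed pattern: runs of whitespace
    return buildString {
        components.forEachIndexed { index, word ->
            when (index) {
                in 1..components.size - 2 -> word.capitalizeMiddleWord()
                else -> word.replaceFirstChar(Char::uppercaseChar) // first and last words
            }.let { append(it).append(' ') }
        }
        deleteCharAt(length - 1) // drop the trailing space
    }
}

fun main() {
    println(titleCase("news at ten for kids")) // News at Ten for Kids
    // "for" is the last word here, so it is capitalized.
    println(titleCase("what is it for"))       // What Is It For
}
```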
6. Conclusion
In this tutorial, we explored several ways to capitalize every word in a string. We may need a different approach depending on the conditions of the task: if the input string does not take up a significant part of the application memory, we can just split() it. Otherwise, more complex solutions are available. We can also support more business rules if we must.
As always, all of the code examples are available over on GitHub.