1. Introduction
Camel case is a format for strings commonly used for identifiers, in which:
- there are no intervening spaces or underscores
- all words except the first one start with a capital letter
- all the rest of the letters are in lowercase
So a valid camel case string would be thisIsACamelCaseString, but this one wouldn’t: this_stringIsNotIn_Camel_case.
In this tutorial, we’re going to explore two different approaches to the problem of converting an arbitrary String into this format.
2. Description of the Solution
In the input String:
- the words are separated by either spaces or underscores
- there may be more than one separator between any two contiguous words
Whichever algorithm we use, we need to make sure of these points:
- all the spaces and underscores are eliminated
- the first character of each word is in upper case
- the rest of the characters of each word are in lowercase
- all other characters are unchanged
We’ll use a value class to provide an extension method to the class String:
object StringWrapper {
/** A value class that adds the `toCamelCase` method to strings.
*
* @param spacedString
* a string with spaces
*/
implicit class CamelCaseWrapper(val spacedString: String) extends AnyVal {
/** Transforms a spaced string to a camelCase string.
*
* @return
* a string in camelCase format
*/
def toCamelCase: String = useMapReduce(spacedString)
}
// TODO actual implementations go here
}
2.1. Using Map-Reduce
A quick and elegant way to solve the problem is to use a pattern called map-reduce. In a nutshell, map-reduce consists of exploding the input, modifying the isolated pieces (this is called mapping), and then assembling the result with the processed pieces.
The actual implementation of the algorithm looks like this:
val useMapReduce: String => String = { spacedString =>
val first :: rest = spacedString.split(Array(' ', '_')).toList.map(_.toLowerCase)
val changedRest = rest.map(w => w.take(1).toUpperCase + w.drop(1))
val reunited = first :: changedRest
reunited.mkString
}
The first thing it does is split its input (called spacedString), using the split() method of String, and store the resulting words in a list. The words are then mapped to toLower(), and the resulting list is separated into a first word and the rest of the words. All this happens in the first line.
The second part is to change the first letter of each word (except the first one) to upper case. Then, we store this modified rest of the words into changedRest. Then, the original first word (all lowercase) and the rest of the words (with a capital first letter) are reunited in a new list called reunited.
Finally, the reduce part: we take all the words and combine them into a new String.
This approach is easy to write and understand and very powerful, but it has the problem that all words need to be stored in a memory structure (a list of strings in our case). For very, very long strings, that might become a problem.
2.2. Using a Sliding Window
Another approach that doesn’t require as much memory as map-reduce is to consider the input string as a stream of characters and transform them on the fly.
This is an interesting problem because:
- words might be separated by any number of separators
- both spaces and underscores are separators
- a decision about which case to use (either upper or lower case) for a given character is to be made depending on what the next character is rather than what characters have been seen so far
This is an implementation of the function, considering the points listed above:
val useStreams: String => String = { spacedString =>
val first = spacedString.take(1).toLowerCase
val rest =
spacedString.toStream.sliding(2).foldLeft("") { (str, charStream) =>
val added = charStream.toList match {
case ' ' :: ' ' :: _ => ""
case ' ' :: '_' :: _ => ""
case '_' :: ' ' :: _ => ""
case '_' :: '_' :: _ => ""
case ' ' :: other => other.mkString.toUpperCase
case '_' :: other => other.mkString.toUpperCase
case _ :: other :: _ if other != ' ' && other != '_' =>
other.toLower.toString
case _ => ""
}
str + added
}
first + rest
}
The first character is considered separately because it’s lowercase always.
With the rest of the characters, we’ll create a stream, and then we’ll apply a technique called sliding, using a window of size 2. That means that we’ll take all characters of the input, but instead of reading one character at a time, we’ll read two characters.
This way, we can look ahead to what the next character will be; in a certain sense, it’s like peeking into the future.
For each pair of characters, we’ll decide what to add to the resulting string:
- if this is a separator, and the next one too, add nothing
- if this is a separator, but the next one isn’t, add next (as upper case, because it’s the beginning of a new word)
- if this is not a separator, and neither is the next one, add the next one (as lowercase, for this means that we’re in the middle of a word)
- in any other case, add nothing
Finally, we concatenate the first character with the rest.
This approach illustrates the generic principle of streaming: treat the elements of the stream as they come without ever overloading the available resources. But this algorithm is more convoluted than the previous one, even ugly and error-prone.
3. Testing the Solution
Using ScalaTest, the tests can be as simple as:
"A camelCase converter" should {
"handle the empty string gracefully" in {
assertResult("")("".toCamelCase)
}
"remove spaces and change case" in {
val notCamelCase = "A string With DivErSe CASEs"
val camelCase = "aStringWithDiverseCases"
assertResult(camelCase)(notCamelCase.toCamelCase)
}
"remove underscores and spaces" in {
val notCamelCase = "I don't like_snakes_because They_BITE"
val camelCase = "iDon'tLikeSnakesBecauseTheyBite"
assertResult(camelCase)(notCamelCase.toCamelCase)
}
}
4. Conclusion
This small problem has given us two different approaches to solving it, each with its advantages and caveats. As is usually the case, we must consider both when choosing a path to follow.
As usual, the full code for this article is available over on GitHub.