Convert a String to Camel Case in Scala

1. Introduction

Camel case is a format for strings commonly used for identifiers, in which:

there are no intervening spaces or underscores
all words except the first one start with a capital letter
all the rest of the letters are in lowercase

So, thisIsACamelCaseString is a valid camel case string, whereas this_stringIsNotIn_Camel_case is not.
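The three rules above can be approximated with a regular expression. The following is only a sketch: the names `CamelCaseCheck` and `isCamelCase` are hypothetical, and the pattern assumes ASCII letters and digits only, so it rejects punctuation and the empty string.

```scala
object CamelCaseCheck {
  // Assumption: a lowercase first word, followed by zero or more words that
  // each start with one capital letter; only ASCII letters and digits allowed.
  private val CamelCase = "^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$".r

  def isCamelCase(s: String): Boolean =
    CamelCase.pattern.matcher(s).matches()
}
```

With this check, `CamelCaseCheck.isCamelCase("thisIsACamelCaseString")` is true, while the underscore in `this_stringIsNotIn_Camel_case` makes the check fail.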

In this tutorial, we’re going to explore two different approaches to the problem of converting an arbitrary String into this format.

2. Description of the Solution

In the input String:

the words are separated by either spaces or underscores
there may be more than one separator between any two contiguous words

Whichever algorithm we use, we need to make sure of these points:

all the spaces and underscores are eliminated
the first character of each word, except the first word, is in upper case
the rest of the characters of each word are in lowercase
non-letter characters, such as digits and punctuation, are unchanged

We’ll use a value class to provide an extension method to the class String:

object StringWrapper {

  /** A value class that adds the `toCamelCase` method to strings.
    *
    * @param spacedString
    *   a string whose words are separated by spaces or underscores
    */
  implicit class CamelCaseWrapper(val spacedString: String) extends AnyVal {

    /** Transforms a spaced string to a camelCase string.
      *
      * @return
      *   a string in camelCase format
      */
    def toCamelCase: String = useMapReduce(spacedString)
  }

  // TODO actual implementations go here
}
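To make the extension-method mechanism concrete, here is a minimal, self-contained sketch of the same pattern with a deliberately trivial method. The names `ShoutSyntax`, `Shouter`, and `shout` are hypothetical and not part of the article's code; they only show how an implicit value class becomes usable once it is imported.

```scala
object ShoutSyntax {
  // Hypothetical extension: an implicit value class wrapping a String.
  // Extending AnyVal usually lets the compiler avoid allocating the wrapper.
  implicit class Shouter(val text: String) extends AnyVal {
    def shout: String = text.toUpperCase + "!"
  }
}

object ShoutDemo {
  // Importing the members of the enclosing object brings the implicit
  // class into scope, so `shout` can be called as if String defined it.
  import ShoutSyntax._

  val result: String = "hello".shout
}
```

In the same way, importing `StringWrapper._` is what would make `toCamelCase` available on any String.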

2.1. Using Map-Reduce

A quick and elegant way to solve the problem is to use a pattern called map-reduce. In a nutshell, map-reduce consists of splitting the input into pieces, transforming each piece independently (this is called mapping), and then combining the processed pieces into a single result (this is called reducing).

The actual implementation of the algorithm looks like this:

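val useMapReduce: String => String = { spacedString =>
  // Split on separators, drop empty tokens (repeated or leading separators), then lowercase.
  val words = spacedString.split(Array(' ', '_')).toList.filter(_.nonEmpty).map(_.toLowerCase)
  words match {
    case Nil => "" // empty or separator-only input
    case first :: rest =>
      val changedRest = rest.map(w => w.take(1).toUpperCase + w.drop(1)) // capitalize each later word
      val reunited = first :: changedRest
      reunited.mkString
  }
}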

The first thing it does is split its input (called spacedString) using the split() method of String and store the resulting words in a list. Each word is then converted to lowercase with toLowerCase, and the list is separated into a first word and the rest of the words. All of this happens at the top of the function, before any capitalization takes place.

The second step capitalizes the first letter of each word except the first one and stores the result in changedRest. The original first word (all lowercase) and the capitalized remaining words are then reunited in a new list called reunited.

Finally, the reduce part: we take all the words and combine them into a new String.
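The split, map, and reduce steps described above can be traced on a sample input. This is a sketch: the object name `MapReduceTrace` and the sample string are illustrative, and each value holds one intermediate result.

```scala
object MapReduceTrace {
  val input = "Hello  big_WORLD"

  // Splitting on both separators: two adjacent spaces yield an empty token.
  val tokens: List[String] = input.split(Array(' ', '_')).toList
  // List("Hello", "", "big", "WORLD")

  // Map step 1: lowercase every token.
  val lowered: List[String] = tokens.map(_.toLowerCase)

  // Map step 2: capitalize the first letter of every token except the first.
  // take(1) and drop(1) both return "" for an empty token, so this is safe.
  val capitalized: List[String] =
    lowered.head :: lowered.tail.map(w => w.take(1).toUpperCase + w.drop(1))

  // Reduce step: concatenate the pieces.
  val result: String = capitalized.mkString
  // "helloBigWorld"
}
```

Empty tokens in the middle of the list contribute nothing to the concatenated result. An empty token at the head of the list, produced by a leading separator, is a different matter, because it takes the place of the first word.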

This approach is easy to write and understand, and quite powerful, but it has the drawback that all the words must be held in memory at once (in a list of strings, in our case). For very long inputs, that might become a problem.

2.2. Using a Sliding Window

Another approach that doesn’t require as much memory as map-reduce is to consider the input string as a stream of characters and transform them on the fly.

This is an interesting problem because:

words might be separated by any number of separators
both spaces and underscores are separators
the case of a character depends on its neighbor: a character starts a new word, and so must be in upper case, only when the character just before it is a separator, so we need to examine the characters in pairs rather than one at a time

This is an implementation of the function, considering the points listed above:

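val useStreams: String => String = { spacedString =>
  // Skip leading separators so that the first emitted character belongs to a word.
  val trimmed = spacedString.dropWhile(c => c == ' ' || c == '_')
  // The first character of the result is always lowercase.
  val first = trimmed.take(1).toLowerCase
  val rest =
    // Note: Stream is deprecated since Scala 2.13, where LazyList replaces it.
    trimmed.toStream.sliding(2).foldLeft("") { (str, charStream) =>
      val added = charStream.toList match {
        // A separator followed by another separator: add nothing.
        case ' ' :: ' ' :: _ => ""
        case ' ' :: '_' :: _ => ""
        case '_' :: ' ' :: _ => ""
        case '_' :: '_' :: _ => ""
        // A separator followed by a word character: a new word begins.
        case ' ' :: other    => other.mkString.toUpperCase
        case '_' :: other    => other.mkString.toUpperCase
        // Two word characters in a row: we are in the middle of a word.
        case _ :: other :: _ if other != ' ' && other != '_' =>
          other.toLower.toString
        // A word character followed by a separator, or a one-element window.
        case _ => ""
      }
      str + added
    }
  first + rest
}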

The first character is considered separately because it’s always lowercase.

With the rest of the characters, we’ll create a stream, and then we’ll apply a technique called sliding, using a window of size 2. That means that we’ll take all characters of the input, but instead of reading one character at a time, we’ll read two characters.

This way, we can look ahead to what the next character will be; in a certain sense, it’s like peeking into the future.
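To see what a window of size 2 produces, here is a small sketch using the standard sliding method on a List[Char]. The object name `SlidingDemo` and the sample inputs are illustrative.

```scala
object SlidingDemo {
  // Each window overlaps the previous one by one character.
  val windows: List[List[Char]] = "ab c".toList.sliding(2).toList
  // List(List('a', 'b'), List('b', ' '), List(' ', 'c'))

  // An input shorter than the window yields one partial window.
  val single: List[List[Char]] = "a".toList.sliding(2).toList
  // List(List('a'))

  // An empty input yields no windows at all.
  val none: List[List[Char]] = "".toList.sliding(2).toList
  // List()
}
```

The partial window for a one-character input is why a catch-all case is needed when matching on the window contents.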

For each pair of characters, we’ll decide what to add to the resulting string:

if the current character is a separator and so is the next one, add nothing
if the current character is a separator but the next one isn’t, add the next one in upper case, because it begins a new word
if neither the current character nor the next one is a separator, add the next one in lowercase, because we’re in the middle of a word
in any other case, add nothing
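The four rules above can also be written as a small pure function over a pair of characters, which makes each rule easy to test in isolation. This is a sketch, not the article's implementation: the names `PairRule`, `isSeparator`, `emit`, and `emitAll` are hypothetical.

```scala
object PairRule {
  // Spaces and underscores are the two separators considered in this article.
  def isSeparator(c: Char): Boolean = c == ' ' || c == '_'

  // Text to append for one window (current, next), following the four rules.
  def emit(current: Char, next: Char): String =
    (isSeparator(current), isSeparator(next)) match {
      case (true, true)   => ""                    // inside a run of separators
      case (true, false)  => next.toUpper.toString // next begins a new word
      case (false, false) => next.toLower.toString // middle of a word
      case (false, true)  => ""                    // next is a separator
    }

  // Applies the rule to every full window; partial windows are skipped.
  // The first character of the input is not emitted here, because it is
  // handled separately.
  def emitAll(input: String): String =
    input.toList.sliding(2).collect { case List(current, next) => emit(current, next) }.mkString
}
```

For the input "ab c", the windows are (a, b), (b, space), and (space, c), so `PairRule.emitAll("ab c")` yields "bC"; prepending the lowercase first character gives "abC".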

Finally, we concatenate the first character with the rest.

This approach illustrates the general principle of streaming: process the elements as they arrive instead of first materializing the whole input in an intermediate structure. However, this algorithm is more convoluted than the previous one and easier to get wrong.

3. Testing the Solution

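Using ScalaTest, the tests can be as simple as:

import org.scalatest.wordspec.AnyWordSpec // ScalaTest 3.1+; older versions use org.scalatest.WordSpec
import StringWrapper._ // brings toCamelCase into scope; adjust to the object's actual package

// The spec name is illustrative.
class CamelCaseSpec extends AnyWordSpec {
  "A camelCase converter" should {
    "handle the empty string gracefully" in {
      assertResult("")("".toCamelCase)
    }
    "remove spaces and change case" in {
      val notCamelCase = "A string With DivErSe CASEs"
      val camelCase = "aStringWithDiverseCases"
      assertResult(camelCase)(notCamelCase.toCamelCase)
    }
    "remove underscores and spaces" in {
      val notCamelCase = "I don't like_snakes_because   They_BITE"
      val camelCase = "iDon'tLikeSnakesBecauseTheyBite"
      assertResult(camelCase)(notCamelCase.toCamelCase)
    }
  }
}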

4. Conclusion

We’ve seen two different approaches to this small problem, each with its advantages and caveats. As is usually the case, we must weigh both when choosing which one to use.

As usual, the full code for this article is available over on GitHub.
