1. Introduction

In data analysis and manipulation, extracting numerical information from text is a fundamental task. We need it for parsing identifiers, extracting phone numbers, interpreting ZIP codes, and more.

In this tutorial, we’ll look at different ways to extract numbers from a string in Kotlin.

2. Assumptions

For the problem at hand, we’ll focus solely on extracting non-negative base-10 integers that can be converted to a BigInteger. Signs, decimal points, and non-base-10 notations are outside the scope of the extraction methods, so any such character simply separates one number from the next.
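
To make these assumptions concrete, here is a minimal sketch of what they imply for a few inputs. The helper digitRuns() is a hypothetical name used only for this illustration, and it relies on a simple digit-run pattern as a stand-in for the approaches that follow:

```kotlin
import java.math.BigInteger

// Hypothetical helper for illustration only: collects every run of digits.
fun digitRuns(text: String): List<BigInteger> =
    Regex("\\d+").findAll(text).map { it.value.toBigInteger() }.toList()

fun main() {
    // The sign and the decimal point act as separators, so -12.5 yields 12 and 5.
    println(digitRuns("offset -12.5")) // [12, 5]
    // The letters x and F are not base-10 digits, so 0x1F yields 0 and 1.
    println(digitRuns("mask 0x1F"))    // [0, 1]
    // Leading zeros are lost when the digits are converted to a number.
    println(digitRuns("ZIP 02134"))    // [2134]
}
```

The last case is worth keeping in mind: for values such as ZIP codes, where leading zeros are significant, it may be preferable to keep the matched text as a String instead of converting it.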

3. Using a Loop

We can use a traditional for loop to iterate over each character of the text and collect the numbers. Let’s look at the implementation:

import java.math.BigInteger

fun extractNumbersUsingLoop(str: String): List<BigInteger> {
    val numbers = mutableListOf<BigInteger>()
    val currentNumber = StringBuilder()
    for (char in str) {
        if (char.isDigit()) {
            currentNumber.append(char)
        } else if (currentNumber.isNotEmpty()) {
            numbers.add(currentNumber.toString().toBigInteger())
            currentNumber.clear()
        }
    }
    if (currentNumber.isNotEmpty()) {
        numbers.add(currentNumber.toString().toBigInteger())
    }
    return numbers
}

In this approach, we iterate over each character in the text. We use a StringBuilder to efficiently accumulate consecutive digits during the iteration. When we reach a non-digit character, we convert the accumulated digits to a number, add it to the list, and clear the StringBuilder. After the loop, we add any digits still held in the StringBuilder, which covers strings that end with a number. Note that we use the toBigInteger() method to convert the string to a number to ensure that very large numbers are handled correctly.
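
To illustrate why the conversion targets BigInteger, the following sketch compares it with the fixed-width types for a long run of digits. The helper fitsInLong() is a hypothetical name introduced only for this example:

```kotlin
import java.math.BigInteger

// Hypothetical helper: true if the digit string can be represented as a Long.
fun fitsInLong(digits: String): Boolean = digits.toLongOrNull() != null

fun main() {
    val digits = "91234567891011121314151617181920"

    // The value is too large for Int and Long, so the OrNull variants return null.
    println(digits.toIntOrNull())  // null
    println(digits.toLongOrNull()) // null
    println(fitsInLong(digits))    // false

    // BigInteger has arbitrary precision, so the conversion succeeds.
    val big: BigInteger = digits.toBigInteger()
    println(big)                   // 91234567891011121314151617181920
}
```

If the input is known to contain only small numbers, toInt() or toLong() would work as well, at the cost of a NumberFormatException on overflow.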

4. Using a Regular Expression

Regular expressions are a powerful tool for finding specific patterns in text. We can use a regular expression to extract the numbers from a string without manually iterating through each character of the text:

fun extractMultipleUsingRegex(str: String): List<BigInteger> {
    return Regex("\\d+").findAll(str).map { it.value.toBigInteger() }.toList()
}

Here, we use the regular expression \\d+ to match runs of one or more consecutive digits within the string. The findAll() function of the Regex object finds every such run and returns a lazy sequence of MatchResult objects. By converting the value of each match with toBigInteger() and collecting the results with toList(), we obtain a list of BigInteger.
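
As a small sketch of what findAll() produces before any conversion, we can inspect the MatchResult objects directly. The function digitPositions() is a hypothetical helper written for this illustration:

```kotlin
// Hypothetical helper: pairs each matched digit run with its index range in the input.
fun digitPositions(text: String): List<Pair<String, IntRange>> =
    Regex("\\d+").findAll(text).map { it.value to it.range }.toList()

fun main() {
    // findAll() returns a lazy Sequence, so matches are computed as we iterate.
    val matches: Sequence<MatchResult> = Regex("\\d+").findAll("string 123-234")
    for (match in matches) {
        // value is the matched text; range is its position within the input.
        println("${match.value} at ${match.range}")
    }
    // Prints:
    // 123 at 7..9
    // 234 at 11..13
}
```

Besides the value, each MatchResult carries the range of the match, which can be handy when the position of a number in the text matters.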

5. Using split()

Another way to extract the numbers is using the split() method with a regular expression. Both regular expression approaches are more concise than the iterative approach, and we can adapt them by changing only the pattern.

Let’s look at the implementation:

fun extractNumbersUsingSplitAndRegex(str: String): List<BigInteger> {
    return str.split(Regex("\\D+"))
        .filter { it.isNotBlank() }
        .map { it.toBigInteger() }
}

In this case, we use the regular expression \\D+ to split the string. \\D matches any character that is not a digit, whereas \\d matches digits. The + quantifier specifies that we want to match one or more consecutive non-digit characters, so each run of non-digits acts as a single delimiter. After we split the string, we remove the empty results, which appear when the string starts or ends with non-digit characters, and convert each remaining String into a BigInteger.
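
To see why the filtering step is needed, the sketch below prints the raw output of split() before any filtering. The function rawSplit() is a hypothetical helper used only for this illustration:

```kotlin
// Hypothetical helper: splits on runs of non-digits without any filtering.
fun rawSplit(text: String): List<String> = text.split(Regex("\\D+"))

fun main() {
    // A leading delimiter produces an empty first element.
    println(rawSplit("abc123def456")) // [, 123, 456]
    // A trailing delimiter produces an empty last element.
    println(rawSplit("12ab"))         // [12, ]
    // With no delimiter found, the whole input is returned as a single element.
    println(rawSplit("").size)        // 1
}
```

Calling toBigInteger() on one of these empty strings would throw a NumberFormatException, which is why the empty elements have to be removed first.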

6. Testing the Implementations

Having implemented different approaches to extracting numbers from a string, let’s write unit tests to ensure the correctness of the implementations. We can utilize parameterized tests to cover various cases.

To write parameterized tests, let’s add the junit-jupiter-params dependency:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-params</artifactId>
    <version>5.10.2</version>
    <scope>test</scope>
</dependency>

For brevity, we’ll showcase the test for just one implementation in this article:

@ParameterizedTest
@CsvSource(
    "string with 123 and 333 in the text, 123:333",
    "another string with 456 and 789, 456:789",
    "string 123-234, 123:234",
    "no numbers,",
    "3 4 50 numbers6, 3:4:50:6",
    "91234567891011121314151617181920number, 91234567891011121314151617181920",
    "123456789large numbers0, 123456789:0"
)
fun `extract all occurrences of numbers from string using regex`(str: String, expected: String?) {
    val numbers = extractMultipleUsingRegex(str)
    val expectedList = expected?.split(":")?.map { it.toBigInteger() } ?: emptyList()
    Assertions.assertEquals(expectedList, numbers)
}

Here, we supply the test data through the @CsvSource annotation, where each entry holds the input string and the expected numbers joined by a custom separator, :. Within the test, we split the expected value on this separator, convert each part to a BigInteger, and compare the resulting list against the output of the method. For the entry without numbers, the empty second column arrives as null, which we map to an empty list. The test suite covers strings containing very large numbers, numbers mixed with letters, numbers separated by special characters, and more.
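
The conversion of the expected column can be sketched in isolation, which makes the null handling easier to follow. The function parseExpected() is a hypothetical helper that mirrors the expression used inside the test:

```kotlin
import java.math.BigInteger

// Hypothetical helper mirroring the conversion performed inside the test:
// a null column means "no numbers expected" and maps to an empty list.
fun parseExpected(expected: String?): List<BigInteger> =
    expected?.split(":")?.map { it.toBigInteger() } ?: emptyList()

fun main() {
    println(parseExpected("123:333")) // [123, 333]
    println(parseExpected("456"))     // [456]
    println(parseExpected(null))      // []
}
```

The separator : works here because it can’t occur inside the expected numbers themselves.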

Similarly, we can write the tests for other implementations as well.

Additionally, some test data is cumbersome to express in @CsvSource. For instance, an unquoted empty value is passed to the test as null rather than as an empty string. In such cases, a traditional unit test is a more straightforward solution. For example, let’s write a simple unit test for the empty string scenario:

@Test
fun `check empty string scenario for split`() {
    val numbers = extractNumbersUsingSplitAndRegex("")
    // JUnit expects the expected value first, then the actual value
    Assertions.assertIterableEquals(listOf<BigInteger>(), numbers)
}

7. Conclusion

In this article, we looked at different approaches to extracting numbers from a string in Kotlin: a for loop, a regular expression with findAll(), and split() with a regular expression. Furthermore, we tested the implementations using both parameterized tests and individual test cases.

As always, the sample code used in this article is available over on GitHub.