1. Overview
In this tutorial, we’ll review several ways of checking if a String contains a substring, and we’ll compare the performance of each.
2. String.indexOf
Let’s first try using the String.indexOf method. indexOf gives us the first position where the substring is found, or -1 if it isn’t found at all.
When we search for “Rhap”, it will return 9:
Assert.assertEquals(9, "Bohemian Rhapsodyan".indexOf("Rhap"));
When we search for “rhap”, it returns -1 because indexOf is case-sensitive. If we convert the string to lowercase first, the match is found at index 9:
Assert.assertEquals(-1, "Bohemian Rhapsodyan".indexOf("rhap"));
Assert.assertEquals(9, "Bohemian Rhapsodyan".toLowerCase().indexOf("rhap"));
It’s also important to note that if we search for the substring “an”, it returns 6, because indexOf reports only the first occurrence:
Assert.assertEquals(6, "Bohemian Rhapsodyan".indexOf("an"));
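Since indexOf reports only the first occurrence, later ones can be reached with the overload that accepts a starting index. The following is a minimal sketch of that idea; the class name IndexOfAllExample and the helper indexesOf are hypothetical names chosen for illustration, and treating an empty needle as "no matches" is an assumption of this sketch:

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfAllExample {

    // Collects the start index of every occurrence of needle in text,
    // including overlapping ones, by restarting the search one position
    // after the previous hit.
    static List<Integer> indexesOf(String text, String needle) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // assumption: an empty needle yields no matches
        }
        int from = text.indexOf(needle);
        while (from >= 0) {
            positions.add(from);
            from = text.indexOf(needle, from + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(indexesOf("Bohemian Rhapsodyan", "an")); // prints [6, 17]
    }
}
```

If only the final occurrence matters, String.lastIndexOf gives it directly without a loop.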
3. String.contains
Next, let’s try String.contains. contains will search a substring throughout the entire String and will return true if it’s found and false otherwise.
In this example, contains returns true because “Hey” is found.
Assert.assertTrue("Hey Ho, let's go".contains("Hey"));
If the string is not found, contains returns false:
Assert.assertFalse("Hey Ho, let's go".contains("jey"));
In the next example, “hey” is not found because String.contains is case-sensitive. Converting the string to lowercase first makes the search succeed:
Assert.assertFalse("Hey Ho, let's go".contains("hey"));
Assert.assertTrue("Hey Ho, let's go".toLowerCase().contains("hey"));
An interesting point is that contains internally calls indexOf to determine whether a substring is present.
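The relationship between the two methods can be illustrated with a small sketch. The class ContainsExample and the helper containsViaIndexOf are hypothetical names used only for this illustration; the helper is not the JDK source, just a check that behaves the same way for String arguments:

```java
class ContainsExample {

    // Reports whether needle occurs in text by testing the indexOf result,
    // which is -1 only when there is no occurrence.
    static boolean containsViaIndexOf(String text, String needle) {
        return text.indexOf(needle) >= 0;
    }

    public static void main(String[] args) {
        String text = "Hey Ho, let's go";
        // Both checks agree for a present, an absent, and an empty substring.
        for (String needle : new String[] {"Hey", "hey", ""}) {
            System.out.println(needle + " -> " + text.contains(needle)
                + " / " + containsViaIndexOf(text, needle));
        }
    }
}
```

Note that an empty substring is always considered to be contained, because indexOf("") returns 0.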
4. StringUtils.containsIgnoreCase
Our third approach will be using StringUtils#containsIgnoreCase from the Apache Commons Lang library:
Assert.assertTrue(StringUtils.containsIgnoreCase("Runaway train", "train"));
Assert.assertTrue(StringUtils.containsIgnoreCase("Runaway train", "Train"));
We can see that it checks whether a substring is contained in a String, ignoring the case. That’s why containsIgnoreCase returns true when we search for “train” and also “Train” inside of “Runaway train”.
This approach won’t be as efficient as the previous approaches as it takes additional time to ignore the case. containsIgnoreCase internally converts every letter to upper-case and compares the converted letters instead of the original ones.
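When adding a dependency is not an option, a comparable case-insensitive check can be written with the JDK’s String.regionMatches. The sketch below is not the Commons Lang implementation; the class RegionMatchesExample and its containsIgnoreCase helper are hypothetical, and returning false for null arguments is an assumption made here for simplicity:

```java
class RegionMatchesExample {

    // Slides over every candidate start position and compares the region
    // with the needle, ignoring case differences.
    static boolean containsIgnoreCase(String text, String needle) {
        if (text == null || needle == null) {
            return false; // assumption: null never matches
        }
        int length = needle.length();
        int lastStart = text.length() - length;
        for (int i = 0; i <= lastStart; i++) {
            if (text.regionMatches(true, i, needle, 0, length)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Runaway train", "Train")); // prints true
        System.out.println(containsIgnoreCase("Runaway train", "plane")); // prints false
    }
}
```

Unlike calling toLowerCase on both strings, this approach does not allocate converted copies of the input.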
5. Using Pattern
Our last approach will be using a Pattern with a regular expression:
Pattern pattern = Pattern.compile("(?<!\\S)" + "road" + "(?!\\S)");
We first build the Pattern, then create a Matcher, and finally call the find method to check whether there’s an occurrence. Note that the lookarounds (?<!\S) and (?!\S) restrict the match to “road” as a whole, whitespace-delimited word:
Matcher matcher = pattern.matcher("Hit the road Jack");
Assert.assertTrue(matcher.find());
Here, find returns true because the word “road” appears in the string “Hit the road Jack”. When we look for the same word in the string “and don’t you come back no more”, it returns false:
matcher = pattern.matcher("and don't you come back no more");
Assert.assertFalse(matcher.find());
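The pattern above matches only a whole word, which is stricter than a plain substring check. As a hedged sketch of the difference, the hypothetical helpers below (containsLiteral and containsWord in a class named PatternSearchExample, none of which come from the original code) use Pattern.quote so that regex metacharacters in the search text are treated literally, and show an optional case-insensitive flag:

```java
import java.util.regex.Pattern;

class PatternSearchExample {

    // Plain substring search via regex: the needle is quoted so that
    // characters such as '.' or '*' are matched literally.
    static boolean containsLiteral(String text, String needle, boolean ignoreCase) {
        // CASE_INSENSITIVE alone covers ASCII letters only.
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(Pattern.quote(needle), flags).matcher(text).find();
    }

    // Whole-word search: the match must not be preceded or followed
    // by a non-whitespace character.
    static boolean containsWord(String text, String word) {
        Pattern pattern = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Hit the railroad Jack";
        System.out.println(containsLiteral(text, "road", false)); // prints true
        System.out.println(containsWord(text, "road"));           // prints false
    }
}
```

Compiling a Pattern on every call is relatively costly, so a pattern that is reused should be compiled once and kept in a field.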
6. Performance Comparison
We’ll use an open-source micro-benchmark framework called Java Microbenchmark Harness (JMH) in order to decide which method is the most efficient in terms of execution time.
6.1. Benchmark Setup
As in every JMH benchmark, we have the ability to write a setup method, in order to have certain things in place before our benchmarks are run:
@Setup
public void setup() {
    message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " +
      "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. " +
      "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris " +
      "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in " +
      "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. " +
      "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt " +
      "mollit anim id est laborum";
    pattern = Pattern.compile("(?<!\\S)" + "eiusmod" + "(?!\\S)");
}
In the setup method, we’re initializing the message field. We’ll use this as the source text for our various searching implementations.
We’re also initializing pattern so that we can use it later in one of our benchmarks.
6.2. The String.indexOf Benchmark
Our first benchmark will use indexOf:
@Benchmark
public int indexOf() {
    return message.indexOf("eiusmod");
}
We’ll search for the position of “eiusmod” in the message variable.
6.3. The String.contains Benchmark
Our second benchmark will use contains:
@Benchmark
public boolean contains() {
    return message.contains("eiusmod");
}
We’ll try to find if the message value contains “eiusmod”, the same substring used in the previous benchmark.
6.4. The StringUtils.containsIgnoreCase Benchmark
Our third benchmark will use StringUtils#containsIgnoreCase:
@Benchmark
public boolean containsStringUtilsIgnoreCase() {
    return StringUtils.containsIgnoreCase(message, "eiusmod");
}
As with the previous benchmarks, we’ll search for the substring in the message value.
6.5. The Pattern Benchmark
And our last benchmark will use Pattern:
@Benchmark
public boolean searchWithPattern() {
    return pattern.matcher(message).find();
}
We’ll use the pattern initialized in the setup method to create a Matcher and be able to call the find method, using the same substring as before.
6.6. Analysis of Benchmarks Results
It’s important to note that we’re evaluating the benchmark results in nanoseconds.
After running our JMH test, we can see the average time each took:
- contains: 14.736 ns
- indexOf: 14.200 ns
- containsStringUtilsIgnoreCase: 385.632 ns
- searchWithPattern: 1014.633 ns
The indexOf method is the most efficient one, closely followed by contains. It makes sense that contains took slightly longer, because it uses indexOf internally.
containsStringUtilsIgnoreCase took extra time compared with the previous ones because it’s case-insensitive.
searchWithPattern took the highest average time of all, which suggests that using a Pattern is the least efficient alternative for this task.
7. Conclusion
In this article, we’ve explored various ways to search for a substring in a String. We’ve also benchmarked the performance of the different solutions.
As always, the code is available over on GitHub.