1. Overview
In this article, we’ll look at different ways to search an array for a specified value.
We’ll also compare how these perform using JMH (the Java Microbenchmark Harness) to determine which method works best.
2. Setup
For our examples, we’ll use an array that contains randomly generated Strings for each test:
String[] seedArray(int length) {
String[] strings = new String[length];
Random value = new Random();
for (int i = 0; i < length; i++) {
strings[i] = String.valueOf(value.nextInt());
}
return strings;
}
To reuse the array in each benchmark, we’ll declare an inner class to hold the array and the count so we can declare its scope for JMH:
@State(Scope.Benchmark)
public static class SearchData {
static int count = 1000;
static String[] strings = seedArray(1000);
}
3. Basic Search
Three commonly used methods for searching an array are as a List, a Set, or with a loop that examines each member until it finds a match.
Let’s start with three methods that implement each algorithm:
boolean searchList(String[] strings, String searchString) {
return Arrays.asList(SearchData.strings)
.contains(searchString);
}
boolean searchSet(String[] strings, String searchString) {
Set<String> stringSet = new HashSet<>(Arrays.asList(SearchData.strings));
return stringSet.contains(searchString);
}
boolean searchLoop(String[] strings, String searchString) {
for (String string : SearchData.strings) {
if (string.equals(searchString))
return true;
}
return false;
}
We’ll use these class annotations to tell JMH to output average time in microseconds and run for five warmup iterations to ensure that our tests are reliable:
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
And run each test in a loop:
@Benchmark
public void searchArrayLoop() {
for (int i = 0; i < SearchData.count; i++) {
searchLoop(SearchData.strings, "T");
}
}
@Benchmark
public void searchArrayAllocNewList() {
for (int i = 0; i < SearchData.count; i++) {
searchList(SearchData.strings, "T");
}
}
@Benchmark
public void searchArrayAllocNewSet() {
for (int i = 0; i < SearchData.count; i++) {
searchSet(SearchData.strings, "S");
}
}
When we run with 1000 searches for each method, our results look something like this:
SearchArrayTest.searchArrayAllocNewList avgt 20 937.851 ± 14.226 us/op
SearchArrayTest.searchArrayAllocNewSet avgt 20 14309.122 ± 193.844 us/op
SearchArrayTest.searchArrayLoop avgt 20 758.060 ± 9.433 us/op
The loop search is more efficient than others. But this is at least partly because of how we’re using collections.
We’re creating a new List instance with each call to searchList() and a new List and a new HashSet with each call to searchSet(). Creating these objects creates an additional cost that looping through the array doesn’t.
4. More Efficient Search
What happens when we create single instances of List and Set and then reuse them for each search?
Let’s give it a try:
public void searchArrayReuseList() {
List asList = Arrays.asList(SearchData.strings);
for (int i = 0; i < SearchData.count; i++) {
asList.contains("T");
}
}
public void searchArrayReuseSet() {
Set asSet = new HashSet<>(Arrays.asList(SearchData.strings));
for (int i = 0; i < SearchData.count; i++) {
asSet.contains("T");
}
}
We’ll run these methods with the same JMH annotations as above, and include the results for the simple loop for comparison.
We see very different results:
SearchArrayTest.searchArrayLoop avgt 20 758.060 ± 9.433 us/op
SearchArrayTest.searchArrayReuseList avgt 20 837.265 ± 11.283 us/op
SearchArrayTest.searchArrayReuseSet avgt 20 14.030 ± 0.197 us/op
While searching the List is marginally faster than before, Set drops to less than 1 percent of the time required for the loop!
Now that we’ve removed the time required for creating new Collections from each search, these results make sense.
Searching a hash table, the structure underlying a HashSet, has a time complexity of 0(1), while an array, which underlies the ArrayList is 0(n).
5. Binary Search
Another method for searching an array is a binary search. While very efficient, a binary search requires that the array is sorted in advance.
Let’s sort the array and try the binary search:
@Benchmark
public void searchArrayBinarySearch() {
Arrays.sort(SearchData.strings);
for (int i = 0; i < SearchData.count; i++) {
Arrays.binarySearch(SearchData.strings, "T");
}
}
SearchArrayTest.searchArrayBinarySearch avgt 20 26.527 ± 0.376 us/op
Binary search is very fast, although less efficient than the HashSet: the worst case performance for a binary search is 0(log n), which places its performance between that of an array search and a hash table.
6. Conclusion
We’ve seen several methods of searching through an array.
Based on our results, a HashSet works best for searching through a list of values. However, we need to create them in advance and store them in the Set.
As always, the full source code of the examples is available over on GitHub.