1. Overview
In this article, we’ll cover the advantages of binary search over a simple linear search and walk through its implementation in Java.
2. Need for Efficient Search
Let’s say we’re in the wine-selling business and millions of buyers are visiting our application every day.
Through our app, a customer can filter for wines priced at or below n dollars, select a bottle from the search results, and add it to their cart. With millions of users searching for wines under a price limit every second, the results need to be fast.
On the backend, our algorithm runs a linear search through the entire list of wines comparing the price limit entered by the customer with the price of every wine bottle in the list.
Then, it returns items which have a price less than or equal to the price limit. This linear search has a time complexity of O(n).
This means that the more wine bottles our system holds, the longer the search takes: search time grows in direct proportion to the number of items.
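Here is a minimal sketch of the linear approach described above. It assumes each wine is represented only by its price in cents, stored in an int[]. The class LinearPriceFilter and its method are hypothetical names, not the application's real backend code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each wine is reduced to its price in cents.
class LinearPriceFilter {

    // Inspects every price exactly once, so the work grows linearly
    // with the number of wines: O(n).
    static List<Integer> pricesAtOrBelow(int[] prices, int priceLimit) {
        List<Integer> result = new ArrayList<>();
        for (int price : prices) {
            if (price <= priceLimit) {
                result.add(price);
            }
        }
        return result;
    }
}
```

The method inspects every price whatever the limit is, and that is where the O(n) cost comes from.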
If we instead store the items in sorted order and search them using binary search, we can achieve a complexity of O(log n).
With binary search, the search time still grows with the size of the dataset, but only logarithmically rather than proportionally: doubling the number of items adds only about one extra comparison.
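The following sketch makes the sorted-order idea concrete. It keeps the same assumption as before (prices stored as int cents), but now the prices are sorted in ascending order. The sketch does not look for one exact value. Instead, it uses a binary search variant that finds the boundary between prices within the limit and prices above it. SortedPriceFilter and its methods are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical helper: sortedPrices must be sorted in ascending order.
class SortedPriceFilter {

    // Returns how many prices are <= priceLimit, which is also the index of
    // the first price above the limit. Each iteration halves the half-open
    // range [low, high), so the loop runs O(log n) times.
    static int countAtOrBelow(int[] sortedPrices, int priceLimit) {
        int low = 0;
        int high = sortedPrices.length; // exclusive upper bound
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (sortedPrices[mid] <= priceLimit) {
                low = mid + 1; // boundary lies to the right of mid
            } else {
                high = mid;    // mid may be the first price above the limit
            }
        }
        return low;
    }

    // All matching prices form a prefix of the sorted array.
    static int[] pricesAtOrBelow(int[] sortedPrices, int priceLimit) {
        return Arrays.copyOf(sortedPrices, countAtOrBelow(sortedPrices, priceLimit));
    }
}
```

Take the sorted prices {899, 1200, 1500, 1500, 2500} with a limit of 1500: countAtOrBelow returns 4. Locating the boundary takes O(log n) comparisons. Copying the k matching prices still costs O(k), and no search algorithm can avoid that cost when every matching item must be returned.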
3. Binary Search
Simply put, the algorithm compares the key with the middle element of the array. If they are unequal, it eliminates the half in which the key cannot lie and continues searching the remaining half until it finds the key.
Remember: the key requirement here is that the array is already sorted.
If the search ends with the remaining half being empty, the key is not in the array.
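You can trace the halving described above on a small array. This illustrative sketch records the middle index inspected at each step instead of returning the match. The class BinarySearchTrace is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: records which middle index is probed at each step.
class BinarySearchTrace {

    static List<Integer> probedIndices(int[] sortedArray, int key) {
        List<Integer> probes = new ArrayList<>();
        int low = 0;
        int high = sortedArray.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            probes.add(mid);
            if (sortedArray[mid] == key) {
                break;          // key found
            } else if (sortedArray[mid] < key) {
                low = mid + 1;  // key can only be in the right half
            } else {
                high = mid - 1; // key can only be in the left half
            }
        }
        // If no probe matched, the loop ended because the range became empty.
        return probes;
    }
}
```

Searching for 13 in {1, 3, 5, 7, 9, 11, 13, 15} probes indices 3, 5 and 6. Searching for the absent key 4 probes indices 3, 1 and 2, and then the range becomes empty.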
3.1. Iterative Implementation
```java
public int runBinarySearchIteratively(
  int[] sortedArray, int key, int low, int high) {
    int index = -1; // stays -1 if the key is not in the array
    while (low <= high) {
        int mid = low + ((high - low) / 2);
        if (sortedArray[mid] < key) {
            low = mid + 1;  // key can only be in the right half
        } else if (sortedArray[mid] > key) {
            high = mid - 1; // key can only be in the left half
        } else {
            index = mid;    // sortedArray[mid] == key
            break;
        }
    }
    return index;
}
```
The runBinarySearchIteratively method takes a sortedArray, a key, and the low and high indexes of the sortedArray as arguments. On the first call, low is 0 (the first index of the sortedArray) and high is sortedArray.length - 1 (the last index).
mid is the middle index of the current search range. On each iteration, the while loop compares the key with sortedArray[mid] and moves low or high past mid. This halves the range until the key is found or the range is empty.
*Notice how the middle index is calculated: int mid = low + ((high - low) / 2). This form accommodates extremely large arrays.* Suppose we computed the middle index naively as int mid = (low + high) / 2. For an array containing more than 2^30 elements, an overflow could then occur, because the sum low + high could exceed the maximum positive int value (2^31 - 1).
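You can reproduce the overflow without allocating a huge array, because only the index arithmetic matters. The class and method names in this sketch are hypothetical. The unsigned-shift variant (low + high) >>> 1 is an alternative, and the OpenJDK implementation of Arrays.binarySearch uses it. It works because the shift reinterprets the wrapped sum as an unsigned value before halving it:

```java
// Demonstrates why (low + high) / 2 is unsafe for very large index values.
class MidpointOverflow {

    // Naive midpoint: low + high can exceed Integer.MAX_VALUE and wrap around.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Safe midpoint: for 0 <= low <= high, high - low never overflows.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    // Alternative: >>> treats the (possibly wrapped) sum as unsigned,
    // so halving it still yields the correct non-negative midpoint.
    static int unsignedShiftMid(int low, int high) {
        return (low + high) >>> 1;
    }
}
```

With low = 1_500_000_000 and high = 2_000_000_000, naiveMid returns -397483648 because the sum wrapped to a negative number. safeMid and unsignedShiftMid both return 1750000000.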
3.2. Recursive Implementation
Now, let’s have a look at a simple, recursive implementation as well:
```java
public int runBinarySearchRecursively(
  int[] sortedArray, int key, int low, int high) {
    if (high < low) {
        return -1; // empty range: the key is not in the array
    }
    int middle = low + ((high - low) / 2);
    if (key == sortedArray[middle]) {
        return middle;
    } else if (key < sortedArray[middle]) {
        // the key can only be in the left half
        return runBinarySearchRecursively(
          sortedArray, key, low, middle - 1);
    } else {
        // the key can only be in the right half
        return runBinarySearchRecursively(
          sortedArray, key, middle + 1, high);
    }
}
```
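The runBinarySearchRecursively method accepts the same arguments: a sortedArray, a key, and the low and high indexes of the range to search. Each call does one of three things. It returns middle if the key is there. It returns -1 if the range is empty (high < low). Otherwise, it recurses into the half that may still contain the key.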
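Callers must supply the initial bounds, so a common convenience is an overload that passes 0 and sortedArray.length - 1. The sketch below is a standalone, static copy of the recursive logic. The class name RecursiveBinarySearch and its two-argument search overload are hypothetical:

```java
// Hypothetical wrapper: supplies the initial bounds so callers pass only the array and key.
class RecursiveBinarySearch {

    static int search(int[] sortedArray, int key) {
        return search(sortedArray, key, 0, sortedArray.length - 1);
    }

    // Same logic as runBinarySearchRecursively, made static for a standalone example.
    private static int search(int[] sortedArray, int key, int low, int high) {
        if (high < low) {
            return -1; // empty range: key is not present
        }
        int middle = low + ((high - low) / 2);
        if (key == sortedArray[middle]) {
            return middle;
        } else if (key < sortedArray[middle]) {
            return search(sortedArray, key, low, middle - 1);
        } else {
            return search(sortedArray, key, middle + 1, high);
        }
    }
}
```

For {2, 4, 6, 8, 10}, search returns 3 for the key 8 and -1 for the key 5. For an empty array, the initial bounds are 0 and -1, so the base case immediately returns -1.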
3.3. Using Arrays.binarySearch()
```java
import java.util.Arrays;
int index = Arrays.binarySearch(sortedArray, key);
```
We pass a sortedArray and an int key (the value to search for) as arguments to the binarySearch method of the Java Arrays class. The method returns the key's index if it finds it, and a negative value otherwise.
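When the key is absent, Arrays.binarySearch does not simply return -1. Its documented return value is (-(insertion point) - 1), where the insertion point is the index at which the key would be inserted to keep the array sorted. The Javadoc also states two other rules. The result is undefined if the array is not sorted. If the array contains duplicates, there is no guarantee which occurrence is found. The helper below decodes that return value, and its class and method names are hypothetical:

```java
import java.util.Arrays;

class ArraysBinarySearchExample {

    // Returns the key's index if present; otherwise, the index at which it
    // would be inserted to keep the array sorted. Arrays.binarySearch encodes
    // a miss as (-(insertion point) - 1), which is always negative.
    static int indexOrInsertionPoint(int[] sortedArray, int key) {
        int index = Arrays.binarySearch(sortedArray, key);
        return index >= 0 ? index : -(index + 1);
    }
}
```

For {10, 20, 30, 40}, Arrays.binarySearch returns 2 for the key 30 and -3 for the key 25. indexOrInsertionPoint therefore returns 2 in both cases: the index of 30, and the position where 25 would go.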
3.4. Using Collections.binarySearch()
```java
import java.util.Collections;
int index = Collections.binarySearch(sortedList, key);
```
We pass a sortedList and an Integer key (the value to search for in that list of Integer objects) as arguments to the binarySearch method of the Java Collections class.
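Collections.binarySearch follows the same return convention as Arrays.binarySearch. It requires the list to be sorted in ascending natural order. With the three-argument overload, the list must instead be sorted in the order defined by the Comparator passed to it. The wrapper class below is hypothetical and shows both overloads:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CollectionsBinarySearchExample {

    // The list must be sorted in ascending natural order for this overload.
    static int findAscending(List<Integer> sortedList, Integer key) {
        return Collections.binarySearch(sortedList, key);
    }

    // For a list sorted in a custom order, pass the same Comparator that defines that order.
    static int findDescending(List<Integer> descendingList, Integer key) {
        return Collections.binarySearch(descendingList, key, Comparator.reverseOrder());
    }
}
```

For List.of(40, 30, 20, 10), findDescending returns 1 for the key 30 and -3 for the absent key 25. According to the Javadoc, the search runs in log(n) time on RandomAccess lists such as ArrayList. On a large list that is not RandomAccess, such as LinkedList, it performs O(n) link traversals, though still only O(log n) comparisons.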
3.5. With Sorted Array And Duplicates
Suppose the array is sorted but contains duplicates. The binary search approaches discussed in the earlier sections will then return the index of one of the occurrences of the key, with no guarantee which one. If we need the first and last occurrences of the key, we have to modify the traditional binary search logic. Together, the first and last occurrences give us the entire window of indices that hold the key.
To find the first occurrence, we run a binary search that records the index whenever it finds the key and then continues searching in the left half:
```java
int startIndexSearch(int[] sortedArray, int target) {
    int left = 0;
    int right = sortedArray.length - 1;
    int result = -1;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        if (sortedArray[mid] == target) {
            result = mid;
            right = mid - 1; // continue search on left half
        } else if (sortedArray[mid] < target) {
            left = mid + 1;
        } else {
            right = mid - 1;
        }
    }
    return result;
}
```
Similarly, to find the last occurrence of the key element, we can modify the traditional binary search algorithm to continue searching in the right half even after finding the key element:
```java
int endIndexSearch(int[] sortedArray, int target) {
    int left = 0;
    int right = sortedArray.length - 1;
    int result = -1;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        if (sortedArray[mid] == target) {
            result = mid;
            left = mid + 1; // continue search in the right half
        } else if (sortedArray[mid] < target) {
            left = mid + 1;
        } else {
            right = mid - 1;
        }
    }
    return result;
}
```
With the first and last occurrences, we can get all the key element’s indices in the sortedArray:
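```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<Integer> runBinarySearchOnSortedArraysWithDuplicates(int[] sortedArray, Integer key) {
    int startIndex = startIndexSearch(sortedArray, key);
    if (startIndex == -1) {
        return List.of(); // key is absent, so there are no indices to collect
    }
    int endIndex = endIndexSearch(sortedArray, key);
    return IntStream.rangeClosed(startIndex, endIndex)
      .boxed()
      .collect(Collectors.toList());
}
```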
In the above example, we iterate from startIndex (first occurrence) to endIndex (last occurrence) of the key element, then collect the results into a list.
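startIndexSearch and endIndexSearch differ only in which half they keep searching after a match. The sketch below folds them into one hypothetical boundarySearch method with a flag. It then uses that method to count the occurrences of a key in O(log n) time without building the index list. OccurrenceWindow is also a hypothetical name:

```java
// Hypothetical refactoring: a flag chooses whether to keep searching
// left (first occurrence) or right (last occurrence) after a match.
class OccurrenceWindow {

    static int boundarySearch(int[] sortedArray, int target, boolean findFirst) {
        int left = 0;
        int right = sortedArray.length - 1;
        int result = -1;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            if (sortedArray[mid] == target) {
                result = mid;        // remember the match, then keep narrowing
                if (findFirst) {
                    right = mid - 1; // look for an earlier occurrence
                } else {
                    left = mid + 1;  // look for a later occurrence
                }
            } else if (sortedArray[mid] < target) {
                left = mid + 1;
            } else {
                right = mid - 1;
            }
        }
        return result;
    }

    // Number of occurrences, computed from the window boundaries alone.
    static int countOccurrences(int[] sortedArray, int target) {
        int first = boundarySearch(sortedArray, target, true);
        if (first == -1) {
            return 0; // target absent
        }
        int last = boundarySearch(sortedArray, target, false);
        return last - first + 1;
    }
}
```

In {1, 2, 2, 2, 3, 5}, boundarySearch finds the first 2 at index 1 and the last 2 at index 3, so countOccurrences returns 3. For the absent key 4, it returns 0.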
3.6. Performance
Whether to use a recursive or an iterative approach is mostly a matter of personal preference. Still, there are a few points we should be aware of:
1. Recursion can be slower because of the overhead of extra method calls, and it usually takes up more memory, since each call adds a stack frame
2. Deep recursion is not stack-friendly and can throw a StackOverflowError on large inputs. For binary search, though, the recursion depth is only about log2(n), so this is rarely a practical concern
3. Recursion can add clarity, as it often makes the code shorter than the iterative version
Ideally, binary search performs fewer comparisons than linear search for large values of n. For small values of n, linear search could perform better than binary search.
This analysis is theoretical, though, and real-world results may vary depending on the context.
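One way to see the difference in comparison counts is to instrument both searches. This sketch counts how many elements each approach inspects before it stops. It deliberately ignores constant factors such as cache behavior, which is one reason the analysis above is only theoretical. ProbeCounter is a hypothetical class name:

```java
// Illustrative counters: how many array elements each approach inspects.
class ProbeCounter {

    // Linear search: inspects elements from the start until the key is found.
    static int linearProbes(int[] array, int key) {
        int probes = 0;
        for (int value : array) {
            probes++;
            if (value == key) {
                break;
            }
        }
        return probes;
    }

    // Binary search: counts how many middle elements are inspected.
    static int binaryProbes(int[] sortedArray, int key) {
        int probes = 0;
        int low = 0;
        int high = sortedArray.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            probes++;
            if (sortedArray[mid] == key) {
                break;
            } else if (sortedArray[mid] < key) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return probes;
    }
}
```

Take a sorted array of the values 0 to 999,999 and the key 999,999. linearProbes returns 1,000,000, while binaryProbes returns at most 20, because binary search needs at most floor(log2 n) + 1 probes. For the key 1 in {1, 2, 3, 4, 5, 6, 7, 8}, the result flips: linear search stops after 1 probe, whereas binary search needs 3.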
Also, binary search needs a sorted data set, and sorting has its own cost. If we use merge sort to sort the data, that adds O(n log n) work up front. This cost is paid once, though, and it can be amortized over many searches as long as the data doesn't change often.
So we should first analyze our requirements carefully and then decide which search algorithm suits them best.
4. Conclusion
This tutorial demonstrated a binary search implementation and a scenario in which binary search is preferable to a linear search.
Please find the code for the tutorial over on GitHub.