1. Introduction
A sparse table is a data structure that can answer some range query problems, such as range minimum query problem, in time.
In this tutorial, we’ll show how to construct a sparse table and its applications on various range query problems.
2. Problem Description
Let’s start with a range minimum query (RMQ) problem. Given an array of numbers and a list of queries in the form of where , we want to answer each query with the minimum number in the array between positions and inclusive. That is, .
3. Brute Force Solutions
Firstly, we’ll explore two brute force solutions for the range minimum query problem. For the first one, we can answer a query in time with space. For the second one, we can answer a query in time but with some extra space.
3.1. Answer Each Query Without Additional Space
For each range minimum query, we can go through all the array elements between the location and and find the minimum value:
algorithm RangeMinimumQuery(A, n, RMQ):
// INPUT
// A = a number array with size n
// RMQ = a list of range queries
// OUTPUT
// The minimum value of each range query
answer <- initialize an empty array
for each (L, R) in RMQ:
minValue = A[L]
for i <- L to R:
minValue = min(minValue, A[i])
Append minValue to answer
return answer
Since we may visit elements for each query, the total time complexity of this algorithm is , where is the number of queries. The space complexity of this algorithm is .
3.2. Constant Time Query with Preprocessing
The time complexity of the above algorithm depends on the number of queries . If is big, i.e. in the scale of or higher, the running time will be slow, as we multiply an extra on the .
In the first algorithm, we need time to answer each query. In order to have a faster answer for each query, we can preprocess the input array by storing the query results of all possible ranges:
algorithm Preprocess(A, n):
// INPUT
// A = a number array with size n
// OUTPUT
// minTable = the minimum values of all possible ranges of A
minTable <- initialize an n × n array
for i <- 1 to n:
minTable[i, i] = A[i]
for i <- 1 to n:
for j <- i + 1 to n:
minTable[i, j] = min(minTable[i, j - 1], A[j])
return minTable
In this algorithm, we first construct answers for the ranges that contain a single element. Then, we use a dynamic programming approach to compute answers for wider ranges based on the recursion formula:
Once we have a table that contains answers for all possible ranges, we can just look up the table for each range minimum query:
algorithm RangeMinimumQuery(A, n, RMQ):
// INPUT
// A = a number array with size n
// RMQ = a list of range queries
// OUTPUT
// The minimum value of each range query
minTable <- Preprocess(A, n)
answer <- initialize an empty array
for each (L, R) in RMQ:
Append minTable[L, R] to answer
return answer
The preprocessing takes time. For each query, we can answer it in time. Therefore, the overall time complexity is . The space complexity is since we need extra spaces to store the preprocessing results.
With preprocessing, we can answer each query in time. Therefore, when we have a big number of queries, this algorithm is more efficient than the first brute force solution.
4. Sparse Table Solution
In the brute force preprocessing algorithm, we construct a table for all possible ranges in time so that we can answer a range minimum query in time. We can use a sparse table data structure to precompute range minimum queries more efficiently and still achieve the constant query time.
4.1. Sparse Table Idea
A power-of-two number is an integer number of the form , where is a non-negative integer. We can represent any positive integer uniquely in binary as a sum of power-of-two numbers. For example, .
The length of a range on the array is the total number of array elements between positions and inclusive, which is . A power-of-two range is a range whose length is a power-of-two number.
A sub-range of range is a range where . Since the length of a range is a positive integer number, we can also represent a range as a union of power-of-two sub-ranges. For example, . The full range, , has a length of . The three sub-ranges have lengths of , , and , respectively.
In general, any range can be divided by a set of sub-ranges whose lengths are power-of-two numbers.
The main idea of a sparse table is to first compute all answers to queries on the power-of-two ranges. When we answer a range query on , we can first split it into power-of-two sub-ranges. Then, we look up the precomputed answers and combine them into our final answer.
4.2. Sparse Table Recurrence Formula
For each start location (), there are at most number of power-of-two sub-ranges: , , , …, where is biggest number to make . Therefore, we can use an two dimensional array to represent the sparse table.
Each sparse table entry ( and ) stores the query result on the range . For example, stores the query result on the range .
When , we store the query results for the length 1 ranges:
When , we store the query results for length ranges. We can compute the value from previously computed smaller ranges:
For example, we store the answer to the range in the table entry . Its value comes from the answers to the ranges and . Their sparse table entries are and respectively.
4.3. Sparse Table Construction
Similar to the preprocessing of our brute force approach, we can use dynamic programming to construct the sparse table:
algorithm PreprocessSparse(A, n):
// INPUT
// A = a number array with size n
// OUTPUT
// The minimum values of all possible power-of-two ranges of A
sparse <- initialize an n x (log2(n) + 1) 2D array
for i <- 1 to n:
sparse[i, 0] = A[i]
for j <- 1 to log2(n):
for i <- 1 to n:
sparse[i, j] = min(sparse[i, j-1], sparse[i + 2^(j-1), j-1])
return sparse
In this algorithm, we first compute the query results for all length ranges. Then, we use a nested for loop to compute query results by increasing the range length. The higher range length query results are from the query results of lower range lengths.
During the computation, we need to compute the power-of-two values: . In some programming languages, like C++ and Java, we can achieve this with bit operations like . Otherwise, we can precompute all the power-of-two values up to and look up the value when we need it.
Both time and space complexity of the sparse table construction is .
4.4. Sparse Table for Range Minimum Queries
When we compute the minimum number of a range, it doesn’t matter if we process a value in the range once or twice. Therefore instead of splitting a range into multiple non-overlapping ranges, we can also split the range into only two overlapping ranges that have the same power-of-two length.
For example, we can split range into 3 non-overlapping ranges , , and . Then, the range minimum query on is the minimum of the three queries on the sub-ranges: .
Also, we can just split into 2 overlapping ranges and , with the same length of . For the minimum function, we can count the same number multiple times without affecting the final result: . Therefore, we can also have .
For a range minimum query on the range , we can first find its maximum power-of-two length . For example, the maximum power-of-two length of range is . Then, we look up the sparse table for the two ranges and . Finally, the answer to the range is the minimum of the above two range query results:
algorithm RangeMinimumQuerySparseTable(A, n, RMQ):
// INPUT
// A = a number array with size n
// RMQ = a list of range queries
// OUTPUT
// the minimum value of each range query
sparse <- PreprocessSparse(A, n)
answer <- an empty array
for (L, R) in RMQ:
j <- log2(R - L + 1)
Append (min(sparse[L, j], sparse[R - 2^j + 1, j])) to answer
return answer
To find the maximum power-of-two length of a range , we need to compute the logarithm value . We can precompute all the logarithm values for fast processing:
algorithm Logarithm(n):
// INPUT
// n = the array size
// OUTPUT
// All logarithm values from 1 to n
log <- an empty array
log[1] <- 0
for i <- 2 to n:
log[i] <- log[i / 2] + 1
return log
The sparse table construction takes time. For each query, we can answer it in time. Therefore, the overall time complexity is . The space complexity is for the sparse table.
5. Sparse Table Applications
We can also use a sparse table to solve other types of range queries. For example, we can use the same logic to find the maximum number of a range query in time.
In general, we can define a function that computes a result value on a list of numbers. For example, computes the minimum value for a range minimum query: .
We can use a sparse table to answer a range query based on the function in time if has the following 2 properties:
- is associative: . With this property, we can split the whole range into sub-ranges and apply on those sub-ranges.
- is overlapping friendly: . With this property, we can just use two overlapping sub-ranges to compute the result on the whole range.
For example, the greatest common divisor (GCD) function is associative as . Also, it is overlapping friendly as . Therefore, we can use the same logic to construct a sparse table with GCD values on power-of-two ranges and answer a range GCD query in time.
6. Sparse Table Limitations
If the query function is not overlapping friendly, we cannot answer the range query in time. For example, the sum function is associative but not overlapping friendly. We have to split the whole range into several non-overlapping sub-ranges to answer a range sum query, which takes time. However, we can use a more efficient prefix sum approach to answer the range sum queries.
Also, if any element in the array changes, we have to recompute the sparse table in time. Therefore, a sparse table is better for range queries on immutable arrays. That is, the array data can’t be changed between two queries.
7. Conclusion
In this article, we showed how to use a sparse table to solve the range minimum query problem. Also, we discussed the sparse table applications in a general form and the limitations of the sparse table data structure.