1. Introduction

A sparse table is a data structure that can answer some range query problems, such as range minimum query problem, in O(1) time.

In this tutorial, we’ll show how to construct a sparse table and its applications on various range query problems.

2. Problem Description

Let’s start with a range minimum query (RMQ) problem. Given an array of numbers A = \{A_1, A_2, ..., A_n\} and a list of queries in the form of RMQ(L, R) where 1\le L \le R \le n, we want to answer each query with the minimum number in the array A between positions L and R inclusive. That is, RMQ(L, R) = \min (A_L, ..., A_R).

3. Brute Force Solutions

Firstly, we’ll explore two brute force solutions for the range minimum query problem. For the first one, we can answer a query in O(n) time with O(1) space. For the second one, we can answer a query in O(1) time but with some extra space.

3.1. Answer Each Query Without Additional Space

For each range minimum query, we can go through all the array elements between the location L and R and find the minimum value:

algorithm RangeMinimumQuery(A, n, RMQ):
    // INPUT
    //    A = a number array with size n
    //    RMQ = a list of range queries
    // OUTPUT
    //    The minimum value of each range query
    
    answer <- initialize an empty array
    
    for each (L, R) in RMQ:
        minValue = A[L]
        
        for i <- L to R:
            minValue = min(minValue, A[i])
        
        Append minValue to answer
    
    return answer

Since we may visit O(n) elements for each query, the total time complexity of this algorithm is O(mn), where m is the number of queries. The space complexity of this algorithm is O(1).

3.2. Constant Time Query with Preprocessing

The time complexity of the above algorithm depends on the number of queries m. If m is big, i.e. in the scale of O(n^2) or higher, the running time will be slow, as we multiply an extra O(n) on the m.

In the first algorithm, we need O(n) time to answer each query. In order to have a faster answer for each query, we can preprocess the input array A by storing the query results of all possible ranges:

algorithm Preprocess(A, n):
    // INPUT
    //    A = a number array with size n
    // OUTPUT
    //    minTable = the minimum values of all possible ranges of A

    minTable <- initialize an n × n array

    for i <- 1 to n:
        minTable[i, i] = A[i]

    for i <- 1 to n:
        for j <- i + 1 to n:
            minTable[i, j] = min(minTable[i, j - 1], A[j])

    return minTable

In this algorithm, we first construct answers for the ranges that contain a single element. Then, we use a dynamic programming approach to compute answers for wider ranges based on the recursion formula:

Rendered by QuickLaTeX.com

Once we have a table that contains answers for all possible ranges, we can just look up the table for each range minimum query:

algorithm RangeMinimumQuery(A, n, RMQ):
    // INPUT
    //    A = a number array with size n
    //    RMQ = a list of range queries
    // OUTPUT
    //    The minimum value of each range query

    minTable <- Preprocess(A, n)
    answer <- initialize an empty array

    for each (L, R) in RMQ:
        Append minTable[L, R] to answer

    return answer

The preprocessing takes O(n^2) time. For each query, we can answer it in O(1) time. Therefore, the overall time complexity is O(n^2+m). The space complexity is O(n^2) since we need extra spaces to store the preprocessing results.

With preprocessing, we can answer each query in O(1) time. Therefore, when we have a big number of queries, this algorithm is more efficient than the first brute force solution.

4. Sparse Table Solution

In the brute force preprocessing algorithm, we construct a table for all possible ranges [L, R] in O(n^2) time so that we can answer a range minimum query in O(1) time. We can use a sparse table data structure to precompute range minimum queries more efficiently and still achieve the constant query time.

4.1. Sparse Table Idea

A power-of-two number is an integer number of the form 2^n, where n is a non-negative integer. We can represent any positive integer uniquely in binary as a sum of power-of-two numbers. For example, 11=(1011)_2=8+2+1=2^3+2^1+2^0.

The length of a range [L, R] on the array A is the total number of array elements between positions L and R inclusive, which is R-L+1. A power-of-two range is a range whose length is a power-of-two number.

A sub-range of range [L,R] is a range [l, r] where L\le l \le r \le R. Since the length of a range is a positive integer number, we can also represent a range as a union of power-of-two sub-ranges. For example, [2, 12]= [2, 9] \cup[10,11] \cup [12, 12]. The full range, [2, 12], has a length of 11. The three sub-ranges have lengths of 8, 2, and 1, respectively.

In general, any range [L, R] can be divided by a set of sub-ranges whose lengths are power-of-two numbers.

The main idea of a sparse table is to first compute all answers to queries on the power-of-two ranges. When we answer a range query on [L, R], we can first split it into power-of-two sub-ranges. Then, we look up the precomputed answers and combine them into our final answer.

4.2. Sparse Table Recurrence Formula

For each start location i (1 \le i \le n), there are at most \log_2n + 1 number of power-of-two sub-ranges: [i, i+2^0-1], [i, i+2^1-1], [i, i+2^2-1], …, [i, i+2^k-1] where k is biggest number to make i+2^k-1 \le n. Therefore, we can use an n \times (\log_2n + 1) two dimensional array to represent the sparse table.

Each sparse table entry spares[i][j] (1 \le i \le n and 0 \le j \le \log_2n) stores the query result on the range [i, i+2^j-1]. For example, spares[1][3] stores the query result on the range [1, 8].

When j=0, we store the query results for the length 1 ranges:

Rendered by QuickLaTeX.com

When j > 0, we store the query results for length 2^j ranges. We can compute the value from previously computed smaller ranges:

Rendered by QuickLaTeX.com

For example, we store the answer to the range [1,8] in the table entry sparse[1][3]. Its value comes from the answers to the ranges [1, 4] and [5, 8]. Their sparse table entries are sparse[1][2] and sparse[5][2] respectively.

4.3. Sparse Table Construction

Similar to the preprocessing of our brute force approach, we can use dynamic programming to construct the sparse table:

algorithm PreprocessSparse(A, n):
    // INPUT
    //    A = a number array with size n
    // OUTPUT
    //    The minimum values of all possible power-of-two ranges of A

    sparse <- initialize an n x (log2(n) + 1) 2D array

    for i <- 1 to n:
        sparse[i, 0] = A[i]

    for j <- 1 to log2(n):
        for i <- 1 to n:
            sparse[i, j] = min(sparse[i, j-1], sparse[i + 2^(j-1), j-1])

    return sparse

In this algorithm, we first compute the query results for all length 1 ranges. Then, we use a nested for loop to compute query results by increasing the range length. The higher range length query results are from the query results of lower range lengths.

During the computation, we need to compute the power-of-two values: 2^{j-1}. In some programming languages, like C++ and Java, we can achieve this with bit operations like 1 << (j-1). Otherwise, we can precompute all the power-of-two values up to n and look up the value when we need it.

Both time and space complexity of the sparse table construction is O(nlogn).

4.4. Sparse Table for Range Minimum Queries

When we compute the minimum number of a range, it doesn’t matter if we process a value in the range once or twice. Therefore instead of splitting a range into multiple non-overlapping ranges, we can also split the range into only two overlapping ranges that have the same power-of-two length.

For example, we can split range [2, 12] into 3 non-overlapping ranges [2, 9], [10,11], and [12, 12]. Then, the range minimum query on [2, 12] is the minimum of the three queries on the sub-ranges: RMQ(2, 12) = \min(RMQ(2,9), RMQ(10, 11), RMQ(12, 12)).

Also, we can just split [2, 12] into 2 overlapping ranges [2,9] and [5,12], with the same length of 8. For the minimum function, we can count the same number multiple times without affecting the final result: \min(a, b) = \min(a, a, b, b). Therefore, we can also have RMQ(2, 12) = \min(RMQ(2,9), RMQ(5, 12)).

For a range minimum query on the range [L, R], we can first find its maximum power-of-two length k. For example, the maximum power-of-two length of range [2, 12] is 8. Then, we look up the sparse table for the two ranges [L, L+k-1] and [R-k+1, R]. Finally, the answer to the range [L, R] is the minimum of the above two range query results:

algorithm RangeMinimumQuerySparseTable(A, n, RMQ):
    // INPUT
    //    A = a number array with size n
    //    RMQ = a list of range queries
    // OUTPUT
    //    the minimum value of each range query

    sparse <- PreprocessSparse(A, n)
    answer <- an empty array
    
    for (L, R) in RMQ:
        j <- log2(R - L + 1)
        Append (min(sparse[L, j], sparse[R - 2^j + 1, j])) to answer

    return answer

To find the maximum power-of-two length of a range [L, R], we need to compute the logarithm value \log_2(R - L + 1). We can precompute all the logarithm values for fast processing:

algorithm Logarithm(n):
    // INPUT
    //    n = the array size
    // OUTPUT
    //    All logarithm values from 1 to n

    log <- an empty array
    log[1] <- 0

    for i <- 2 to n:
        log[i] <- log[i / 2] + 1

    return log

The sparse table construction takes O(n\log n) time. For each query, we can answer it in O(1) time. Therefore, the overall time complexity is O(n\log n + m). The space complexity is O(n\log n) for the sparse table.

5. Sparse Table Applications

We can also use a sparse table to solve other types of range queries. For example, we can use the same logic to find the maximum number of a range query in O(1) time.

In general, we can define a function f that computes a result value on a list of numbers. For example, f computes the minimum value for a range minimum query: f(A_1, A_2, ..., A_n) = \min(A_1, A_2, ..., A_n).

We can use a sparse table to answer a range query based on the function f in O(1) time if f has the following 2 properties:

  • f is associative: f(a, b, c) = f (f(a, b), c) = f(a, f(b, c)). With this property, we can split the whole range into sub-ranges and apply f on those sub-ranges.
  • f is overlapping friendly: f(a, b, c) = f ( f(a, b), f(b, c)). With this property, we can just use two overlapping sub-ranges to compute the result on the whole range.

For example, the greatest common divisor (GCD) function is associative as GCD(a, b, c) = GCD (GCD(a, b), c) = GCD(a, GCD(b, c)). Also, it is overlapping friendly as GCD(a, b, c) = GCD( GCD(a, b), GCD(b, c)). Therefore, we can use the same logic to construct a sparse table with GCD values on power-of-two ranges and answer a range GCD query in O(1) time.

6. Sparse Table Limitations

If the query function f is not overlapping friendly, we cannot answer the range query in O(1) time. For example, the sum function f(A_1, A_2, ..., A_n) = \sum_{i=1}^{n}{A_i} is associative but not overlapping friendly. We have to split the whole range into several non-overlapping sub-ranges to answer a range sum query, which takes O(logn) time. However, we can use a more efficient prefix sum approach to answer the range sum queries.

Also, if any element in the array changes, we have to recompute the sparse table in O(nlogn) time. Therefore, a sparse table is better for range queries on immutable arrays. That is, the array data can’t be changed between two queries.

7. Conclusion

In this article, we showed how to use a sparse table to solve the range minimum query problem. Also, we discussed the sparse table applications in a general form and the limitations of the sparse table data structure.