1. Introduction

In this tutorial, we’ll show how to find the two numbers with the minimum absolute difference in an array.

2. The Minimal Difference

We have a numerical array a=[a_1, a_2, \ldots, a_n] and want to find the two closest numbers.

For example, if a=[1, 5, 31, 7, 11], the minimal absolute difference is between a_2=5 and a_4=7, so our algorithm should output the pair (5, 7).

3. Solution

Let’s suppose that we’ve sorted a in non-decreasing order: a_{(1)} \leq a_{(2)} \leq \ldots \leq a_{(n)}. Then, the number closest to each a_{(i)} is immediately before or right after it. In other words, once the array is sorted, we only need to check the differences between consecutive elements (whether we sort in ascending or descending order doesn’t matter).

So, we can find the minimal difference by iterating over the sorted a and computing |a_{(i-1)} - a_{(i)}| for each i>1. For instance, sorting a=[1, 5, 31, 7, 11], we get [1, 5, 7, 11, 31]. The absolute differences between consecutive elements are 4, 2, 4, 20, respectively. Therefore, the answer is the second pair: (5, 7).

3.1. Pseudocode

Here’s the pseudocode:

algorithm FindTwoClosestNumbers(a):
    // INPUT
    //    a = an array of n numbers
    // OUTPUT
    //    (p, q) = a pair with the minimal absolute difference in a

    a <- sort a
    (p, q) <- (a[1], a[2])

    for i <- 3 to n:
        if |a[i-1] - a[i]| < |p - q|:
            (p, q) <- (a[i-1], a[i])
            
    return (p, q)

First, we sort the array. Then, we declare the first pair (a_{(1)}, a_{(2)}) as the current closest pair and loop over the rest. We update the current closest pair if we find two consecutive numbers with a smaller absolute difference.
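As a concrete sketch, here’s the sorting-based algorithm in Python (the function name is ours; we assume the input has at least two elements):

```python
def find_two_closest(a):
    """Return a pair of numbers from a with the minimal absolute difference."""
    s = sorted(a)                       # O(n log n)
    p, q = s[0], s[1]                   # first candidate pair
    for i in range(2, len(s)):          # O(n) scan over consecutive elements
        if abs(s[i] - s[i - 1]) < abs(q - p):
            p, q = s[i - 1], s[i]
    return p, q

print(find_two_closest([1, 5, 31, 7, 11]))  # (5, 7)
```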

3.2. Complexity

What’s the time complexity of this approach? It depends on the sorting algorithm because the search for the minimal difference in the sorted array is O(n) in the worst and average cases. With Merge Sort, the sorting part takes O(n \log n) time in the worst case, so the entire algorithm’s complexity is O(n) + O(n \log n) = O(n \log n).

If we used Quicksort, whose worst-case time complexity is O(n^2), our approach would also be quadratic in the worst case. However, Quicksort is log-linear in the average case and has a smaller multiplicative coefficient than Merge Sort. For that reason, pairing the linear search with Quicksort instead of Merge Sort may work faster in practice, although the latter is asymptotically more favorable in the worst case.

Now, the question is: can we do better?

4. Rabin’s Algorithm in One Dimension

And the answer is that we can, at least when it comes to probabilistic time complexity. In 1976, Michael O. Rabin formulated a randomized algorithm for finding the closest pair in R^d and proved it achieved a linear time complexity with high probability. For d=1, the closest-pair problem reduces to ours.

4.1. The Idea

The main idea is as follows. First, we sample n pairs (a_i, a_j) of numbers from a (i \neq j) and compute the minimal difference over the sampled pairs. Let’s denote it as \delta. Afterward, we group all the numbers into consecutive segments of length \delta.

Then, the closest two numbers are either in the same segment or in two neighboring ones:

[Figure: The minimum difference]

So, we iterate over the segments and find the closest pairs in each segment and every two consecutive ones, keeping track of the minimal difference calculated. In the end, it corresponds to the actual minimum difference in a.

4.2. Pseudocode

Here’s the pseudocode:

algorithm RabinsAlgorithm(a):
    // INPUT
    //    a = an array of n numbers
    // OUTPUT
    //    (p, q) = a pair with the minimal absolute difference in a

    S <- sample n pairs of elements (a[i], a[j]) (i != j)
    (p, q) <- the closest pair in S
    δ <- |p - q|
    segments <- make an empty dictionary

    for i <- 1 to n:
        k <- floor(a[i] / δ)
        Insert a[i] into segments[k]

    for k in sort(segments.keys):
        (x_k, z_k) <- find the closest pair in segments[k]
        (p, q) <- the closer between (p, q) and (x_k, z_k)

        if segments[k-1] exists:
            Find the closest pair of points (x_k, z_{k-1}) such that 
                x_k is in segments[k] and 
                z_{k-1} is in segments[k-1]
            (p, q) <- the closer between (p, q) and (x_k, z_{k-1})

    return (p, q)

First, we sample \boldsymbol{n} pairs and determine the closest one. Let \delta be the absolute difference between its two numbers. Then, we divide the array into \boldsymbol{\delta}-wide segments (let’s suppose there are m of them). We can determine the segment into which a_i falls by rounding down the quotient a_i / \delta. So, we keep the numbers that belong to the same segment in the same entry of segments, a dictionary whose integer keys identify the \delta-wide segments and whose values are arrays (lists or trees) of the corresponding elements.

Afterward, we loop over the keys in segments in sorted order. The closest two numbers in the entire array are either in the same segment or in two consecutive ones. So, looping over segments[1], segments[2], \ldots, segments[m], we compare the current closest pair (p, q) to the closest pair in each segments[k]. If segments[k-1] exists, we also compare (p, q) to the closest pair such that one number belongs to segments[k] and the other to segments[k-1].
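Here’s a minimal Python sketch of the whole procedure, assuming the input has at least two elements. The helper and all names are ours, and we add an early return when the sampled \delta is zero (duplicates), since no pair can be closer:

```python
import math
import random
from collections import defaultdict

def closest_pair_brute(nums):
    """O(k^2) brute force; used inside small segments (assumes len >= 2)."""
    p, q = nums[0], nums[1]
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if abs(nums[i] - nums[j]) < abs(p - q):
                p, q = nums[i], nums[j]
    return p, q

def rabins_algorithm(a):
    """Return a pair from a with the minimal absolute difference."""
    n = len(a)
    # Step 1: sample n random pairs; the closest one gives the estimate delta.
    p, q = a[0], a[1]
    for _ in range(n):
        i, j = random.sample(range(n), 2)  # two distinct indices
        if abs(a[i] - a[j]) < abs(p - q):
            p, q = a[i], a[j]
    delta = abs(p - q)
    if delta == 0:
        return p, q  # duplicates: the difference can't get any smaller
    # Step 2: bucket the numbers into delta-wide segments.
    segments = defaultdict(list)
    for x in a:
        segments[math.floor(x / delta)].append(x)
    # Step 3: check each segment and each pair of neighboring segments.
    for k in sorted(segments):
        if len(segments[k]) > 1:
            x, z = closest_pair_brute(segments[k])
            if abs(x - z) < abs(p - q):
                p, q = x, z
        if k - 1 in segments:
            # The closest cross-boundary pair is the smallest element of
            # segments[k] and the largest element of segments[k - 1].
            x, z = min(segments[k]), max(segments[k - 1])
            if abs(x - z) < abs(p - q):
                p, q = x, z
    return p, q
```

Note that the randomness affects only the running time, not the result: whatever \delta the sampling produces, the final scan over the segments always recovers the true closest pair.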

4.3. Time Complexity

This algorithm runs in linear time with high probability, even if we use the quadratic brute-force approach to find the closest pairs within individual segments.

5. Conclusion

In this article, we showed how to find the closest two numbers in an array. We presented two algorithms: one based on sorting and the other randomized. The former is log-linear in the worst case if we use Merge Sort, but the latter achieves linear time with high probability.