1. Introduction

In this tutorial, we’ll present Binary Insertion Sort. It’s a variant of Insertion Sort that uses Binary Search to improve performance.

2. Insertion Sort

Insertion Sort sorts the input array \boldsymbol{x=[x_1, x_2, \ldots, x_n]} by iteratively growing a sorted sub-array at the start of \boldsymbol{x}.

So, Insertion Sort first checks if x_2 < x_1 and swaps them if that’s the case. Then, it finds where it should insert x_3 so that it holds that x[1] \leq x[2] \leq x[3] (x[i] is the i-th element of x, whereas x_i is the initial value of x[i]). It continues like this, growing the sorted sub-array one element at a time. Before it inserts x_i, the sorted sub-array ![x[1:(i-1)] = [x[1], x[2], \ldots, x[i-1]]](/wp-content/ql-cache/quicklatex.com-5c7408bbe3ab9cddb9d624c11acc57e9_l3.svg "Rendered by QuickLaTeX.com") consists of the elements x_1, x_2, \ldots, x_{i-1} initially at positions 1 through i-1 but now in the sought order:

    [x[1] \leq x[2] \leq \ldots \leq x[i-1]]

Insertion Sort inserts \boldsymbol{x_i} right at the place that makes \boldsymbol{x[1:i]} also sorted. When it inserts x_n at the appropriate position, the whole array x becomes non-descending.

This is the pseudo-code of Insertion Sort:

algorithm TheInsertionSort(x):
    // INPUT
    //    x = the input array with n elements
    // OUTPUT
    //    x is sorted so that x[i] <= x[i+1] for all i = 1, 2, ..., n-1

    for i <- 2 to n:
        j <- i - 1
        // Find where to insert x[i] so that x[1:i] is sorted non-decreasingly.
        while j > 0 and x[j] > x[j+1]:
            // Swap x[i] with greater elements
            Swap x[j] and x[j-1]
            j <- j - 1

2.1. The Complexity of Insertion Sort

The worst-case input is an array x sorted in the opposite way (x_1 > x_2 > \ldots > x_n). In that case, Insertion Sort has to do i comparisons and swaps for each i=2,3,\ldots,n. In total, it does (n+2)(n-1)/2 \in O(n^2) swaps and performs the same number of comparisons. Therefore, the algorithm has the quadratic worst-case time complexity.

The average-case complexity of Insertion Sort is also O(n^2).

3. Binary Insertion Sort

The idea behind Binary Insertion Sort is to use Binary Search to find where to place each x_i. The goal is to reduce the number of comparisons.

This is the pseudo-code of BIS:

algorithm BinaryInsertionSort(x):
    // INPUT
    //    x = the input array with n elements
    // OUTPUT
    //    x is sorted so that x[i] ≤ x[i+1] for all i = 1, 2, ..., n-1

    for i <- 2 to n:
        // Find where to insert x[i] so that x[1:i] is sorted non-decreasingly.
        j <- BinarySearch(x[1 : (i - 1)], x[i])
        k <- i - 1

        while k > j:
            // Swap x[i] with greater elements
            Swap x[k] and x[k - 1]
            k <- k - 1


algorithm BinarySearch(a, b):
    // INPUT
    //    a = a sorted array with m elements
    //    b = the element to search for
    // OUTPUT
    //    The index where b should be in the sorted array a.

    low <- 1
    high <- m

    while low < high:
        mid <- (low + high) / 2

        if a[mid] = b:
            return mid
        else if a[mid] > b:
            low <- mid + 1
        else:
            high <- mid - 1

    // If we've come this far, b is not in the array, but its index should be low (=high).
    return low

3.1. The Complexity of Binary Insertion Sort

The number of swaps is the same as in the standard version of Insertion Sort.

In the worst case, we perform approximately \log_2 (i-1) comparisons for each i=3,4,\ldots, n, and do exactly one comparison for i=2:

    [1 + \sum_{i=3}^{n}\log_2{(i-1)}= 1 + \log_2 \left( \prod_{i=3}^{n} i \right) = 1 + \log_2 \left( \frac{n!}{2} \right) = \log_2 (n!)]

Using Stirling’s approximation, we get:

    [\log_2 (n!) = n \log_2 n - n \log_2 e + \Theta( \log_2 n ) \in O(n \log n)]

So, we conclude that the number of comparisons Binary Insertion Sort performs is log-linear in \boldsymbol{n} . However, since the number of swaps is \boldsymbol{O(n^2)}, both algorithms are asymptotically the same in the worst case. That’s also true of their average-case complexities.

4. When to Use Binary Insertion Sort?

Why should we bother implementing binary search and using it within Insertion Sort if the resulting complexity doesn’t change? It seems Binary Insertion Sort isn’t worth the extra work. The answer is that, although asymptotically equivalent to the standard version of Insertion Sort, Binary Insertion Sort usually works faster in practice. It compares fewer elements because of binary search.

If the elements \boldsymbol{ x_1, x_2, \ldots, x_n } are complex, and comparing two objects takes a lot of time, the time spent comparing the items will dominate that spent exchanging them. In such cases, the improvement brought by binary search pays off significantly. If we deal with simple types such as numbers of characters, we probably won’t notice any difference. However, in most real-world applications, our data will be more intricate.

5. Conclusion

In this article, we talked about Binary Insertion Sort. It’s a variant of Insertion Sort that uses Binary Search to find where to place x[i] in the input’s sub-array x[1:(i-1)] while iterating over i=2, 3, \ldots, n.

Although Binary Search reduces the number of comparisons to O(\log n) in the worst case, Binary Insertion Sort has a quadratic time complexity just as Insertion Sort. Still, it is usually faster than Insertion Sort in practice, which is apparent when comparison takes significantly more time than swapping two elements.


» 下一篇: N 层架构