在两个有序数组的并集中找第k小的元素

1. Overview

In this tutorial, we’ll discuss the problem of finding the kth smallest element in the union of two sorted arrays. We’ll present a naive approach and then improve it.

In the end, we’ll present a comparison between both approaches and when to use them.

Please note that we have a java-based article for this problem. Hence, in this tutorial, we’ll focus more on the theoretical idea and comparison.

2. Defining the Problem

In this problem, we’re given 2 arrays, of size , and of size . Also, the problem will give us an integer . Both arrays and should be sorted in increasing order.

In other words:

$A_i \leq A_{i+1}$ for each $0 \leq i < n-1$

$B_j \leq B_{j+1}$ for each $0 \leq j < m-1$

We must find the kth smallest element in the sorted union of and . By the sorted union of and , we mean the array of size n+m , which contains all the elements of and , sorted in the increasing order as well.

Let’s take the following example to better explain the idea:

Example

As we can see, both arrays and are sorted initially. Similarly, the array , which is the union of arrays and , is sorted as well. Also, we can see that the array contains all elements from and .

Hence, if was equal to 6, then the answer would be 7. The reason is that 7 is at index 5 in the array (because we start numbering the array from zero).

Another way to look at it is that the number 7 has five numbers smaller than it in both arrays and . Therefore, it’s the sixth smallest value.

3. Naive Approach

First, we’ll explore the naive approach. Let’s take a look at its implementation:

algorithm NaiveApproach(A, B, n, m, k):
    // INPUT
    //    A = The first sorted array
    //    B = The second sorted array
    //    n = The size of the first array
    //    m = The size of the second array
    //    k = The order of the required element
    // OUTPUT
    //    Returns the kth smallest element in the union of A and B
    
    i <- 0
    j <- 0
    answer <- 0
    
    while i + j < k:
        if i = n:
            answer <- B[j]
            j <- j + 1
        else if j = m:
            answer <- A[i]
            i <- i + 1
        else if B[j] < A[i]:
            answer <- B[j]
            j <- j + 1
        else:
            answer <- A[i]
            i <- i + 1
    
    return answer

In the beginning, we set two iterators and that will show us the current index inside the arrays and , respectively.

Now, let’s think about how we’d merge arrays and . We’d start from the beginning. Each time, we’d add the smallest value to the resulting array. Also, we’d remove the smallest value from its location. Let’s do something similar in this approach.

In each step, we set the answer to the smaller value between and . In addition, we transfer the iterator who has this smaller value one step forward.

Also, we pay attention to the case when either or becomes out of the range of their respective arrays. In this case, we move the other iterator. Once we finish walking through values, we break the while loop.

In the end, we return the stored answer.

The complexity of the naive approach is , where is the required index. Therefore, in the worst case, the complexity is O(n+m) , where is the size of the first array and is the size of the second one.

4. Binary Search Approach

Let’s try to improve our naive approach.

4.1. General Idea

As we know, the binary search is a searching algorithm that finds the target value within a sorted range. In each step, we divide the range in half.

Next, we check the value in the middle of the range. Based on the result, we either choose the left or the right half of the range to continue our search in. We’ll use something similar here.

In the naive approach, we found the answer by taking elements from and elements from . Based on the same idea, we’ll perform a binary search to find the number of taken elements from the first array .

In each step, we can calculate the number of taken elements from the second array which is $j \gets k - i$ . After that, we can check whether this combination of and is valid or not. Based on the result, we can choose to take either more or fewer elements from .

4.2. Implementation

Take a look at the binary search approach:

algorithm BinarySearchApproach(A, B, n, m, k):
    // INPUT
    //    A = The first sorted array
    //    B = The second sorted array
    //    n = The size of the first array
    //    m = The size of the second array
    //    k = The order of the required element
    // OUTPUT
    //    Returns the kth smallest element in the union of A and B
    
    bestI <- 0
    L <- max(0, k - m)
    R <- n
    
    while L <= R:
        mid <- (L + R) / 2
        i <- mid
        j <- k - i
        
        if j = 0 or A[i - 1] > B[j - 1]:
            R <- mid - 1
            bestI <- mid
        else:
            L <- mid + 1
    
    i <- bestI
    j <- k - i
    
    if i != 0:
        return min(B[j], A[i - 1])
    else:
        return B[j - 1]

First of all, we’ll choose the range we should search in. Clearly, the right side should equal to .

However, when choosing the left side, the size of the array must be taken into account. Therefore, the left side shouldn’t be less than k-m , nor less than zero.

Now, we can use the binary search algorithm to find the smallest , such that the biggest value in range from is bigger than the biggest value in range from . As we can see, we’re taking elements from and i-k elements from . Therefore, in total, we take elements.

In each step, we compare the two values we have. If we choose not to take any elements from , or the value of A[i-1] is larger than B[j-1] , then we know we reached a valid value of and store it. We should also try to find a smaller value of . Therefore, we make the searching range equal to the left one.

Otherwise, it means we should move forward. In this case, we make the searching range equal to the right one.

In the end, we have two cases. Either equals zero, which means we shouldn’t take any elements from . As a result, the answer will equal to B[j-1] .

On the other hand, if we should take some elements from , then the kth smallest value is either B[j] or A[i-1] . The reason is that we know that A[i-1] is bigger than all elements in the range B[0..j-1] .

Therefore, the answer is either A[i-1] itself, or B[j] , whichever is the smallest.

The complexity of the binary search approach is , where is the size of the first array.

4.3. Example

Let’s take the same example in section 2, where we had two arrays and , and was 6. Take a look at the first step in that example:

Binary Search Example

In the beginning, the searching range is [0, 4] . Hence, mid equals 2. Therefore, equals 2 and equals 4.

We compare the values of A[i-1] and B[j-1] . Since A[i-1] is not greater than B[j-1] , we try to search on the right range of the array , without updating the value of bestI .

Note that, and represent the number of elements to take from each array. Since indexes start from zero, we compare A[i-1] with B[j-1] . Also, note that the cells in red denote the values that are being compared in each step.

Let’s move to the next step:

Binary Search Example 2

In the next step, the searching range becomes [3, 4] . From that we conclude that mid equals 3, equals 3 and equals 3.

We notice that A[i-1] is still not greater than B[j-1] . Thus, we continue searching on the right range of the array , without updating the value of bestI .

This leads us to the last step:

Binary Search Example 3

Finally, the searching range is now [4, 4] . This means that mid equals 4, equals 4 and equals 2. In this case, we notice that A[i-1] is actually greater than B[j-1] . As a result, we update the value of bestI .

The searching range becomes [4, 3] , which is invalid. Therefore, the algorithm doesn’t make any more steps.

Now that the binary search operation is finished, the answer is the minimum between A[i-1] and B[j] . In this case, A[3] is 9 and B[2] is 7. Hence, the algorithm returns 7 as the answer to the problem.

5. Comparison

When considering simplicity, the naive approach is both simpler to implement, and easier to understand. But, when considering the time complexity, the binary search approach generally has a lower complexity.

However, in the special case that is smaller than log(n) , using the naive approach should give us a better complexity.

6. Conclusion

In this tutorial, we explained the problem of calculating the Kth smallest element in the union of two sorted arrays. Firstly, we presented the naive approach.

Secondly, we discussed the theoretical idea and the implementation of the binary search approach.

Finally, we presented a summary comparison between both approaches and showed when to use each one.

Persistence

REST

Security