1. Introduction

In this tutorial, we’ll explain how to find common elements in two sorted arrays.

2. Common Elements of Two Sorted Arrays

In this problem, we have two sorted arrays: a=[a_1 \leq a_2  \leq \ldots \leq a_n] and b=[b_1 \leq b_2 \leq \ldots \leq b_m]. Our task is to find the common elements.

For instance, if a = [1, 2, 3, 4, 5] and b = [3, 4, 5, 6, 7], our algorithm should output [3, 4, 5] as the result.

To find the common elements efficiently, our algorithm should use the fact that a and b are already sorted. Additionally, we'll require the output array to be non-descending, just like the input arrays.

We'll also assume that a and b can contain duplicates. So, if a common element appears twice in a and three times in b, we'll include it twice in the result array, i.e., the minimum of the two counts. For instance:

    \begin{equation*} \begin{matrix} a&=& [1& 2& 2& 4& 5] \\ b&=& [2& 2& 2& 3& 5& 7] \\ result&=&  [2& 2& 5]\\ \end{matrix} \end{equation*}
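
To make the duplicates rule concrete, here's a minimal Python sketch that only pins down the expected output (the function name and the use of collections.Counter are our illustration, not the algorithm we develop below):

from collections import Counter

def expected_common_elements(a, b):
    # The duplicates rule: each common value appears min(count in a, count in b) times,
    # and the output stays sorted. This is a specification check, not an efficient algorithm.
    common = Counter(a) & Counter(b)   # multiset intersection keeps the minimum counts
    return sorted(common.elements())

print(expected_common_elements([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]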

3. Finding Common Elements in Linear Time

Let’s start with the naive algorithm that doesn’t use the fact that a and b are sorted:

algorithm NaiveFindCommonElements(a, b):
    // INPUT
    //    a = a sorted array with n elements
    //    b = a sorted array with m elements
    // OUTPUT
    //    c = the sorted array of the common elements of a and b
    
    c <- an empty array
    used <- an array of m values, all set to false

    for i <- 1 to n:
        for j <- 1 to m:
            if a[i] = b[j] and not used[j]:
                // We found a common element
                append a[i] to c
                // Mark b[j] as matched so that duplicates are counted correctly
                used[j] <- true
                break the inner loop

    return c
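
For reference, here's a minimal Python sketch of this naive approach (the function name is ours):

def naive_find_common_elements(a, b):
    # Compare every element of a with every element of b.
    c = []
    used = [False] * len(b)            # so each element of b is matched at most once
    for x in a:
        for j, y in enumerate(b):
            if x == y and not used[j]:
                c.append(x)            # we found a common element
                used[j] = True
                break                  # move on to the next element of a
    return c

print(naive_find_common_elements([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]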

For each a_i, the algorithm may scan the entire b to check whether a_i appears in it. So, it has an O(mn) time complexity, which becomes O(n^2) if m and n are comparable. What can we change in this algorithm to get the common elements faster?

Well, since the arrays are sorted, there's no point in iterating over \boldsymbol{b_{j+1}, b_{j+2}, \ldots, b_{m}} if \boldsymbol{a_i < b_j}. Each of them is at least b_j and therefore greater than a_i, so a_i can't be equal to any of them. The same reasoning works in the other direction: if b_j < a_i, we can discard b_j.

That way, we iterate over each array only once, so we don't need nested loops. Instead, we start with i=1 and j=1. When a_i < b_j, we discard a_i by incrementing i. Similarly, when b_j < a_i, we discard b_j by incrementing j. If a_i = b_j, we append a_i to the result array and increment both indices. The loop stops once we reach the end of a or b. This approach is similar to the merge step in the Merge Sort algorithm.

3.1. Pseudocode

Here's the pseudocode of the linear approach:

algorithm LinearAlgorithmForCommonElements(a, b):
    // INPUT
    //    a = a sorted array with n elements
    //    b = a sorted array with m elements
    // OUTPUT
    //    c = the sorted array of the common elements of a and b
    
    c <- an empty array
    i <- 1
    j <- 1

    while i <= n and j <= m:
        if a[i] < b[j]:
            i <- i + 1
        else if b[j] < a[i]:
            j <- j + 1
        else: // We've got a match: a[i] = b[j]!
            append a[i] to c
            i <- i + 1
            j <- j + 1
            
    return c

As a result, the algorithm’s time complexity is linear: O(m + n).
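
Here's the same idea as a minimal Python sketch (the function name is ours); it mirrors the pseudocode above with 0-based indices:

def find_common_elements(a, b):
    # Two-pointer scan over both sorted arrays, like the merge step of Merge Sort.
    c = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                     # a[i] can't match anything left in b
        elif b[j] < a[i]:
            j += 1                     # b[j] can't match anything left in a
        else:                          # a[i] == b[j]: a common element
            c.append(a[i])
            i += 1
            j += 1
    return c

print(find_common_elements([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]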

3.2. Example

Let’s use the above example to show how the algorithm works.

Initially, c is empty, and both i and j point to the first elements:

Common elements - step 1

Since a_1=1 < b_1=2, we increment i:

Common elements - step 2

Now, we have a match, so we append 2 to c and increment both counters:

Common elements - step 3

We have a match again, so we do the same as in the previous step:

Common elements - step 4

Now, a_4=4 > b_3=2, so we discard b_3 and move on to b_4:

Common elements - step 5

The situation is the same (a_4=4 > b_4=3), so we increment only j:

Common elements - step 6

Since a_4=4 < b_5=5, we increment only i:

Common elements - step 7

We found a new common element, so we add it to c and increment both counters:

Common elements - step 8

Finally, i>n=5, so we stop and output the result: c=[2, 2, 5].

4. Finding Common Elements in Logarithmic Time

Let's suppose that m << n. Then, for each \boldsymbol{j=1,2,\ldots,m}, we can look for \boldsymbol{b_j} in \boldsymbol{a} with the logarithmic binary search. Since b_j \leq b_{j+1}, if the binary search finds b_j at the k-th position in a, we can discard a_k and all the elements to its left, and search for b_{j+1} only in the remainder: [a_{k+1}, a_{k+2}, \ldots, a_n].

Now, let's say the binary search doesn't find b_j in a. In that case, we take k to be the position b_j would be inserted at, so a_k is the smallest remaining element greater than b_j (if no such element exists, we can stop). All the elements to the left of a_k are smaller than b_j \leq b_{j+1} \leq \ldots \leq b_m, so we can discard them. However, we keep a_k itself: since b_{j+1} \geq b_j, it can still happen that b_{j+1} = a_k, or that a later element of b equals a_k.

4.1. Pseudocode

Here’s how the above algorithm works:

algorithm FindCommonElementsBinarySearch(a, b):
    // INPUT
    //    a = a sorted array with n elements
    //    b = a sorted array with m elements
    // OUTPUT
    //    c = the sorted array of the common elements of a and b
    
    c <- an empty array
    k <- 1
    j <- 1

    while j <= m and k <= n:
        // Find the position b[j] should be at in a[k:n]:
        // the leftmost index whose element is >= b[j], or n + 1 if there is none
        k <- BINARY-SEARCH(b[j], a[k:n])
        
        if k <= n and a[k] = b[j]:
            // We found a common element
            append b[j] to c
            // a[k] is matched, so we discard it as well
            k <- k + 1
        // Otherwise, we keep a[k] because a later element of b may be equal to it
        
        j <- j + 1

    return c
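
As a sanity check, here's a minimal Python sketch of this approach (the names are ours); it uses the standard library's bisect_left as the binary search:

from bisect import bisect_left

def find_common_elements_binary_search(a, b):
    # For each element of b, binary-search the still-relevant suffix of a.
    c = []
    k = 0                                  # start of the unsearched suffix of a
    for x in b:
        k = bisect_left(a, x, k)           # leftmost index >= k with a[index] >= x
        if k == len(a):                    # every remaining element of a is smaller than x
            break
        if a[k] == x:                      # a common element
            c.append(x)
            k += 1                         # a[k] is matched, so we discard it
        # otherwise, keep a[k]: a later element of b may be equal to it
    return c

print(find_common_elements_binary_search([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]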

The worst-case time complexity is O(m\log n). If m << n, we can treat m as a constant with respect to n, so the algorithm runs in approximately \boldsymbol{O(\log n)} time. However, if m \in \Theta(n), we get an O(n \log n) algorithm. That's still better than the naive approach but worse than the O(n+m) runtime of the linear approach from Section 3.

5. Conclusion

In this article, we presented two efficient ways of finding the common elements of two sorted arrays. One approach runs in O(n+m) time, whereas the other binary-searches the longer array for each element of the shorter one and runs in O(m \log n) time.