1. Introduction

In this tutorial, we’ll explain how to find common elements in two sorted arrays.

2. Common Elements of Two Sorted Arrays

In this problem, we have two sorted arrays: a=[a_1 \leq a_2  \leq \ldots \leq a_n] and b=[b_1 \leq b_2 \leq \ldots \leq b_m]. Our task is to find the common elements.

For instance, if a = [1, 2, 3, 4, 5] and b = [3, 4, 5, 6, 7], our algorithm should output [3, 4, 5] as the result.

To find the common elements efficiently, our algorithm should use the fact that a and b are already sorted. Additionally, we'll require the output array to be non-descending, just like the input arrays.

We'll also assume that a and b can contain duplicates. So, if a common element appears twice in a and three times in b, we'll include it twice in the result array, i.e., the minimum of the two counts. For instance:

    \begin{equation*} \begin{matrix} a&=& [1& 2& 2& 4& 5] \\ b&=& [2& 2& 2& 3& 5& 7] \\ result&=&  [2& 2& 5]\\ \end{matrix} \end{equation*}
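
To make the duplicates rule concrete, here's a minimal Python sketch that only pins down the expected output (the function name and the use of collections.Counter are our illustration, not the algorithm we develop below):

from collections import Counter

def expected_common_elements(a, b):
    # The duplicates rule: each common value appears min(count in a, count in b) times,
    # and the output stays sorted. This is a specification check, not an efficient algorithm.
    common = Counter(a) & Counter(b)   # multiset intersection keeps the minimum counts
    return sorted(common.elements())

print(expected_common_elements([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]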

3. Finding Common Elements in Linear Time

Let’s start with the naive algorithm that doesn’t use the fact that a and b are sorted:

algorithm NaiveFindCommonElements(a, b):
    // INPUT
    //    a = a sorted array with n elements
    //    b = a sorted array with m elements
    // OUTPUT
    //    c = the sorted array of the common elements of a and b
    
    c <- an empty array
    used <- an array of m values, all set to false

    for i <- 1 to n:
        for j <- 1 to m:
            if a[i] = b[j] and not used[j]:
                // We found a common element
                append a[i] to c
                // Mark b[j] as matched so that duplicates are counted correctly
                used[j] <- true
                break the inner loop

    return c
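
For reference, here's a minimal Python sketch of this naive approach (the function name is ours):

def naive_find_common_elements(a, b):
    # Compare every element of a with every element of b.
    c = []
    used = [False] * len(b)            # so each element of b is matched at most once
    for x in a:
        for j, y in enumerate(b):
            if x == y and not used[j]:
                c.append(x)            # we found a common element
                used[j] = True
                break                  # move on to the next element of a
    return c

print(naive_find_common_elements([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]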

For each a_i, the algorithm may scan the entire b to check whether a_i appears in it. So, it has an O(mn) time complexity, which becomes O(n^2) if m and n are comparable. What can we change in this algorithm to get the common elements faster?

Well, since the arrays are sorted, there's no point in iterating over \boldsymbol{b_{j+1}, b_{j+2}, \ldots, b_{m}} if \boldsymbol{a_i < b_j}. Each of them is at least b_j and therefore greater than a_i, so a_i can't be equal to any of them. The same reasoning works in the other direction: if b_j < a_i, we can discard b_j.

That way, we iterate over each array only once, so we don't need nested loops. Instead, we start with i=1 and j=1. When a_i < b_j, we discard a_i by incrementing i. Similarly, when b_j < a_i, we discard b_j by incrementing j. If a_i = b_j, we append a_i to the result array and increment both indices. The loop stops once we reach the end of a or b. This approach is similar to the merge step in the Merge Sort algorithm.

3.1. Pseudocode

Here's the pseudocode of the linear approach:

algorithm LinearAlgorithmForCommonElements(a, b):
    // INPUT
    //    a = a sorted array with n elements
    //    b = a sorted array with m elements
    // OUTPUT
    //    c = the sorted array of the common elements of a and b
    
    c <- an empty array
    i <- 1
    j <- 1

    while i <= n and j <= m:
        if a[i] < b[j]:
            i <- i + 1
        else if b[j] < a[i]:
            j <- j + 1
        else: // We've got a match: a[i] = b[j]!
            append a[i] to c
            i <- i + 1
            j <- j + 1
            
    return c

As a result, the algorithm’s time complexity is linear: O(m + n).
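
Here's the same idea as a minimal Python sketch (the function name is ours); it mirrors the pseudocode above with 0-based indices:

def find_common_elements(a, b):
    # Two-pointer scan over both sorted arrays, like the merge step of Merge Sort.
    c = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                     # a[i] can't match anything left in b
        elif b[j] < a[i]:
            j += 1                     # b[j] can't match anything left in a
        else:                          # a[i] == b[j]: a common element
            c.append(a[i])
            i += 1
            j += 1
    return c

print(find_common_elements([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]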

3.2. Example

Let’s use the above example to show how the algorithm works.

Initially, c is empty, and both i and j point to the first elements:

Common elements - step 1

Since a_1=1 < b_1=2, we increment i:

Common elements - step 2

Now, we have a match, so we append 2 to c and increment both counters:

Common elements - step 3

We have a match again, so we do the same as in the previous step:

Common elements - step 4

Now, a_4=4 > b_3=2, so we discard b_3 and move on to b_4:

Common elements - step 5

The situation is the same (a_4=4 > b_4=3), so we increment only j:

Common elements - step 6

Since a_4=4 < b_5=5, we increment only i:

Common elements - step 7

We found a new common element, so we add it to c and increment both counters:

Common elements - step 8

Finally, i>n=5, so we stop and output the result: c=[2, 2, 5].

4. Finding Common Elements in Logarithmic Time

Let's suppose that m << n. Then, for each \boldsymbol{j=1,2,\ldots,m}, we can look for \boldsymbol{b_j} in \boldsymbol{a} with the logarithmic binary search. Since b_j \leq b_{j+1}, if the binary search finds b_j at the k-th position in a, we can discard a_k and all the elements to its left, and search for b_{j+1} only in the remainder: [a_{k+1}, a_{k+2}, \ldots, a_n].

Now, let's say the binary search doesn't find b_j in a. In that case, we take k to be the position b_j would be inserted at, so a_k is the smallest remaining element greater than b_j (if no such element exists, we can stop). All the elements to the left of a_k are smaller than b_j \leq b_{j+1} \leq \ldots \leq b_m, so we can discard them. However, we keep a_k itself: since b_{j+1} \geq b_j, it can still happen that b_{j+1} = a_k, or that a later element of b equals a_k.

4.1. Pseudocode

Here’s how the above algorithm works:

algorithm FindCommonElementsBinarySearch(a, b):
    // INPUT
    //    a = a sorted array with n elements
    //    b = a sorted array with m elements
    // OUTPUT
    //    c = the sorted array of the common elements of a and b
    
    c <- an empty array
    k <- 1
    j <- 1

    while j <= m and k <= n:
        // Find the position b[j] should be at in a[k:n]:
        // the leftmost index whose element is >= b[j], or n + 1 if there is none
        k <- BINARY-SEARCH(b[j], a[k:n])
        
        if k <= n and a[k] = b[j]:
            // We found a common element
            append b[j] to c
            // a[k] is matched, so we discard it as well
            k <- k + 1
        // Otherwise, we keep a[k] because a later element of b may be equal to it
        
        j <- j + 1

    return c
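
As a sanity check, here's a minimal Python sketch of this approach (the names are ours); it uses the standard library's bisect_left as the binary search:

from bisect import bisect_left

def find_common_elements_binary_search(a, b):
    # For each element of b, binary-search the still-relevant suffix of a.
    c = []
    k = 0                                  # start of the unsearched suffix of a
    for x in b:
        k = bisect_left(a, x, k)           # leftmost index >= k with a[index] >= x
        if k == len(a):                    # every remaining element of a is smaller than x
            break
        if a[k] == x:                      # a common element
            c.append(x)
            k += 1                         # a[k] is matched, so we discard it
        # otherwise, keep a[k]: a later element of b may be equal to it
    return c

print(find_common_elements_binary_search([1, 2, 2, 4, 5], [2, 2, 2, 3, 5, 7]))  # [2, 2, 5]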

The worst-case time complexity is O(m\log n). If m << n, we can treat m as a constant with respect to n, so the algorithm runs in approximately \boldsymbol{O(\log n)} time. However, if m \in \Theta(n), we get an O(n \log n) algorithm. That's still better than the naive approach but worse than the O(n+m) runtime of the linear approach from Section 3.

5. Conclusion

In this article, we presented two efficient ways of finding the common elements of two sorted arrays. One approach runs in O(n+m) time, whereas the other binary-searches the longer array for each element of the shorter one and runs in O(m \log n) time.