1. Overview

In this tutorial, we’ll discuss the problem of pairing socks from an array that contains N pairs of socks.

First, we’ll define the problem and provide an example that explains it.

Second, we’ll present a naive approach. Then we’ll discuss two different approaches to improve it, and obtain better solutions.

2. Defining the Problem

Suppose we have N pairs of socks randomly distributed inside an array of length 2 \times N. Each sock may have one or more specifications, such as size, color, etc; therefore, we have to pair each sock with another of the same specifications.

To understand this more clearly, let’s look at a simple example.

Let’s say that we have an array with 8 socks. Additionally, each sock has a special color and a special size:

Example

Now suppose that the color of each cell represents the color of the corresponding sock, while the number written inside the cell represents the sock size. Consequently, we can find the solution to pairing the socks:

Solution

Note that in this tutorial, we’ll discuss finding only one possible solution; we won’t consider the case where we might have more than one answer.

3. Naive Approach

3.1. Main Idea

The main idea here is to iterate over the socks. For each sock, we’ll try to find a similar one that hasn’t been paired before.

3.2. Implementation

Let’s take a look at the implementation:

algorithm findSimilarPairsNaive(N, sock):
    // INPUT
    //    N = the number of pairs of socks
    //    sock = the array that represents the socks
    // OUTPUT
    //    Returns similar pairs of socks in sock
    
    answer <- an empty sequence
    taken <- an array of false, with length 2 * N
    
    for i <- 0 to 2 * N - 1:
        if taken[i] is true:
            continue
    
        taken[i] <- true
    
        for j <- i + 1 to 2 * N - 1:
            if sock[i] = sock[j]:
                answer <- add (sock[i], sock[j]) to answer
                taken[j] <- true
                break

    return answer

Initially, we declare a list called answer, which will store the result as pairs of socks. In addition, we declare an array called taken, which expresses whether each sock has been taken before or not. Furthermore, we initialize this array with false values.

Next we iterate over the socks. If the sock hasn’t been taken before, we try to find a similar one by iterating over the remaining socks and checking for similarities. If a match is found, we add the similar socks to answer, and flag both of them as taken.

Finally, we return answer, which stores the similar pairs of socks in the given array.

3.3. Complexity

The time complexity here is \boldsymbol{ O(N^2) }. Let’s examine the reason behind this complexity.

To start, we iterate over all the socks. For each one, we iterate over all the socks located after it inside the array. We’ll find that the (2 \times N)^{th} sock will iterate over 0 socks because no socks come after it. Similarly, the (2 \times N - 1)^{th} sock will iterate over 1 sock, and so on. As a result, this step will cost 0 + 1 + 2 + ... + 2 \times N - 1, which equals \frac{(2 \times N - 1)*(2 \times N)}{2}.

4. Sorting Approach

4.1. Main Idea

The main idea here is to sort the array of socks first. When we sort it, similar socks will be next to each other; therefore, we only have to iterate over the sorted array and check each sock with its adjacent.

4.2. Implementation

Let’s take a look at the implementation:

algorithm findSimilarPairsSorting(N, sock):
    // INPUT
    //    N = the number of pairs of socks
    //    sock = the array that represents the socks
    // OUTPUT
    //    Returns similar pairs of socks in sock

    answer <- an empty sequence
    sorted <- sort(sock)

    i <- 0
    while i <= 2 * N - 2:
        if sorted[i] = sorted[i + 1]:
            answer <- add (sorted[i], sorted[i + 1]) to answer
            i <- i + 1
        i <- i + 1

    return answer

Initially, we declare a list called answer, which will store the result as pairs of socks. Then we sort the given array and store it in sorted.

Next we iterate over the socks, and check the similarity between the i^{th} sock and the {i+1}^{th} one. If they are similar, we add the similar socks to answer, and increase the iterator i by 1 to prevent the sock from joining more than one pair.

Finally, we return answer, which stores the similar pairs of socks in the given array.

4.3. Complexity

The time complexity here is \boldsymbol{ O(N.log_2(N)) }. Let’s examine the reason behind this complexity.

The main complexity of this approach is the sorting algorithm, so we prefer to choose an algorithm with O(N.log_2(N)) complexity as merge sort.

Then we iterate over all the socks once, and we only need O(N) to do that.

5. Hash Approach

5.1. Main Idea

The main idea here is to use a HashSet to store the last appearance of a similar sock. Consequently, we iterate over the socks, and in each step, we check whether the HashSet contains a similar sock or not.

5.2. Implementation

Let’s take a look at the implementation:

algorithm findSimilarPairsHash(N, sock):
    // INPUT
    //    N = the Number of pairs of socks
    //    sock = the array that represents the socks
    // OUTPUT
    //    Returns similar pairs of socks in sock

    answer <- an empty sequence
    hash_set <- create an empty hash set

    i <- 0
    while i < 2 * N:
        if hash_set contains sock[i]:
            answer <- add (sock[i], hash_set.get(sock[i])) to answer
            hash_set.delete(sock[i])
        else:
            hash_set.add(sock[i])
        i <- i + 1

    return answer

Initially, we declare a list called answer, which will store the result as pairs of socks. We also declare hash_set as an empty HashSet.

Then we iterate over the socks, and check whether hash_set contains a similar sock to the i^{th} one. If it does, we add both the i^{th} sock and the similar sock from hash_set to the result. In addition, we delete the similar one to prevent the sock from joining more than one pair. If the HashSet doesn’t contain a similar sock, we add the i^{th} sock to hash_set.

Finally, we return answer, which stores the similar pairs of socks in the given array.

5.3. Complexity

The time complexity here is \boldsymbol{ O(N) }.

First, we iterate over all the socks once, and we only need O(N) to do that. At each step, we check if hash_set contains a special sock, which has a O(1) complexity. Also, the operation of adding and deleting from hash_set has a O(1) complexity.

6. Conclusion

In this article, we presented the problem of pairing socks. We also explained the main ideas behind three different approaches to solving this problem, and walked through their implementations.