1. Introduction

In this tutorial, we’ll cover recursive and iterative algorithms for enumerating all k-combinations of a set.

2. Combinations

A combination is a subset of a set. So, in this problem, we have a set S=\{a_1, a_2, \ldots, a_n\} with n elements and want to list all its subsets with exactly k items.

Unlike permutations, the order in which we choose the individual elements doesn’t matter. Instead, we’re interested in whether a particular element is in a combination.

For example, we don’t care about the order in which we selected five cards out of 52; we only care about which cards are in hand.

3. Recursive Algorithm 1: Partitioning by Elements in the Original Set

Recursive algorithms partition the problem into smaller sub-problems of the same type and combine their solutions to get the solution to the original problem. We differentiate between recursive algorithms based on their partitioning schemes.

We’ll discuss two approaches. The first partitions the problem by the elements in the original set S, whereas the other partitions by the selected elements.

3.1. Idea

We can partition the task by inspecting the set’s elements individually. For each item in the set, we can either include it in the selection or not.

So, we traverse the set in a predetermined order. If we include the first item (\boldsymbol{a_1}), we need to choose \boldsymbol{k-1} elements from the remaining \boldsymbol{n-1} items. On the other hand, if we discard \boldsymbol{a_1}, we need to select \boldsymbol{k} elements out of the remaining \boldsymbol{n-1} items.

To solve the smaller problems (the selection of k-1 and k elements out of n-1), we apply the same partitioning scheme:

Partitioning by the elements in the input set.

We can enumerate all k-combinations by traversing the tree and collecting its leaves. At each step, we keep track of the elements in the currently active combination (which corresponds to the path from the root to the current node). Going down an edge, we remember the decision about including the node’s element; going up, we reverse the decision associated with the edge.
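
For example, let S = \{a_1, a_2, a_3\} and k = 2. Including a_1 leaves us to choose one element from \{a_2, a_3\}, which yields \{a_1, a_2\} and \{a_1, a_3\}, whereas excluding a_1 leaves us to choose two elements from \{a_2, a_3\}, which yields \{a_2, a_3\}. Collecting the leaves, we get all \binom{3}{2}=3 combinations.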

3.2. Pseudocode

Here’s the pseudocode:

algorithm FindAllKCombinations(S, x, k):
    // INPUT
    //    S = the set of elements still to consider (initially {a_1, a_2, ..., a_n})
    //    x = the currently active combination (initially empty)
    //    k = the number of elements still to select (0 <= k <= |S|)
    // OUTPUT
    //    All combinations that extend x with k elements of S

    if k > |S|:
        // Fewer elements remain than we still need to select,
        // so this branch is a dead end
        return

    if k = 0:
        // We made a combination
        Output x
    else:
        // Let a be the first element of S.
        // First, we explore the sub-tree with a included.
        FindAllKCombinations(S \ {a}, x + [a], k - 1)
        
        // Then, we explore the sub-tree with a not included.
        FindAllKCombinations(S \ {a}, x, k)

x + [a] means appending a to x. We make the initial call with the full set and an empty combination: FindAllKCombinations(S, [], k).

The order in which we traverse the elements determines the order of the combinations at the output. Usually, we specify \boldsymbol{S} as an ordered, iterable data structure and call the resulting order of combinations lexicographic. We denote the ordering with the “less than” symbol: a_1 < a_2 < \ldots < a_n.
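
To make the recursion concrete, here’s a minimal Python sketch of this scheme. The function name and the list-based representation of S and x are illustrative choices, not part of the pseudocode above:

from typing import List, Sequence

def find_all_k_combinations(s: Sequence, x: List, k: int) -> None:
    # k is the number of elements still to select
    if k > len(s):
        # Not enough elements remain, so this branch is a dead end
        return
    if k == 0:
        # We made a combination
        print(x)
        return
    a, rest = s[0], s[1:]
    # First, explore the sub-tree with a included
    find_all_k_combinations(rest, x + [a], k - 1)
    # Then, explore the sub-tree with a not included
    find_all_k_combinations(rest, x, k)

# Example: all 2-combinations of {1, 2, 3, 4}
find_all_k_combinations([1, 2, 3, 4], [], 2)

The combinations come out in lexicographic order because we always try “include” before “exclude”.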

3.3. Complexity and Issues

The time complexity of this approach is O(2^n). This is so because we make a binary decision (include or exclude) for each of the n elements, so the recursion tree has O(2^n) nodes. Therefore, this approach is inefficient if the input set is large.
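
For instance, for n = 40, the recursion tree can have on the order of 2^{40} \approx 10^{12} nodes, which is far too many to traverse in practice.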

Furthermore, it takes O(n) stack space. So, we may get a stack overflow if n is too big.

4. Recursive Algorithm 2: Partitioning by Elements in the Combination

Instead of tracking the elements in the input set, we can partition the problem by tracking the items in the selection.

4.1. Idea

Let’s assume we want to output ordered combinations. So, a k-combination x should satisfy x[1] < x[2] < \ldots < x[k], where x[j] is the jth element of x.

For this to be possible, the first item in a k-combination must come from [a_1, a_2, \ldots, a_{n-k+1}]. Otherwise, there wouldn’t be enough elements to its right to complete an ordered combination.

If we choose \boldsymbol{a_j \in [a_1, a_2, \ldots, a_{n-k+1}]} as the first element, we can get an ordered combination only if we select the remaining \boldsymbol{k-1} items from those to the right of \boldsymbol{a_j}. In doing so, the first element of that (k-1)-combination must come no later than a_{n-k+2} (or completing the combination won’t be feasible). So, we iterate over the window [a_{j+1}, a_{j+2}, \ldots, a_{n-k+2}] and repeat the process.
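
For example, for n = 5 and k = 3, the first element must come from [a_1, a_2, a_3]. If we pick a_2, the second element must come from the window [a_3, a_4], and after picking, say, a_3, the last one comes from [a_4, a_5].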

4.2. Pseudocode

Here’s the pseudocode:

algorithm FindKCombinations(S, x, k):
    // INPUT
    //    S = [a[j], a[j+1], ..., a[n]], a suffix of the original set {a_1, a_2, ..., a_n}
    //    x = the currently active combination
    //    k = the number of elements to select (0 ≤ k ≤ n)
    // OUTPUT
    //    All k-combinations extending the current x with elements of S

    if |x| = k:
        // We have a combination
        Output x
    else:
        // Cap the right end of the window so that enough elements remain
        m <- min { n, n - (k - |x|) + 1 }
        for i <- j, j + 1, ..., m:
            FindKCombinations([a[i+1], a[i+2], ..., a[n]], x + [a[i]], k)
        
        // When a call returns, the loop moves on and explores the combinations without a[i]

In the above pseudocode, the for loop iterates over the active window and chooses the next item, calling the algorithm recursively for each choice. We stop when the required number of items has been selected, output the combination, and go back up the recursion tree to find other combinations.

The right end of the current window must leave at least k - |x| - 1 elements to its right so that the combination can still be completed. That’s why we have the step m \leftarrow \min\{ n, n - (k - |x|) + 1\}.
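
Here’s a minimal Python sketch of this window-based recursion. Instead of slicing the suffix [a_{j+1}, \ldots, a_n], it passes the full sequence together with the start index of the current window; the names are illustrative:

from typing import List, Sequence

def find_k_combinations(a: Sequence, j: int, x: List, k: int) -> None:
    # a is the full sequence; the current window starts at 0-based index j
    if len(x) == k:
        # We have a combination
        print(x)
        return
    n = len(a)
    # Cap the right end of the window so that enough elements remain
    m = n - (k - len(x)) + 1
    for i in range(j, m):
        # Choose a[i] as the next element and recurse on the suffix after it
        find_k_combinations(a, i + 1, x + [a[i]], k)

# Example: all 3-combinations of {1, 2, 3, 4, 5}
find_k_combinations([1, 2, 3, 4, 5], 0, [], 3)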

4.3. Complexity and Issues

The number of iterations corresponds to the sum of window sizes. Therefore, this number also equals the total number of times a window covered an element of S.

We iterate over the element a_i in a window whose right end is a_{n-k+\ell+1} (\ell=0,1,2,\ldots,k-1) when exactly \ell items from [a_1, a_2, \ldots, a_{i-1}] have already been selected. There are \binom{i-1}{\ell} ways of doing so, so the number of times a_i gets looped over is at most:

    [\sum_{\ell=0}^{\min\{k-1,i-1\}} \binom{i-1}{\ell}]

The upper limit \min\{k-1, i-1\} ensures we count only valid windows. For example, there’s no way to select two elements before a_2, and this condition disregards such windows.

So, the total complexity is:

    [\begin{aligned} \sum_{i=1}^{n}\sum_{\ell=0}^{\min\{k-1,i-1\}} \binom{i-1}{\ell} &= \sum_{\ell=0}^{k-1}\sum_{i=\ell+1}^{n} \binom{i-1}{\ell} \in O(2^n) \end{aligned}]

Although this algorithm is also exponential, its recursion tree is shallower than the first algorithm’s. It has at most k levels of recursive calls, so the stack space complexity is O(k).

Therefore, this approach can work for large sets if the number of elements to be selected is less than the maximum call stack depth.

However, if k is large, this method won’t work.

5. Iterative Algorithm

In the iterative approach, we start with the first lexicographic combination: [a_1, a_2, \ldots, a_k]. Then, we keep generating the next combination from the current one until we have generated all combinations.

5.1. Generating the Next Combination

To get the next combination from the current \boldsymbol{x}, we find the rightmost element of \boldsymbol{x} that we can replace with a lexicographically larger element. The last element x[k] is replaceable if it’s lexicographically smaller than a_n. Similarly, the second-to-last element x[k-1] is replaceable if it’s lexicographically smaller than a_{n-1}.

So, the general rule is that x[k-j] is replaceable if it’s lower than a_{n-j}. The equivalent formulation is that x[j] is replaceable if x[j] < a_{n-k+j}.

Let’s say that the rightmost replaceable element of x is at the jth position and equal to a_i. We replace the right sub-array x[j : k] with k-j+1 successors of a_i: [a_{i+1}, a_{i+2}, \ldots, a_{i+k-j+1}]. This way, we get the next combination.
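
For example, let n = 6, k = 3, and x = [a_2, a_5, a_6]. The elements x[3] = a_6 = a_n and x[2] = a_5 = a_{n-1} aren’t replaceable, but x[1] = a_2 < a_4 = a_{n-2} is. So, j = 1 and a_i = a_2, and we replace x[1 : 3] with [a_3, a_4, a_5], which is the next combination.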

5.2. Pseudocode

We start with the first lexicographic k-combination [a_1, a_2, \ldots, a_k]:

algorithm OutputKCombinationsIteratively(S, k):
    // INPUT
    //    S = {a_1, a_2, ..., a_n}, the set of elements
    //    k = the number of elements in each combination (0 ≤ k ≤ n)
    // OUTPUT
    //    All k-combinations of S

    // Start with the first lexicographic combination
    x <- [a_1, a_2, ..., a_k]
    Output x

    j <- RightmostReplaceable(x, k)

    while j != false:
        // Let a_i be x[j], the rightmost replaceable element.
        Set x[j : k] to [a[i+1], a[i+2], ..., a[i+k-j+1]]
        Output x
        j <- RightmostReplaceable(x, k)

    algorithm RightmostReplaceable(x, k):
        // INPUT
        //    x = the current combination
        //    k = the number of elements in each combination
        // OUTPUT
        //    The index of the rightmost replaceable element or false if none exists

        j <- k
        while j >= 1 and x[j] = a[n-k+j]:
            j <- j - 1

        if j >= 1:
            return j
        else:
            return false

The process stops when we get to the last lexicographic combination [a_{n-k+1},\ldots, a_{n-1}, a_n] since it has no replaceable element.
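
Here’s a minimal Python sketch of this iterative scheme. It stores the 1-based positions of the chosen elements in x, which makes the comparison with a_{n-k+j} a simple integer check; the names are illustrative:

from typing import List, Optional, Sequence

def rightmost_replaceable(x: List[int], n: int, k: int) -> Optional[int]:
    # Return the 1-based index of the rightmost replaceable element, or None if there is none
    j = k
    while j >= 1 and x[j - 1] == n - k + j:  # x[j] already equals its maximum a_{n-k+j}
        j -= 1
    return j if j >= 1 else None

def output_k_combinations_iteratively(a: Sequence, k: int) -> None:
    n = len(a)
    if k > n:
        return
    # Start with the first lexicographic combination [a_1, a_2, ..., a_k]
    x = list(range(1, k + 1))
    print([a[p - 1] for p in x])

    j = rightmost_replaceable(x, n, k)
    while j is not None:
        i = x[j - 1]
        # Replace x[j : k] with the k - j + 1 successors of a_i
        x[j - 1:] = range(i + 1, i + k - j + 2)
        print([a[p - 1] for p in x])
        j = rightmost_replaceable(x, n, k)

# Example: all 3-combinations of {1, 2, 3, 4, 5}
output_k_combinations_iteratively([1, 2, 3, 4, 5], 3)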

5.3. Complexity

If x[j] is the rightmost replaceable element in x, generating the next combination takes k-j+1 steps. So, the overall time complexity is:

    [O \left( \sum_{j=1}^{k}(k-j+1) \times N_j \right)]

where N_j is the number of combinations in which the rightmost replaceable element is at the jth position.

Let’s find N_j. If the rightmost replaceable element is x[j], then x[j+1 : k]=[a_{n-k+j+1}, a_{n-k+j+2}, \ldots, a_n]. So, the k-j rightmost elements are fixed, and the left part contains j elements chosen from \{ a_1, a_2, \ldots, a_{n-k+j}\} (with x[j] strictly smaller than a_{n-k+j}). Therefore:

    [N_j \leq \binom{n-k+j}{j}]

From there, we get the time complexity:

    [O \left( \sum_{j=1}^{k}(k-j+1) \times N_j \right) = O \left( \sum_{j=1}^{k}(k-j+1) \times \binom{n-k+j}{j} \right)]

Since \sum_{j=1}^{k}(k-j+1) \binom{n-k+j}{j} = \binom{n+2}{k} - k - 1, this reduces to:

    [O \left( \binom{n+2}{k} \right)]

Taking k to be constant with respect to n, we get O(n^k). So, this algorithm has the same polynomial complexity as \boldsymbol{\binom{n}{k}}, the number of \boldsymbol{k}-combinations.
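
For instance, for k = 2, the bound is O(n^2), and there are \binom{n}{2} = n(n-1)/2 two-element combinations to output.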

6. Conclusion

In this article, we explained three algorithms for generating combinations: two recursive and one iterative.

The recursive approaches might be easier to understand and code, but the iterative solution is the most efficient.