1. Introduction
In this tutorial, we’ll learn about different algorithms to generate all -element subsets of a set containing elements. This problem is referred to as -combinations.
The mathematical solution to find the number of -combinations is straightforward. It’s equal to the binomial coefficient:
For example, let’s assume we have a set containing 6 elements, and we want to generate 3-element subsets. In this case, there are 6 ways that we can choose the first element. Then, we choose the second element from the remaining 5. Finally, we choose the third and last element of the subset from the remaining 4. The number of choices we’ve had so far is exactly .
Since the ordering of our choices doesn’t matter as we’re constructing a subset, we need to divide by . Hence we reach the same formula:
However, generating all such subsets and enumerating them is more complicated than merely counting them. Let’s have a look at different algorithms that create the subsets in different ways in the following sections.
The algorithms we’ll review in the following sections were initially compiled in “The Art of Computer Programming” by computer scientist Donald Knuth.
2. Lexicographic Generation
Lexicographic order is an intuitive and convenient way to generate -combinations.
The algorithm assumes that we have a set containing elements: {0, 1, … , }. Then, we generate the subsets containing elements in the form , starting from the smallest lexicographic order:
The algorithm visits all -combinations. It recursively represents multiple nested for loops. For example, when =3, it’s equivalent to 3 nested for loops to generate combinations of length 3:
algorithm visitCombinations(n):
// INPUT
// n = The upper limit for generating combinations
// OUTPUT
// Visits all unique combinations of three different numbers
// that are not higher than n.
for c3 in 2 to n - 1:
for c2 in 1 to c3 - 1:
for c1 in 0 to c2 - 1:
// Visit the combination c3, c2, c1
visitCombination(c3, c2, c1)
In each iteration of the lexicographic generation algorithm, we find the smallest value, i.e., the rightmost element , that we can increment. Then we set the subsequent elements to their smallest values.
For instance, when we have = 6 and = 3, the algorithm generates the lexicographically ordered sequence:
210
420
510
532
310
421
520
540
320
430
521
541
321
431
530
542
410
432
531
543
3. Filling a Rucksack Generation
We can formulate the -combinations as a Knapsack problem. Subsequently, an optimal solution involves visiting nodes in a binomial tree.
3.1. Binomial Trees
Binomial Trees is a tree family used in combination generation. They are denoted by for , such that:
- If =0, the tree consists of a single node
- If , the binomial tree of order contains the root and binomial subtrees
For example, contains a root and subtrees :
It’s easy to see that can be iteratively constructed from by inserting another copy of . So, contains nodes. Moreover, the number of nodes at each level equals to in for in {0,…,-1}. In this sense, the variations of Lexicographic generation algorithm traverses the nodes of on level .
3.2. Filling a Rucksack Algorithm
Let’s recall the 0-1 Knapsack Problem, where we either take or don’t take any element from a set of size to fill a rucksack. We can formulate our problem in a similar fashion, where we have a rucksack of size and we want to find all subset combinations of elements of set S. Let’s set for . Subsets with number of elements in the form are valid solutions to this problem.
In this case, every possible solution corresponds to a node of . We can use the following algorithm to visit each valid node in preorder:
The algorithm only visits valid nodes of , skipping non-valid nodes. After the initialization phase, we visit the combination , using up space out of . Then we try to add elements to the combination at hand without exceeding the space restriction. If an element fits into a rucksack, we place it in after exhausting all possibilities using in its place.
4. Revolving Door Generation
Another way to represent the -combinations problem is using Gray codes.
4.1. Gray Codes
Imagine a revolving door connecting two rooms. There are people in the left room and people on the right one. Hence, we have people in total. In this setting, we let a person go into the other room only by switching with somebody else, so that the number of people in the rooms never change:
To represent the people in the rooms, we utilize bit binary strings. 0’s represent the people in the left room, and 1’s represent the people in the other room. Hence, 2 bits of a combination string bits are flipped when two people use the revolving door to switch places.
Let’s construct an algorithm to generate a sequence of combinations that don’t repeat. To generate such a sequence, we adopt Gray codes. We’ll enumerate orderings, where two successive representations always differ by one binary bit. Then, we’ll select only the binary representations containing zeroes and ones amongst them. Hence, the code sequence will satisfy the revolving door constraint.
By definition, we generate Gray binary codes of lenght recursively. Let represent a binary code of length . We canformulate it more formally as:
In other words, we generate Gray codes of length by inserting a leading 0 or 1 to Gray codes of length -1.
4.2. Revolving Door Algorithm
As our set has elements, we formulate a combination with a bit string of length . This string represents which elements to include in the combination with a 1, and not to include with a 0. To ensure we generate only the codes valid for the -combinations problem, we need to generate codes containing zeros and ones:
For instance, let’s consider the case =6 and =3. Let’s generate :
000111
011010
110001
101010
001101
011100
110010
101100
001110
010101
110100
100101
001011
010110
111000
100110
011001
010011
101001
100011
Converting the bit strings to combinations , we see a clear pattern:
210
431
540
531
320
432
541
532
321
420
542
520
310
421
543
521
430
410
530
510
In this sequence, occurs in increasing order. But for fixed values of , appears in increasing order. Moreover, for fixed combinations, values are increasing again. So, we can say that the characters of combination alternatingly increase and decrease.
Putting it all together, we construct the revolving door algorithm to visit -combinations:
5. Near-Perfect Schemes and Perfect Schemes
We call a Gray path sequence homogeneous if only one is changed at each step. Since we have multiple indices changing simultaneously, the revolving door algorithm does not generate a homogeneous sequence. For instance, the transition from 210 to 320 or 432 to 420 doesn’t obey the homogeneity rule.
We can generate a homogenous Gray sequence for =6 and =3:
000111
010101
101100
100011
001011
010011
101001
110001
001101
011001
101010
110010
001110
011010
100110
110100
010110
011100
100101
111000
This bitstring corresponds to a homogenous sequence:
210
420
532
510
310
410
530
540
320
430
531
541
321
431
521
542
421
432
520
542
Even better, we can generate all -combinations by strongly homogenous transitions. In such a sequence, the index moves by at most 2 steps. These generation schemes are called near-perfect.
Chase formulated an easy to compute version of nearly-perfect sequences. Again, let’s say we have elements, and we want to find -combinations. Hence, we generate the binary sequence containing ones and zeros recursively:
As a result, we can use this algorithm to generate a sequence visiting all -combinations in near-perfect order. It’s called Chase’s sequence. There are exactly 2 such combinations.
Deducing from near-perfect combinations, we wonder if it’s possible to generate -combination sequences by only switching adjacent bits. This form is called a perfect scheme. Mathematicians proved that finding a perfect combination is possible only if or or is odd. In short, finding a perfect scheme is possible in only 1 in 4 cases.
6. Comparison
In summary, the -combinations generation problem constructs elegant patterns. Heatmaps built with the binary outputs of the discussed strategies make these patterns easier to follow:
The image above was published in “The Art of Computer Programming”, volume 4A (2011).
7. Conclusion
In this article, we’ve learned about five different ways to enumerate -combinations of a set. Lexicographic order algorithm generates the subsets of size in alphabetical order, as the name suggests. The revolving door algorithm generates -combinations in alternating lexicographic order.
Homogeneous generation ensures that consecutive generations differ by only one element. Near-perfect schemes further ensure that the changed element moves at most two steps. Perfect schemes allow the changed element’s index to be incremented or decremented by only one.