Shuffling an Array | Baeldung (Chinese site)

1. Introduction

In this tutorial, we’ll learn how to shuffle an array.

2. Shuffling

In many applications, we have an array $a = [a_1, a_2, \ldots, a_n]$ and need to shuffle it. For example, that’s the case when doing permutation tests to check feature importance in machine learning. Shuffling the array means getting a random permutation of its elements.

We require all permutations to be equally likely.

3. Naïve Approach

As a first attempt, we could iterate over $i = 1, 2, \ldots, n$ and, for each $i$, randomly choose an index $j \in \{1, 2, \ldots, n\}$ and swap $a_i$ with $a_j$:

algorithm NaiveShuffling(a):
    // INPUT
    //    a = an array [a_1, a_2, ..., a_n]
    // OUTPUT
    //    A random permutation of a

    for i <- 1 to n:
        j <- a random number from {1, 2, ..., n}
        Swap a[i] and a[j]

    return a

This may seem like the intuitive thing to do, but the issue is that not all permutations are equally likely.

In each of the $n$ iterations, we can choose any of the $n$ elements of $a$. So, there are $n^n$ equally likely executions but only $n! = n \cdot (n-1) \cdot \ldots \cdot 2 \cdot 1$ different permutations. Since $n^n$ isn’t divisible by $n-1$ for any $n > 2$, whereas $n!$ is, $n^n$ isn’t a multiple of $n!$, and we can’t spread the $n^n$ executions evenly over the $n!$ permutations. So, some permutations are more likely than others.
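As a rough illustration, the naive pseudocode above could be translated to Python as follows. This is only a sketch: the function name `naive_shuffle` and the optional `rng` parameter are our own choices, and indices are 0-based, unlike in the pseudocode.

```python
import random


def naive_shuffle(a, rng=random):
    """Biased shuffle: each position is swapped with an index drawn from the whole array."""
    n = len(a)
    for i in range(n):            # 0-based counterpart of i = 1, ..., n
        j = rng.randrange(n)      # uniform over {0, ..., n - 1}, i.e., the entire array
        a[i], a[j] = a[j], a[i]
    return a


print(naive_shuffle([1, 2, 3, 4]))
```

The result is always a permutation of the input, but, as argued above, the permutations aren’t equally likely.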

3.1. Example

Here are all possible executions when the input is $[1, 2, 3]$:

[Figure: Shuffling [1, 2, 3] with the naive algorithm]

As we see, the permutations $[1, 3, 2]$, $[2, 1, 3]$, and $[2, 3, 1]$ occur with the probability $\frac{5}{27}$ each, whereas the remaining three occur with the probability $\frac{4}{27}$ each.
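We can check these probabilities by enumerating all $3^3 = 27$ executions. The sketch below replays the naive algorithm for every possible sequence of swap indices and counts the outcomes. The helper names `naive_outcome` and `naive_counts` are hypothetical, and indices are 0-based.

```python
from collections import Counter
from itertools import product


def naive_outcome(a, choices):
    """Replay the naive algorithm with a fixed sequence of swap indices (0-based)."""
    a = list(a)
    for i, j in enumerate(choices):
        a[i], a[j] = a[j], a[i]
    return tuple(a)


def naive_counts(a):
    """Count how many of the n^n equally likely executions produce each permutation."""
    n = len(a)
    return Counter(naive_outcome(a, choices) for choices in product(range(n), repeat=n))


counts = naive_counts([1, 2, 3])
for perm, count in sorted(counts.items()):
    print(perm, f"{count}/27")
```

The three permutations listed above should each show up 5 times, and the other three 4 times.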

4. The Fisher-Yates Algorithm

Instead, we can use the Fisher-Yates algorithm. For each $i = 1, 2, \ldots, n-1$, it randomly selects an index $j$ from $\{i, i+1, \ldots, n\}$ and swaps $a_i$ with $a_j$:

algorithm FisherYatesAlgorithm(a):
    // INPUT
    //    a = an array [a_1, a_2, ..., a_n]
    // OUTPUT
    //    A random permutation of a

    for i <- 1 to n - 1:
        j <- a random number from {i, i + 1, ..., n}
        Swap a[i] and a[j]

    return a

Basically, whenever we choose $a_j$ to swap with the current element $a_i$, we discard it from participating in the swaps with $a_{i+1}, a_{i+2}, \ldots, a_{n}$. We achieve that by restricting the range for $j$.

Therefore, the Fisher-Yates algorithm splits the array into two parts: $a[1:(i-1)]$ and $a[i:n]$. We’ll call them the inactive and active parts. In the $i$th iteration, the algorithm can choose only active elements for the exchange with $a_i$. Afterward, the inactive part grows to include the element at the $i$th position.
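A minimal Python sketch of the pseudocode might look like this. The function name `fisher_yates_shuffle` and the `rng` parameter are our own, indices are 0-based, and we assume `rng.randrange` samples uniformly from the requested range.

```python
import random


def fisher_yates_shuffle(a, rng=random):
    """Shuffle a in place, drawing each swap partner only from the active part."""
    n = len(a)
    for i in range(n - 1):           # 0-based counterpart of i = 1, ..., n - 1
        # a[:i] is the inactive part; a[i:] is the active part.
        j = rng.randrange(i, n)      # uniform over {i, ..., n - 1}
        a[i], a[j] = a[j], a[i]
    return a


print(fisher_yates_shuffle([1, 2, 3, 4]))
```

The only difference from the naive version is the lower bound of the range for `j`.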

So, for each $i = 1, 2, \ldots, n-1$, there are $n-i+1$ active elements to choose from. As a result, the total number of possible executions is:

$$n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot 2 = n!$$

which matches the number of permutations. Since we can choose an element for a swap only once, there is only one way to get each permutation. Consequently, they’re all equally likely and occur with the probability of $\frac{1}{n!}$ .
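To make the counting argument concrete, we can enumerate every execution for a small array and confirm that each permutation appears exactly once. This is only a sketch with hypothetical helper names (`fisher_yates_outcome`, `fisher_yates_counts`) and 0-based indices.

```python
from collections import Counter
from itertools import product


def fisher_yates_outcome(a, choices):
    """Replay Fisher-Yates with a fixed sequence of swap indices (0-based)."""
    a = list(a)
    for i, j in enumerate(choices):
        a[i], a[j] = a[j], a[i]
    return tuple(a)


def fisher_yates_counts(a):
    """Count how many executions produce each permutation."""
    n = len(a)
    # In iteration i, only the active indices i, ..., n - 1 may be chosen.
    ranges = [range(i, n) for i in range(n - 1)]
    return Counter(fisher_yates_outcome(a, choices) for choices in product(*ranges))


counts = fisher_yates_counts([1, 2, 3, 4])
print(len(counts), set(counts.values()))  # 24 {1}
```

For $n = 4$, the $4! = 24$ executions yield 24 distinct permutations.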

4.1. Example

Let’s say $a = [1, 2, 3, 4]$. Here’s a possible execution where $|$ denotes the split:

$$\begin{matrix} | & 1 & 2 & 3 & 4 \\ 2 & | & 1 & 3 & 4 \\ 2 & 4 & | & 3 & 1 \\ 2 & 4 & 1 & | & 3 \end{matrix}$$

In the last step, there’s only one possibility for $a_4$, so we stop and return $[2, 4, 1, 3]$.
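The execution above corresponds to picking $j = 2$, then $j = 4$, and then $j = 4$ again. As a sketch, we can replay it with predetermined picks instead of random ones. The helper `replay_fisher_yates` is hypothetical and takes 1-based picks to match the notation in the text.

```python
def replay_fisher_yates(a, picks):
    """Apply Fisher-Yates swaps using predetermined 1-based picks j for i = 1, ..., n - 1."""
    a = list(a)
    for i, j in enumerate(picks, start=1):
        assert i <= j <= len(a), "j must point into the active part"
        a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
        print(a[:i], "|", a[i:])     # inactive part | active part
    return a


result = replay_fisher_yates([1, 2, 3, 4], picks=[2, 4, 4])
print(result)  # [2, 4, 1, 3]
```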

4.2. Potential Issues

Random integer generation is at the core of the Fisher-Yates algorithm. If we can’t sample from a truly uniform distribution over $\boldsymbol{\{i, i+1, \ldots, n\}}$, the algorithm will be biased toward some permutations. That can happen in two ways.

Let’s say we have a random number generator rand that returns integers from $0$ to $N-1$ inclusive ($N-1$ is usually the maximal integer). We can get $j \in \{i, i+1, \ldots, n\}$ like this:

$$i + \left( rand() \bmod (n - i + 1) \right)$$

For all the remainders to be equally likely, $N$ must be divisible by each possible $n-i+1$, which we can’t assume to be the case. As some remainders are more likely than others, not all values of $j$ we compute with the above formula are equally probable. However, the higher the $N$, the smaller the discrepancy.
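A tiny example shows the effect. The sketch below counts how many of the generator’s $N$ possible outputs land on each remainder. The function `modulo_counts` and the toy values of $N$ are our own choices for illustration.

```python
from collections import Counter


def modulo_counts(N, m):
    """Count how many of the N generator outputs 0, ..., N - 1 give each remainder mod m."""
    return Counter(r % m for r in range(N))


print(sorted(modulo_counts(8, 3).items()))  # [(0, 3), (1, 3), (2, 2)]
print(sorted(modulo_counts(9, 3).items()))  # [(0, 3), (1, 3), (2, 3)]
```

With $N = 8$, the remainder 2 is less likely than 0 and 1, whereas with $N = 9$, all three are equally likely. In general, each remainder is produced by either $\lfloor N/m \rfloor$ or $\lceil N/m \rceil$ outputs, which is why the discrepancy shrinks as $N$ grows.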

On the other hand, if we have a generator that returns random numbers from $[0, 1]$, we can get $j$ like this:

$$i + round\left( rand() \cdot (n - i) \right)$$

Since computers have finite precision, rand() can’t return all real numbers from $[0, 1]$, so rounding won’t result in equally probable $0, 1, \ldots, n-i$. Adding $i$ to the result of rounding does give us integers from the desired range, but not all are equally likely.
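To get a feel for the bias, we can model a hypothetical finite-precision generator that returns only the values $0, \frac{1}{1000}, \frac{2}{1000}, \ldots, 1$ and count the rounded results. In this sketch, the name `rounding_counts` and the grid size are assumptions, and the rounding is done exactly with fractions (half rounds up). Note that rounding also favors the interior values: the endpoints $0$ and $n-i$ each collect values from an interval only half as wide as the others.

```python
from collections import Counter
from fractions import Fraction
from math import floor


def rounding_counts(m, grid):
    """Count round(u * m) over u = 0, 1/grid, ..., 1 (a toy finite-precision generator)."""
    counts = Counter()
    for k in range(grid + 1):
        u = Fraction(k, grid)
        counts[floor(u * m + Fraction(1, 2))] += 1   # exact round-half-up
    return counts


print(sorted(rounding_counts(m=3, grid=1000).items()))
```

Here, $m$ plays the role of $n - i$. The results $0$ and $3$ come up about half as often as $1$ and $2$.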

5. Conclusion

In this article, we showed how to shuffle an array.

The naïve approach is intuitive but incorrect since not all permutations are equally likely. The Fisher-Yates algorithm shuffles an array and returns each permutation with the same probability.
