Reading a Random Line From a File in Linux

1. Overview

In this tutorial, we’ll look at different strategies to randomly choose a line of text from a file.

2. Creating Our Sample File

Before we begin, let’s create a file called example_file.txt, which will contain 10 numbered lines:

$ : > example_file.txt; seq 1 10 | xargs -I % echo line % | tee -a example_file.txt

3. Using shuf

First, let’s look at the shuf command, which writes a random permutation of its input lines. With -n 1, we limit the output to a single line, so shuf returns one random line from the file:

$ shuf -n 1 example_file.txt
line 2
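
The same option also covers picking several lines at once, and shuf reads standard input when no file is given. The following is a minimal sketch, assuming GNU coreutils shuf is installed; the function name pick_random_lines is hypothetical and only used for illustration:

```shell
# pick_random_lines COUNT FILE (hypothetical helper)
# Print COUNT distinct random lines from FILE (fewer if the file is shorter).
# Assumes GNU coreutils shuf.
pick_random_lines() {
    shuf -n "$1" -- "$2"
}

# shuf reads standard input when no file is given, so it also fits in a pipeline.
seq 1 10 | shuf -n 1
```

For instance, pick_random_lines 3 example_file.txt would print three different lines of our sample file in random order.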

4. Using awk

Another way we can approach the problem is with the use of awk.

The first approach applies when we know the number of lines in the file:

$ n_lines=$(wc -l < example_file.txt); awk -v min=1 -v max=$n_lines '
BEGIN {
    srand()
    r_number=int( min + rand() * (max - min + 1) )
} 
NR == r_number' example_file.txt

Now let’s take a closer look at the code:

With n_lines=$(wc -l < example_file.txt), we get the number of lines from the file
With -v min=1 -v max=$n_lines, we pass to the awk script the information about the range from which we’ll obtain the random number
In r_number=int( min + rand() * (max - min + 1) ), we compute a random integer in the range defined by the variables we passed; because rand() returns a value from 0 up to, but not including, 1, the result always lies between min and max, inclusive
And with NR == r_number, we’re instructing awk that if the current record number equals the random number generated earlier, it should execute the default awk action: print the current line
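
To check that the range expression never leaves the requested interval, we can draw many values with the same formula and report the smallest and largest one. This is only a sketch: the function name rand_range_bounds and the default of 10000 draws are assumptions made for illustration.

```shell
# rand_range_bounds MIN MAX [DRAWS] (hypothetical helper)
# Draw DRAWS integers with int(min + rand() * (max - min + 1)) and print
# the smallest and the largest value seen, separated by a space.
rand_range_bounds() {
    awk -v min="$1" -v max="$2" -v draws="${3:-10000}" '
    BEGIN {
        srand()
        lo = max; hi = min
        for (i = 0; i < draws; i++) {
            r = int(min + rand() * (max - min + 1))
            if (r < lo) lo = r
            if (r > hi) hi = r
        }
        print lo, hi
    }'
}

rand_range_bounds 1 10
```

Both printed values stay within 1 and 10, and with this many draws they will almost always be exactly the two ends of the range.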

The second approach is useful when we don’t know the number of lines in the file or in case we only want to use awk without help from any other commands:

$ awk '
BEGIN{ srand() } 
rand() * NR < 1 { 
    line = $0 
} 
END { print line }' example_file.txt

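In this version, each line replaces the stored one with probability 1/NR, so once the whole file has been read, every line has had the same chance of being kept. In both approaches, calling srand without arguments seeds the random number generator with the current time of day, which in many awk implementations has a resolution of one second. For that reason, we may get the same line if we run our command several times within the same second.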
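
One way to avoid repeated results within a short period is to pass an explicit seed to srand. The following is a minimal sketch under that assumption; the function name random_line_seeded is hypothetical, and taking the seed from the shell’s RANDOM variable is just one possible choice:

```shell
# random_line_seeded SEED FILE (hypothetical helper)
# Same single-pass selection as above, but seeded explicitly so that
# calls with different seeds can differ even within the same second.
random_line_seeded() {
    awk -v seed="$1" '
    BEGIN { srand(seed) }
    rand() * NR < 1 { line = $0 }
    END { print line }' "$2"
}
```

In bash or zsh, we could call it as random_line_seeded "$RANDOM" example_file.txt. Passing the same seed twice reproduces the same choice, which is also handy for testing.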

5. Using perl

We can also use Perl to create a script similar to the awk strategy:

$ perl -e 'srand; rand($.) < 1 && ( $line = $_ ) while <>; print $line' example_file.txt

There’s more interesting information about this algorithm implementation in the section How do I select a random line from a file? of the Perl documentation.

6. Using sed

Now, let’s use the sed command:

$ rnd=$(( 1 + $RANDOM % $(wc -l < example_file.txt) )); sed -n "${rnd}p" example_file.txt

Let’s take a closer look at the last command line:

In the section rnd=$(( 1 + $RANDOM % $(wc -l < example_file.txt) )), we choose a random number in the range from 1 to the number of lines in the file
In the section sed -n "${rnd}p" example_file.txt, we instruct sed to print the line whose number is equal to the previously obtained random number

We’re able to achieve this because bash and zsh provide the internal variable RANDOM and support the modulo operator in arithmetic expansion.
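
The one-liner can also be wrapped in a function that guards against an empty file. This is a minimal sketch, assuming bash or zsh; the function name random_line_sed is hypothetical. Keep in mind that RANDOM in bash only yields values from 0 to 32767, so in a very long file the lines beyond that range would never be chosen.

```shell
# random_line_sed FILE (hypothetical helper)
# Print one random line of FILE using sed and the shell's RANDOM variable.
# Assumes bash or zsh; RANDOM only covers 0..32767.
random_line_sed() {
    local n_lines rnd
    # The arithmetic expansion strips any padding that wc may add.
    n_lines=$(( $(wc -l < "$1") ))
    # Nothing to print for an empty file (this also avoids a division by zero).
    [ "$n_lines" -gt 0 ] || return 1
    rnd=$(( 1 + RANDOM % n_lines ))
    # Quit right after printing so sed does not read the rest of the file.
    sed -n "${rnd}{p;q;}" "$1"
}
```

Calling random_line_sed example_file.txt then prints one of the ten sample lines.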

7. Exploiting the Capabilities of Bash and Zsh

Although a strategy that relies only on bash and zsh isn’t the best idea, since the result is harder to read and not very portable, it’s interesting from a didactic perspective.

We’ll implement each of our approaches using only the built-in commands and internal variables of bash and zsh.

First, the approach in which we count the lines in the file beforehand:

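$ n_lines=0
# First pass: count the lines (IFS= and -r keep each line unmodified)
while IFS= read -r l
do
    ((++n_lines))
done < example_file.txt
# Pick a line number between 1 and n_lines
rnd_line=$(( 1 + RANDOM % n_lines ))
n_line=1
# Second pass: stop as soon as we reach the chosen line
while IFS= read -r line
do
    (( n_line == rnd_line )) && break
    ((n_line++))
done < example_file.txt
echo "$line"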

Otherwise, we can reuse the single-pass selection we implemented with perl and awk:

$ n_line=1
while IFS= read -r crt_line
do
    # Keep the current line with probability 1/n_line
    (( RANDOM % n_line < 1 )) && line="$crt_line"
    ((++n_line))
done < example_file.txt
echo "$line"

In addition to running the code on the command line, we can save it to a file to be run as an executable.
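
As an illustration, the single-pass loop could be saved as a small script. This is a sketch rather than a definitive implementation: the file name random_line.sh and the function name random_line are hypothetical, and the arithmetic is written in a portable form instead of the (( )) syntax used above.

```shell
#!/usr/bin/env bash
# random_line.sh (hypothetical name): print one random line of a file.
# Usage: ./random_line.sh example_file.txt

random_line() {
    local n_line=1 crt_line line=
    # IFS= and -r keep whitespace and backslashes intact; the "|| [ -n ... ]"
    # part also handles a last line that has no trailing newline.
    while IFS= read -r crt_line || [ -n "$crt_line" ]; do
        # Keep the current line with a probability of roughly 1/n_line.
        [ $(( RANDOM % n_line )) -lt 1 ] && line=$crt_line
        n_line=$(( n_line + 1 ))
    done < "$1"
    printf '%s\n' "$line"
}

# Only run when a file name was passed on the command line.
if [ "$#" -gt 0 ]; then
    random_line "$1"
fi
```

After making the file executable with chmod +x random_line.sh, we can run it as ./random_line.sh example_file.txt.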

8. Conclusion

In this article, we looked at several ways to read a random line from a file, using standard Linux commands such as shuf, awk, perl, and sed, as well as features specific to bash and zsh.
