How to Reverse the Order of Lines in a File in Linux

1. Overview

Sometimes we have a text file that needs reversing. Perhaps it’s a log file where we want to see the most recent entries first. Similarly, we may also wish to reverse the output of a command.

In this tutorial, we’ll look at three different methods for reversing a text file or text stream in Linux. We’ll also compare their advantages and capabilities.

2. tac

One of the core philosophies of Linux is “everything is a file”. So it’s no surprise that most Linux distributions come pre-equipped with tools to help us work with files like this.

cat is one of the most popular and well-known ways to output the contents of a file. tac, on the other hand, is a much lesser-known command. Its name is cat spelled backward, and it works as a reverse cat: it prints the lines of a file from last to first.

Both commands belong to the coreutils package, which comes preinstalled in almost all Linux distributions.

Let’s first look at our test file with cat:

$ cat /tmp/test 
line_one
line_two
line_three
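To follow along, we can recreate the test file shown above. This is a minimal sketch; any method of writing these three lines to /tmp/test works just as well.

```shell
# Recreate the three-line test file used throughout this article
printf 'line_one\nline_two\nline_three\n' > /tmp/test

# Print it in its original order
cat /tmp/test
```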

Now, let’s look at the same file with tac:

$ tac /tmp/test 
line_three
line_two
line_one

As we can see, tac reversed the output.

We can also use tac in pipes to reverse the output of a command:

$ cat /tmp/test | tac 
line_three
line_two
line_one

tac is the most straightforward and efficient way of reversing a file. It runs on a single core and does one job well.

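However, it offers few configuration options. Besides accepting several files at once, it only lets us change the separator it splits on (-s), interpret that separator as a regular expression (-r), or attach the separator before each record instead of after it (-b).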
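For example, here’s a small sketch that uses -s to reverse comma-separated records instead of lines. The input deliberately ends with a comma: if it didn’t, the last record would be printed without a separator and the output would run together.

```shell
# -s ',' makes tac split records on commas instead of newlines
printf 'a,b,c,' | tac -s ','   # prints c,b,a,

# The output has no trailing newline, so print one to keep the prompt tidy
echo
```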

3. nl/sort/cut Commands

Like cat and tac, nl, sort, and cut come with the coreutils package, which is preinstalled in all of the common distributions. To reverse a file, we need to chain these three commands together.

Let’s build up the chain a piece at a time, so we can understand how each command contributes.

3.1. nl

To sort the lines into reverse order, we first need an index for each line. So, we use the nl command to put a line number at the beginning of each line:

$ nl /tmp/test 
     1    line_one
     2    line_two
     3    line_three
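One caveat: by default (-b t), nl numbers only non-empty lines, so blank lines get no index and won’t keep their position after sorting. The sketch below uses a hypothetical file, /tmp/blank_test, to show the difference; -b a (often written -ba) numbers every line, including empty ones.

```shell
# Hypothetical example data with an empty second line
printf 'first\n\nthird\n' > /tmp/blank_test

# Default style: the empty line gets no number, so "third" is numbered 2
nl /tmp/blank_test

# -b a: every line is numbered, so "third" is numbered 3
nl -b a /tmp/blank_test
```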

3.2. sort

Next, we sort these numbered lines into reverse order with sort:

$ nl /tmp/test  | sort -nr
     3    line_three
     2    line_two
     1    line_one

By default, sort orders lines lexically and arranges them from smallest to largest. We’re using a couple of parameters to change that here:

-n – numerical sort
-r – reverse order
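To see why -n matters, here’s a small sketch comparing the default lexical order with numeric order. LC_ALL=C is set only to make the comparison rules predictable across locales; the orders noted in the comments assume it.

```shell
# Lexical order compares character by character, so "10" comes before "2"
printf '10\n9\n2\n' | LC_ALL=C sort       # 10, 2, 9

# -n compares the numeric values instead
printf '10\n9\n2\n' | LC_ALL=C sort -n    # 2, 9, 10

# -r reverses the numeric order
printf '10\n9\n2\n' | LC_ALL=C sort -nr   # 10, 9, 2
```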

We could also add some other parameters for extra performance:

--parallel – number of sorts to run concurrently
--batch-size (abbreviated to --batch in the examples later on) – maximum number of inputs to merge at once
-S – size of the main memory buffer sort can use

The optimal values for these parameters will depend on our system hardware and operating system limits.
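As a starting point, one option is to match --parallel to the number of processing units reported by nproc, which is also part of coreutils. This is a sketch that assumes GNU sort; the 100M buffer is an arbitrary example value, and -b a makes nl number blank lines too.

```shell
# One concurrent sort per available processing unit
threads=$(nproc)

# 100M is an arbitrary example buffer size; tune it to the machine's free memory
printf 'line_one\nline_two\nline_three\n' |
  nl -b a |
  sort -nr -S 100M --parallel="$threads"
```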

3.3. cut

Since we added a numeric index at the start of the process, we now need to remove it to get our lines back to their original form:

$ nl /tmp/test  | sort -nr | cut -f 2-
line_three
line_two
line_one

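The -f 2- parameter tells cut to print the second field and every field after it. cut splits fields on tabs by default, and nl separates the line number from the text with a tab, so this removes just the line number generated by nl while keeping any tabs inside the original lines.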
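The whole chain can be wrapped in a small shell function. This is a sketch with a hypothetical name, reverse_lines, meant for a single file or standard input. It adds -b a to nl because, by default, nl skips empty lines, which would otherwise lose their position.

```shell
# Hypothetical helper: reverse the lines of one file, or of stdin if no file is given
reverse_lines() {
  # nl -b a    : number every line, including empty ones
  # sort -nr   : order by that number, highest first
  # cut -f 2-  : drop the number and the tab nl placed after it
  nl -b a "$@" | sort -nr | cut -f 2-
}
```

We can then call it as reverse_lines /tmp/test, or pipe a command’s output into it, as in ls | reverse_lines.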

4. sed

sed is a stream editor for filtering and transforming text. It can handle even the most complex text-processing tasks.

When it comes to text processing, sed is a powerhouse. It can find and replace text in files, remove everything after certain characters, and do many other things, all while reversing our text.

4.1. Installation

sed is not part of coreutils, yet it comes preinstalled in most major Linux distributions. If it’s missing, we can install it ourselves.

To install sed with the apt package manager:

$ sudo apt-get install sed

To install sed with the yum package manager:

$ sudo yum install sed

4.2. sed Script to Reverse a File

To reverse the given text file with sed:

$ sed '1!G;h;$!d' /tmp/test 
line_three
line_two
line_one

sed scripts are hard to understand, so let’s unpack this one.

4.3. Subcommands of sed

The one-liner above applies three sed commands to every line. The commands, separated by semicolons, are:

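1!G – G appends a newline followed by the contents of the hold space to the pattern space; the 1! address skips this on the first line, when the hold space is still empty
h – copies the pattern space to the hold space, overwriting what was there
$!d – deletes the pattern space and starts the next cycle without printing; the $! address skips this on the last line, so sed prints the fully reversed text only once, at the end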
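To make the three commands easier to read, we can save them as a commented sed script and run it with -f. This is a sketch; the path /tmp/reverse.sed is hypothetical. Writing the file with printf and single quotes keeps an interactive shell from treating ! or $ specially.

```shell
# Save the one-liner as a sed script, one command per line, with comments
printf '%s\n' \
  '# Every line but the first: append a newline and the hold space' \
  '1!G' \
  '# Copy the pattern space (all lines so far, reversed) to the hold space' \
  'h' \
  '# Every line but the last: delete without printing and start the next cycle' \
  '$!d' > /tmp/reverse.sed

# Run the script on the same three lines as our test file
printf 'line_one\nline_two\nline_three\n' | sed -f /tmp/reverse.sed
```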

For more, check out our in-depth guide on sed and its different spaces.

5. Comparison Between Methods

Let’s compare the methods we’ve learned.

5.1. Daily Performance

For testing, we’ll use a file with 100,000 lines and a size of 6.6 megabytes:

$ wc -l test 
100000 test
$ du -h test 
6,6M test
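The article doesn’t describe how this file was created. If we want a file with the same line count to experiment with, a sketch like the following works; its content and the path /tmp/bench are hypothetical, so its size won’t match 6.6 MB. The timings below were measured on the original file.

```shell
# Hypothetical benchmark input: 100,000 numbered lines
seq 100000 | sed 's/^/benchmark line /' > /tmp/bench

# Check the line count and the size on disk
wc -l /tmp/bench
du -h /tmp/bench
```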

Let’s reverse this file with tac while keeping track of time:

$ time tac test  
...
...
...
real    0m0,571s
user    0m0,004s
sys    0m0,086s

tac completes the task in about half a second.

Now, let’s run the same test with nl, sort, and cut:

$ time nl test | sort -nr | cut -f 2-
...
...
...
real    0m1,063s
user    0m0,122s
sys    0m0,236s

This command takes a little more than a second to complete. Now, let’s run it with seven parallel sorts, a 1 GB memory buffer, and a merge batch size of 1021:

$ time nl test | sort -nr -S 1G --parallel=7 --batch=1021 | cut -f 2-
...
...
...
real    0m0,882s
user    0m0,130s
sys    0m0,194s

Now it takes a little under 0.9 seconds, which is only a modest improvement.

Now, let’s time the sed command:

$ time sed '1!G;h;$!d' test 
...
...
...
real    0m54,336s
user    0m53,802s
sys    0m0,104s

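This is by far the worst result, so we wouldn’t choose sed for speed. The slowdown comes from the script itself: on every line, G copies the entire accumulated text into the pattern space and h copies it back to the hold space, so the work grows roughly quadratically with the number of lines. However, the point of using sed is its advanced text-processing capabilities, not its speed.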
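Speed only matters if the methods agree, so it’s worth checking that all three produce byte-identical output. This sketch uses a small hypothetical sample so that the quadratic sed script finishes quickly; it adds -b a to nl so that blank lines in other inputs would be numbered too.

```shell
# Small hypothetical sample input; any text file works
seq 1000 > /tmp/sample

# Reverse it with each method
tac /tmp/sample > /tmp/out_tac
nl -b a /tmp/sample | sort -nr | cut -f 2- > /tmp/out_sort
sed '1!G;h;$!d' /tmp/sample > /tmp/out_sed

# cmp prints nothing and exits with status 0 when the files are identical
cmp /tmp/out_tac /tmp/out_sort && cmp /tmp/out_tac /tmp/out_sed && echo "all methods agree"
```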

As the results show, tac is the clear winner in terms of performance for daily tasks. So, is there any benefit to the sort approach?

5.2. Big Data Performance

sort can use multiple cores at once, and its real power comes into play when we deal with huge files on powerful workstations.

Let’s work on the following file:

$ du -h megafile 
54G    megafile
$ wc -l megafile 
1000000000 megafile

This file has 1,000,000,000 lines and is 54 gigabytes.

$ time tac megafile >> /dev/null

real    13m5.686s
user    0m59.028s
sys    0m47.556s

tac completes the task in 13 minutes 5 seconds.

Let’s try sort with 23 parallel sorts and a 200 GB memory buffer:

$ time nl megafile | sort -S 200G -nr --parallel=23 --batch=1021 | cut -f 2-  >> /dev/null

real    6m34.510s
user    9m47.677s
sys    3m1.545s

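tac uses far fewer resources overall: about 1 minute 47 seconds of combined user and system CPU time, compared with almost 13 minutes for the sort pipeline, even though its wall-clock time is about twice as long.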

However, if we need to get things done as quickly as possible on a huge dataset and have cores and memory to spare, sort is much faster. The gap may widen further as files get larger.

6. Summary

In this article, we covered some basic methods for reversing a text file in Linux.

We used tools that ship with most distributions by default (tac, nl, sort, cut, and sed) and showed how much text processing the standard Linux command line can do on its own.

Then, we analyzed the advantages and disadvantages of each approach. We learned that tac is the fastest method for daily use, but sort takes the lead when dealing with huge files on powerful workstations.

sed, on the other hand, provides a huge level of flexibility and could reverse our file while doing other processing on it.
