1. Overview

In our day-to-day work with Linux systems, we often need to extract specific lines from a file by line number, for example, when analyzing data or filtering logs.

In this tutorial, we’ll explore how we can efficiently achieve this task using sed and awk.

2. Introduction to the Problem

First, let’s prepare a text file as the example input:

$ cat input.txt
The Line Number 1
The Line Number 2
The Line Number 3
The Line Number 4
The Line Number 5
The Line Number 6
The Line Number 7
The Line Number 8
The Line Number 9
...
The Line Number 26
The Line Number 27
The Line Number 28
The Line Number 29
The Line Number 30

As we can see, the file input.txt has 30 lines. Each line in the file contains its line number so that we can later easily verify whether the output is expected.
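One possible way to generate such a file, assuming a POSIX shell and awk are available (the loop bound of 30 simply matches the example above):

```shell
# Create input.txt with 30 lines, each line carrying its own line number.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

# Sanity check: count the lines and peek at the last one.
wc -l < input.txt
tail -n 1 input.txt
```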

Now, let’s look into the core problem. In practical scenarios, we often encounter diverse requirements for extracting lines from the input file. These requirements may include extracting specific lines such as lines x, y, and z, retrieving a range of lines from line x to line y, selecting lines at specific intervals (every x-th line), and many more.

To ensure we discuss the most frequently encountered scenarios, let’s extract lines from the input.txt file based on a combination of the following three requirements:

  • Output lines 2, 4, and 7
  • Line numbers from 10 to 20: Print if the line number is even
  • Line number > 20: Skip

Therefore, the expected output looks like the following:

The Line Number 2
The Line Number 4
The Line Number 7
The Line Number 10
The Line Number 12
The Line Number 14
The Line Number 16
The Line Number 18
The Line Number 20

Next, let’s see how to get this output using sed and awk.

3. Using the sed Command

The sed command is a convenient text-processing utility. We often see compact and effective sed commands to address text-related challenges. So, first, let’s use sed to output the required lines.

We have a combination of three requirements. We’ll start from the first and then extend the sed command to cover the second and third requirements.

3.1. Printing Lines x, y, and z

By default, the sed command processes input line by line and prints each line:

$ sed '' input.txt 
The Line Number 1
The Line Number 2
The Line Number 3
The Line Number 4
The Line Number 5
The Line Number 6
The Line Number 7
The Line Number 8
The Line Number 9
...
The Line Number 30

If we want sed to print only specific lines, we can pass the -n option to suppress the automatic printing and use the ‘p’ command to print just the lines we want:

$ sed -n '2p; 4p; 7p;' input.txt 
The Line Number 2
The Line Number 4
The Line Number 7

As the output above shows, the first requirement has been successfully met.
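When the line numbers aren't fixed in advance, the same idea can be wrapped in a small shell function. The sketch below is only an illustration: print_lines is a hypothetical helper name, not a sed feature, and it assumes every argument after the file name is a plain positive integer. The sample file is recreated first so that the snippet runs on its own:

```shell
# Recreate the sample file so that this snippet runs on its own.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

# Hypothetical helper (not part of sed): print_lines FILE N...
# It turns each line number N into an "Np;" command and runs sed once.
print_lines() {
    file=$1
    shift
    script=''
    for n in "$@"; do
        script="${script}${n}p;"
    done
    sed -n "$script" "$file"
}

# Same result as: sed -n '2p; 4p; 7p;' input.txt
print_lines input.txt 2 4 7
```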

3.2. Output Every x-th Line

Our second task is to output even-numbered lines from line 10 to line 20. sed allows us to use ‘n~x p’ to print every x-th line, starting from line n, until the end of the file. Here, x is the interval. Note that this first~step address form is a GNU sed extension. For example, let’s print every fifth line of input.txt, starting from the second line:

$ sed -n '2~5p' input.txt 
The Line Number 2
The Line Number 7
The Line Number 12
The Line Number 17
The Line Number 22
The Line Number 27
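To make the arithmetic behind the address explicit, here is a minimal sketch that lists the line numbers ‘2~5’ selects in a 30-line file. The variable names first, step, and total are illustrative only, and the loop assumes first is at least 1:

```shell
# List the line numbers selected by the address first~step in a file
# with `total` lines: first, first+step, first+2*step, ...
first=2
step=5
total=30

n=$first
selected=''
while [ "$n" -le "$total" ]; do
    # Append n, separating entries with a single space.
    selected="${selected:+$selected }$n"
    n=$((n + step))
done
echo "$selected"
```

The numbers it prints are the same line numbers that appear in the sed output above.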

Therefore, ‘10~2p‘ prints all even lines from line 10 to the end of the file:

$ sed -n '2p; 4p; 7p; 10~2p;' input.txt 
The Line Number 2
The Line Number 4
The Line Number 7
The Line Number 10
The Line Number 12
The Line Number 14
The Line Number 16
The Line Number 18
The Line Number 20
The Line Number 22
The Line Number 24
The Line Number 26
The Line Number 28
The Line Number 30

Alternatively, we can use the ‘n’ command, which replaces the current line in the pattern space with the next input line:

$ sed -n '2p; 4p; 7p; 9,${n;p}' input.txt 
The Line Number 2
The Line Number 4
The Line Number 7
The Line Number 10
The Line Number 12
The Line Number 14
The Line Number 16
The Line Number 18
The Line Number 20
The Line Number 22
The Line Number 24
The Line Number 26
The Line Number 28
The Line Number 30

In the command above, ‘9,${n;p}’ means that from line 9 until the end of the file ($), we apply the action {n;p}. So the workflow looks like this:

  • Read line 9
  • (n) Read the next line (10) and overwrite the current line (9)
  • (p) Print the current line (10)
  • Read line 11
  • (n) Read the next line (12) and overwrite the current line (11)
  • (p) Print the current line (12)
  • Read line 13
  • ….
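The workflow above implies that the start address decides which lines get printed, since the printed line is always the one right after a matched line. A small sketch to check this, assuming the same input.txt (regenerated here so that the snippet is self-contained) and writing a ‘;’ before the closing brace so the script isn't tied to GNU sed:

```shell
# Recreate the sample file so that this snippet runs on its own.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

# Start at line 9: lines 10, 12, 14, ... are printed (first three shown).
sed -n '9,${n;p;}' input.txt | head -n 3

# Start at line 10: the parity flips, so lines 11, 13, 15, ... are printed.
sed -n '10,${n;p;}' input.txt | head -n 3
```

So, to print every second line starting at line k with this approach, the range has to start at line k-1.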

3.3. Applying the Range Limit

We’ve seen that the ‘n~x’ address offers a fixed line interval (x), but it doesn’t let us say where to stop. sed has the ‘q’ command to quit further processing. For example, ‘5q’ tells sed to quit when it reaches line 5. Our third requirement is to skip all lines after line 20. Since we pass -n, the line that triggers ‘q’ isn’t printed, so we can append ‘21q’ to the command to solve the problem:

$ sed -n '2p; 4p; 7p; 10~2p; 21q' input.txt 
The Line Number 2
The Line Number 4
The Line Number 7
The Line Number 10
The Line Number 12
The Line Number 14
The Line Number 16
The Line Number 18
The Line Number 20
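To see how ‘q’ interacts with -n, a short sketch on the same file may help (the file is regenerated so that the snippet runs on its own):

```shell
# Recreate the sample file so that this snippet runs on its own.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

# Without -n, sed auto-prints each line; 3q quits after line 3 has been
# printed, so this prints lines 1 to 3.
sed '3q' input.txt

# With -n, nothing is auto-printed, so the line that triggers q produces
# no output. Only line 1 is printed here.
sed -n '1p; 3q' input.txt
```

This is why ‘21q’ stops the command above without printing line 21.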

For the ‘9,${n;p}’ approach, we can simply replace ‘$’ with ‘20’. This asks sed to apply the {n;p} action only between line 9 and line 20. After line 20, no line will be printed:

$ sed -n '2p; 4p; 7p; 9,20{n;p}' input.txt 
The Line Number 2
The Line Number 4
The Line Number 7
The Line Number 10
The Line Number 12
The Line Number 14
The Line Number 16
The Line Number 18
The Line Number 20
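Another option, sketched here under the assumption that GNU sed is in use, is to nest the step address inside a line range, so that the range provides the upper limit and the step address picks the even lines:

```shell
# Recreate the sample file so that this snippet runs on its own.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

# GNU sed only: first~step is a GNU extension.
# 10,20 restricts the block to lines 10..20; inside the block,
# 10~2 matches lines 10, 12, 14, ...
sed -n '2p; 4p; 7p; 10,20{10~2p;}' input.txt
```

Unlike the ‘21q’ variant, this command still reads the file to the end, which may matter for very large inputs.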

So far, we’ve solved the problem using sed. As we can see, by using sed’s address system, we can solve most line-number-based output problems. However, as sed doesn’t support variables or arithmetic, some dynamic requirements can be challenging for it, for example, printing only the lines whose line numbers are Fibonacci numbers.

4. Using the awk Command

awk is another powerful text-processing utility. It’s a C-like scripting language that supports variables, loops, logical checks, functions, mathematical calculations, and so on. Therefore, awk is more flexible than sed.

As we don’t need to use special address syntax and commands in awk, we can implement the line number check logic in one shot:

$ awk 'NR == 2 || NR == 4 || NR == 7 || (NR >= 10 && NR <= 20 && NR % 2 == 0)' input.txt
The Line Number 2
The Line Number 4
The Line Number 7
The Line Number 10
The Line Number 12
The Line Number 14
The Line Number 16
The Line Number 18
The Line Number 20

As the example above shows, we’ve got the expected output. And the awk code is pretty easy to understand. It’s worth noting that NR is the index number of the current record, which is the line number by default.
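If the range boundaries may change, the condition can be parameterized. The following is a minimal sketch, not the only way to do it: from, to, and step are illustrative variable names passed in with -v, and the early exit mirrors sed’s ‘q’ command. The sample file is regenerated so that the snippet runs on its own:

```shell
# Recreate the sample file so that this snippet runs on its own.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

awk -v from=10 -v to=20 -v step=2 '
    # Past the range: stop reading the file. This is only safe here
    # because the fixed line numbers (2, 4, 7) are all smaller than "to".
    NR > to { exit }

    # Print the fixed lines, plus every step-th line starting at "from".
    NR == 2 || NR == 4 || NR == 7 ||
        (NR >= from && (NR - from) % step == 0)
' input.txt
```

Keeping both checks in a single pattern ensures that a line is printed at most once, even if it satisfies more than one condition.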

Finally, to see awk‘s flexibility, let’s address the previous challenge – “Print lines if their line numbers correspond to values in the Fibonacci sequence”:

$ awk 'BEGIN{a=0;b=1} {while(NR>=a){fib[a]; c=a+b; a=b; b=c} } NR in fib' input.txt 
The Line Number 1
The Line Number 2
The Line Number 3
The Line Number 5
The Line Number 8
The Line Number 13
The Line Number 21

The Fibonacci number generation is straightforward. Additionally, since we don’t know how many lines the input file may have and don’t want to generate numbers we don’t need, we use a while loop that generates the next Fibonacci number only as long as it doesn’t exceed the current line number.
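For readability, the same one-liner can be spread over several lines with comments. This is the same logic as above, only reformatted, and the sample file is regenerated so that the snippet runs on its own:

```shell
# Recreate the sample file so that this snippet runs on its own.
awk 'BEGIN { for (i = 1; i <= 30; i++) print "The Line Number " i }' > input.txt

awk '
    BEGIN { a = 0; b = 1 }      # a is the next Fibonacci number to record
    {
        # Record Fibonacci numbers until a exceeds the current line number.
        while (NR >= a) {
            fib[a]              # referencing fib[a] creates the key a
            c = a + b           # advance the pair (a, b) to (b, a + b)
            a = b
            b = c
        }
    }
    NR in fib                   # print the line if its number was recorded
' input.txt
```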

Through this example, we may realize that if the requirement is complex, awk is probably a better choice than sed.

5. Conclusion

In this article, we’ve explored using sed and awk to print specified lines from a file based on line numbers.

sed can solve most line-number-based problems using its address system. However, awk can be a better choice if the requirement is dynamic and complex.