1. Overview
Reading text files is a common operation when we work with the Linux command line. Sometimes, we know that line X in a file contains interesting data, and we want to read only that line.
In this quick tutorial, we’ll have a look at different approaches to read a specific line from a file.
2. Introduction to the Problem
The problem is pretty straightforward. Let's get a clearer picture through an example.
For instance, we have a file called input.txt:
$ nl input.txt
1 I am line 1, I don't have any interesting data.
2 I am line 2, I don't have any interesting data.
3 I am line 3, I don't have any interesting data.
4 I am line 4, I don't have any interesting data.
5 I am line 5, interesting data: Linux is awesome!
6 I am line 6, I don't have any interesting data.
7 I am line 7, I don't have any interesting data.
As the output above shows, we’ve used the nl command to print the file’s content with line numbers.
We know that the input.txt file contains some interesting information in the fifth line. Therefore, we want to read line five only.
There are many ways to do that in the Linux command line. In this tutorial, we’ll explore four approaches:
- Using pure Bash commands
- Using the sed command
- Using the awk command
- Using the head and tail commands
Next, let’s see them in action.
3. Using Pure Bash Commands
To solve the problem, let’s create a shell script getLine.sh:
$ cat getLine.sh
#!/bin/bash
FILE="$1"
LINE_NO=$2
i=0
while read -r line; do
  i=$(( i + 1 ))
  test "$i" = "$LINE_NO" && echo "$line"
done <"$FILE"
The shell script above looks pretty simple. It accepts two arguments: the file and the target line number.
Basically, it contains only a loop. In the loop, we increment a counter variable $i. When it reaches the given target line number, we output the line. For example, if we run the script with the input.txt file:
$ ./getLine.sh input.txt 5
I am line 5, interesting data: Linux is awesome!
The output shows that the expected line has been printed. Our script works.
If we read the script carefully, we may find that there is room to optimize it.
We check every line in the file in the loop, even if we’ve found and printed the line we require. Well, it’s not a problem if we run this script with our input.txt. After all, our example input file has only seven lines. However, in the real world, we may handle files with seven million lines.
Therefore, it would be good to break the loop once we've found the target line. So, let's change the script a little bit:
$ cat getLine2.sh
#!/bin/bash
FILE="$1"
LINE_NO=$2
i=0
while read -r line; do
  i=$(( i + 1 ))
  case $i in $LINE_NO) echo "$line"; break;; esac
done <"$FILE"
We’ve used a case statement to break the loop once we’ve found the line we need. Let’s give it a test:
$ ./getLine2.sh input.txt 5
I am line 5, interesting data: Linux is awesome!
It works, too. So, we’ve solved the problem with a little Bash script.
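As a side note, newer Bash versions also offer the mapfile builtin, which can skip and count lines without an explicit loop. Here's a minimal sketch, assuming Bash 4 or later and a small sample file we create on the fly:

```shell
#!/bin/bash
# Create a small sample file (a stand-in for input.txt)
printf 'line %s\n' 1 2 3 4 5 6 7 > /tmp/sample.txt

LINE_NO=5
# -s skips LINE_NO-1 lines, -n reads exactly one line, -t strips the newline
mapfile -t -s $((LINE_NO - 1)) -n 1 target < /tmp/sample.txt
echo "${target[0]}"
```

Since mapfile stops reading after one line, it avoids scanning the rest of the file, much like the break in our loop.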
4. Using the sed Command
The sed command is pretty good at solving this kind of problem. Let’s see a couple of compact sed one-liners to do the job:
$ sed '5!d' input.txt
I am line 5, interesting data: Linux is awesome!
$ sed -n '5p' input.txt
I am line 5, interesting data: Linux is awesome!
In the first one-liner, “5!d” means delete all lines except line five, while in the second command, “-n ‘5p’” means print only the fifth line.
The two one-liners work as we expected. However, similar to the Bash script, they will walk through the entire input file. Thus, they’ll take an unnecessarily long time if the input file is large.
sed provides a 'q' command that allows us to quit further processing. We can add the 'q' command to the two one-liners:
$ sed '5!d;q' input.txt
I am line 5, interesting data: Linux is awesome!
$ sed -n '5{p;q}' input.txt
I am line 5, interesting data: Linux is awesome!
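When the target line number comes from a variable instead of being hard-coded, we can let the shell interpolate it into the sed script by using double quotes. A small sketch, assuming GNU sed, a hypothetical LINE_NO variable, and a sample file generated on the fly:

```shell
#!/bin/bash
# Create a small sample file (a stand-in for input.txt)
printf 'line %s\n' 1 2 3 4 5 6 7 > /tmp/sample.txt

LINE_NO=5
# Double quotes let the shell expand $LINE_NO inside the sed script
sed -n "${LINE_NO}{p;q}" /tmp/sample.txt
```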
From the outputs, we don't see any difference. So, let's run the sed command with and without 'q' using sedsed, a sed debugging utility, to see how the 'q' command works.
First, let’s have a look at the version without the ‘q‘ command:
$ sedsed -d '5!d' input.txt
PATT:I am line 1, I don't have any interesting data.$
HOLD:$
COMM:5 !d
PATT:I am line 2, I don't have any interesting data.$
...
I am line 5, interesting data: Linux is awesome!
PATT:I am line 6, I don't have any interesting data.$
HOLD:$
COMM:5 !d
PATT:I am line 7, I don't have any interesting data.$
HOLD:$
COMM:5 !d
From the debug output above, we can see that the sed command processed the file all the way through the last line (line seven).
Next, we’ll test the sed command with ‘q‘:
$ sedsed -d '5!d;q' input.txt
PATT:I am line 1, I don't have any interesting data.$
HOLD:$
COMM:5 !d
PATT:I am line 2, I don't have any interesting data.$
...
PATT:I am line 5, interesting data: Linux is awesome!$
HOLD:$
COMM:q
I am line 5, interesting data: Linux is awesome!
As the debug output shows, the sed processing stopped at line five.
5. Using the awk Command
The awk command is another powerful text processing tool. It can also solve the problem with a compact one-liner: awk ‘NR==5’ input.txt.
However, as we’ve discussed previously, we want to stop further processing after printing line five.
Similarly, awk has the 'exit' command to stop further processing:
$ awk 'NR==5{ print; exit }' input.txt
I am line 5, interesting data: Linux is awesome!
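If the line number isn't hard-coded, awk's -v option passes a shell variable into the script. A minimal sketch, assuming a hypothetical LINE_NO variable and a sample file generated on the fly:

```shell
#!/bin/bash
# Create a small sample file (a stand-in for input.txt)
printf 'line %s\n' 1 2 3 4 5 6 7 > /tmp/sample.txt

LINE_NO=5
# -v copies the shell variable into the awk variable n
awk -v n="$LINE_NO" 'NR == n { print; exit }' /tmp/sample.txt
```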
Thus, as the output above shows, we've solved the problem.
6. Using the head and tail Commands
Using the head and tail commands, we can easily get the first and last parts of a file.
If we combine the two commands, we can also read a specific line.
Let’s say we want to read line X. The idea is:
- First, we get line 1 to X using the head command: head -n X input
- Then, we pipe the result from the first step to the tail command to get the last line: head -n X input | tail -1
Let’s test if this idea works with our example:
$ head -n 5 input.txt | tail -1
I am line 5, interesting data: Linux is awesome!
Great! We’ve got the expected output and solved the problem.
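The two steps above can also be wrapped in a small reusable function; getLineFromFile is a name we're making up for illustration, and the sample file stands in for input.txt:

```shell
#!/bin/bash
# Create a small sample file (a stand-in for input.txt)
printf 'line %s\n' 1 2 3 4 5 6 7 > /tmp/sample.txt

# Hypothetical helper: print line $2 of file $1 via head and tail
getLineFromFile() {
    head -n "$2" "$1" | tail -1
}

getLineFromFile /tmp/sample.txt 5
```

One nice property of this approach: head stops reading after line X, so the pipeline never scans the rest of a large file.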
7. Conclusion
In this article, we’ve addressed different ways to read a specific line from an input file.
Further, we’ve discussed how to optimize the Bash, sed, and awk solutions to gain better performance.