1. Overview
When we want to search for patterns in text files on the Linux command line, grep is usually the first tool that comes to mind.
Indeed, with the power of regular expressions, grep is good at pattern matching. However, sometimes we’d like to search for a pattern only from a given line number onward.
In this tutorial, we’ll explore how to achieve that.
2. Introduction to the Problem
As usual, let’s understand the problem quickly through an example. Let’s say we have a log file:
$ cat -n app.log
1 [INFO] Application started
2 [INFO] Invoking remote API ... Successful
3 [WARN] User "Eric" gave wrong password: 5 times
4 [ERROR] RuntimeException: File not found: /foo/bar/aFile
5 ... stack trace ...
6 [INFO] Application refreshed
7 [ERROR] RuntimeException: File not found: /foo/bar/newFile
8 ... stack trace ...
9 [WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
10 [INFO] Default template loaded
11 [WARN] Cleanup job done with IOException: Disk is full
As the output above shows, we have an application’s log file. We’ve used the cat command with the -n option to display content with line numbers.
There are some log entries containing the string “Exception”. If we want to find those entries, we can simply use the command grep 'Exception' app.log. It’s not a challenge for us at all.
However, we’d like to add an additional requirement: We need to do the same search, but only from line six onward. Since line six itself doesn’t contain “Exception”, only lines 7, 9, and 11 should appear in the output.
Next, let’s see how to solve the problem.
3. Using the tail and grep Commands
The first idea to solve the problem is to extract the lines that we want to look at first, then execute grep on those lines.
We can use the tail command in the form “tail -n +x input” to take the lines from the x-th line until the end of the file. So, for example, we can extract lines from line six to the end of the app.log file:
$ tail -n +6 app.log
[INFO] Application refreshed
[ERROR] RuntimeException: File not found: /foo/bar/newFile
... stack trace ...
[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
[INFO] Default template loaded
[WARN] Cleanup job done with IOException: Disk is full
Next, let’s execute the grep command on the output above to get the required log entries:
$ tail -n +6 app.log | grep 'Exception'
[ERROR] RuntimeException: File not found: /foo/bar/newFile
[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
[WARN] Cleanup job done with IOException: Disk is full
As we can see, the command above has solved the problem. We’ve got the expected log entries.
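One detail of the tail step is worth checking: the +x form is inclusive, so line x itself is part of the output. The sketch below illustrates this with five numbered lines standing in for a file. The data and the variable names (skip, from_third, after_third) are our own and aren't part of the original example:

```shell
# Five numbered lines stand in for a file (hypothetical data).
from_third=$(printf '%s\n' 1 2 3 4 5 | tail -n +3)
printf '%s\n' "$from_third"    # prints 3, 4 and 5 on separate lines

# To skip the first N lines entirely, start at line N + 1 instead.
skip=3
after_third=$(printf '%s\n' 1 2 3 4 5 | tail -n +"$((skip + 1))")
printf '%s\n' "$after_third"   # prints 4 and 5 on separate lines
```

For app.log, this means tail -n +6 still feeds line six to grep. It makes no difference here only because line six doesn’t contain “Exception”.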
However, sometimes, we would like to execute the grep command with the -n option to print the line numbers of each match. For example, this helps us locate the log entries with “Exception” and take a closer look at the stack trace to analyze the cause. So, let’s execute the command with the -n option:
$ tail -n +6 app.log | grep -n 'Exception'
2:[ERROR] RuntimeException: File not found: /foo/bar/newFile
4:[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
6:[WARN] Cleanup job done with IOException: Disk is full
As we can see, this time, the command has printed the line numbers of matched lines. However, since we piped the tail command’s output to grep, the line numbers reported by the grep command are not the actual line numbers in the original input file.
Of course, we can get the actual line numbers through this calculation: LINE_NO_BY_GREP + 6 - 1. For example, the first match “RuntimeException…” is located at line 2 + 6 - 1 = 7 in the app.log file.
This calculation works, but it can be inconvenient if we’re working on a large input file.
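If we want to stay with tail and grep, the arithmetic can be handed to a small filter. The following is only a sketch under our own assumptions: it recreates app.log in a temporary directory so it can run on its own, and the names workdir, start, and corrected are ours:

```shell
# Recreate the sample app.log from this tutorial in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/app.log" <<'EOF'
[INFO] Application started
[INFO] Invoking remote API ... Successful
[WARN] User "Eric" gave wrong password: 5 times
[ERROR] RuntimeException: File not found: /foo/bar/aFile
... stack trace ...
[INFO] Application refreshed
[ERROR] RuntimeException: File not found: /foo/bar/newFile
... stack trace ...
[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
[INFO] Default template loaded
[WARN] Cleanup job done with IOException: Disk is full
EOF

start=6   # first line to search (our own variable name)

# grep -n numbers the lines of tail's output, so we add start - 1 to the
# number in front of the first colon and leave the rest of the line as-is.
corrected=$(tail -n +"$start" "$workdir/app.log" |
  grep -n 'Exception' |
  awk -v start="$start" '{
    colon = index($0, ":")                      # position of the first colon
    n = substr($0, 1, colon - 1) + start - 1    # LINE_NO_BY_GREP + start - 1
    print n ":" substr($0, colon + 1)
  }')

printf '%s\n' "$corrected"
```

We split on the first colon only, because the log messages themselves contain colons. The pipeline should report lines 7, 9, and 11, but it’s now three commands long.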
Next, let’s see if we can get the required matched lines, together with the actual line number from the original file.
4. Using the awk Command
awk is a powerful weapon for command-line text processing. Using the awk command, we can solve the problem in one shot:
$ awk 'NR >= 6 && /Exception/{ print NR, $0 }' app.log
7 [ERROR] RuntimeException: File not found: /foo/bar/newFile
9 [WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
11 [WARN] Cleanup job done with IOException: Disk is full
The awk command above is pretty straightforward. It prints a line with its line number if both conditions are satisfied:
- NR >= 6 – Its line number must be greater than or equal to six.
- /Exception/ – The line must match the pattern, that is, contain the string “Exception”.
As we’ve seen in the output, we’ve got the required lines containing the word “Exception”. Moreover, the line numbers in the output are the actual line numbers in the app.log file.
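In scripts, the start line and the pattern are often not fixed. As a sketch, we can pass both into awk with the -v option. The variable names (workdir, start, pattern, matches) are our own, and the script recreates app.log in a temporary directory so that it runs on its own:

```shell
# Recreate the sample app.log from this tutorial in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/app.log" <<'EOF'
[INFO] Application started
[INFO] Invoking remote API ... Successful
[WARN] User "Eric" gave wrong password: 5 times
[ERROR] RuntimeException: File not found: /foo/bar/aFile
... stack trace ...
[INFO] Application refreshed
[ERROR] RuntimeException: File not found: /foo/bar/newFile
... stack trace ...
[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
[INFO] Default template loaded
[WARN] Cleanup job done with IOException: Disk is full
EOF

start=6               # first line to search
pattern='Exception'   # extended regular expression to look for

# -v copies the shell values into awk variables;
# "$0 ~ pat" matches the current line against the regex stored in pat.
matches=$(awk -v start="$start" -v pat="$pattern" \
  'NR >= start && $0 ~ pat { print NR ":" $0 }' "$workdir/app.log")

printf '%s\n' "$matches"
```

Here, we print NR and the line separated by a colon, which mimics the format of grep -n. One caveat: awk processes escape sequences in values passed with -v, so patterns containing backslashes need extra care.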
5. Using the sed Command
sed is another handy command-line utility for processing text in Linux. First, let’s have a look at how the sed command solves the problem:
$ sed -n '6,${ /Exception/{=;p} }' app.log
7
[ERROR] RuntimeException: File not found: /foo/bar/newFile
9
[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
11
[WARN] Cleanup job done with IOException: Disk is full
As the output shows, sed has solved the problem. But the output format is slightly different from what we got with awk: The matched line number and the matched line are on two consecutive lines.
Finally, let’s walk through the compact one-liner to understand how it works:
- sed -n – Suppresses the automatic printing – in other words, we’ll control when to print a line on our own
- 6,$ – Defines an address from line 6 to the end of the file
- /Exception/{… } – If a line is in the address above and matches the /Exception/ pattern, then the {…} actions will be executed
- {=;p} – We perform two actions here: printing the current line number (=) and printing the current line (p)
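If we prefer the line number and the match on a single line, one possible approach is to pipe the result through a second sed that joins each pair of lines. This is a sketch rather than the only way to do it. The names workdir and numbered are our own, and app.log is recreated in a temporary directory so that the script runs on its own:

```shell
# Recreate the sample app.log from this tutorial in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/app.log" <<'EOF'
[INFO] Application started
[INFO] Invoking remote API ... Successful
[WARN] User "Eric" gave wrong password: 5 times
[ERROR] RuntimeException: File not found: /foo/bar/aFile
... stack trace ...
[INFO] Application refreshed
[ERROR] RuntimeException: File not found: /foo/bar/newFile
... stack trace ...
[WARN] TemplateNotFoundException: Template PRETTY not found, loading the default template
[INFO] Default template loaded
[WARN] Cleanup job done with IOException: Disk is full
EOF

# First sed: same logic as the one-liner above, with a ";" before each "}"
# so that non-GNU sed implementations also accept it.
# Second sed: N appends the next line to the pattern space, then the
# embedded newline between number and text is replaced by a colon.
numbered=$(sed -n '6,${/Exception/{=;p;};}' "$workdir/app.log" |
  sed 'N;s/\n/:/')

printf '%s\n' "$numbered"
```

The second sed relies on the first one always emitting lines in pairs: a line number followed by the matching line.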
6. Conclusion
In this article, we’ve discussed how to search for a pattern in a file, starting from a given line number.
We’ve looked at three approaches through examples: combining tail with grep, using awk, and using sed.