1. Overview
When working with text files in Linux, grep is a powerful tool for searching for patterns. However, there are times when we don’t want the entire line but rather a limited context surrounding our search pattern. Unfortunately, grep doesn’t provide a built-in option to limit the context to a specific number of characters.
In this tutorial, we’ll explore how to achieve this effectively.
2. Introduction to the Problem
Let’s first understand the problem through an example. In this tutorial, we’ll use GNU grep for demonstrations.
2.1. Understanding the Problem
grep is a line-based matching command. However, sometimes we may want to limit the context of our search results to a specific number of characters around the matching pattern.
First, let’s look at our input file:
$ cat file.txt
not matched line
>>>987654321A b c d e f123456789<<<
>>>   *   B b c d e f   *   <<<
not matched line
321C b c d e f123
not matched line
D b c d e f
As we can see, some lines in the input file match the pattern “[ABCD] b c d e f”. We can easily get these lines using grep:
$ grep '[ABCD] b c d e f' file.txt
>>>987654321A b c d e f123456789<<<
>>>   *   B b c d e f   *   <<<
321C b c d e f123
D b c d e f
Now, let’s say we don’t want the entire line of each match. Instead, we would like to have five characters (context characters) before and after our search pattern in the output:
54321A b c d e f12345
 *   B b c d e f   * 
321C b c d e f123
D b c d e f
As our target output shows, whitespace counts as context characters as well. Also, if fewer than five characters are on either side of the matched part, all these characters should be taken.
Now that we understand the problem, let’s look at grep’s built-in context support through an example.
2.2. grep’s Context Support
Let’s say we have a text file:
$ cat input.txt
v v v v v v v
line before 2
line before 1
important info
line after 1
line after 2
v v v v v v v
The grep command can output additional context lines before and after the match line:
$ grep -C 2 'important info' input.txt
line before 2
line before 1
important info
line after 1
line after 2
In the example above, we pass the option “-C 2” to ask grep to output two “context” lines before and two after the matching line.
We now understand that grep’s context support is also based on lines. Therefore, we cannot simply use the -C option to solve our problem.
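GNU grep also accepts -B and -A to request context lines only before or only after the match. Both are line-based, just like -C. The following is a minimal sketch, assuming a throwaway copy of the sample file created with mktemp; the variable names sample, before, and after are illustrative only:

```shell
# Recreate the sample content in a temporary file (illustrative path).
sample=$(mktemp)
printf '%s\n' 'v v v v v v v' 'line before 2' 'line before 1' \
    'important info' 'line after 1' 'line after 2' 'v v v v v v v' > "$sample"

# -B 1: one context line before the match.
before=$(grep -B 1 'important info' "$sample")
# -A 1: one context line after the match.
after=$(grep -A 1 'important info' "$sample")

printf '%s\n' "$before"
printf '%s\n' "$after"

rm -f "$sample"
```

In both cases, the unit of context is still a whole line, so these options can’t trim a match down to a few characters either.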
Next, let’s see how the job gets done.
3. The Solution
Now, let’s look at the grep command that solves the problem and then understand how it works:
$ grep -oE '.{0,5}[ABCD] b c d e f.{0,5}' file.txt
54321A b c d e f12345
 *   B b c d e f   * 
321C b c d e f123
D b c d e f
In the above command, we use the option -E to tell grep to treat our pattern as an ERE (Extended Regular Expression) instead of the default BRE (Basic Regular Expression).
Then, the ERE pattern ‘.{0,5}[ABCD] b c d e f.{0,5}’ matches up to five characters before and after the “[ABCD] b c d e f” pattern. Since the interval {0,5} also allows fewer than five characters, matches close to the start or end of a line are still reported.
Finally, since we only want to output the text matching the ERE pattern, we pass the -o option to grep, to print only the matched parts.
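To see why the lower bound of zero matters, we can compare the bounded interval {0,5} with the exact interval {5}. This is a small sketch that uses its own illustrative sample data rather than file.txt; the variable names are assumptions made for the example:

```shell
# Illustrative sample data (not the tutorial's file.txt).
sample=$(mktemp)
printf '%s\n' 'xx987654321A b c d e f123456789xx' \
    '321C b c d e f123' 'D b c d e f' > "$sample"

# {0,5}: up to five context characters, so short lines still match.
flexible=$(grep -oE '.{0,5}[ABCD] b c d e f.{0,5}' "$sample")

# {5}: exactly five characters are required on both sides.
strict=$(grep -oE '.{5}[ABCD] b c d e f.{5}' "$sample")

printf 'flexible:\n%s\n' "$flexible"
printf 'strict:\n%s\n' "$strict"

rm -f "$sample"
```

Here, the flexible pattern reports all three lines, while the strict one reports only the first, because the other two lines have fewer than five characters around the match.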
4. Introducing a Variable
We can also store the number of context characters in a variable. This makes the command friendlier for shell scripts and easier to adapt to different context sizes:
$ n=6; grep -oE ".{0,$n}[ABCD] b c d e f.{0,$n}" file.txt
654321A b c d e f123456
  *   B b c d e f   *  
321C b c d e f123
D b c d e f
We should note that when we use shell variables in grep’s pattern, we must use double quotes. Within single quotes, the shell doesn’t expand the variable.
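The difference is easy to check without running grep at all, by printing the pattern that the shell would hand over. This is a minimal sketch; the variable names single and double are illustrative:

```shell
n=6

# Single quotes: the shell passes $n through literally.
single='.{0,$n}[ABCD] b c d e f.{0,$n}'

# Double quotes: the shell substitutes the value of n before grep runs.
double=".{0,$n}[ABCD] b c d e f.{0,$n}"

printf '%s\n' "$single"   # prints .{0,$n}[ABCD] b c d e f.{0,$n}
printf '%s\n' "$double"   # prints .{0,6}[ABCD] b c d e f.{0,6}
```

Only the double-quoted version contains a valid interval, so only that one gives grep the pattern we intend.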
Also, it’s worth mentioning that if we launch the above command in a terminal, the variable n is still available after the command execution. We can verify it by printing the variable’s value:
$ echo $n
6
If we want the variable to affect only the grep command, we can wrap the entire command in a pair of parentheses, “( … )”:
$ ( n=3; grep -oE ".{0,$n}[ABCD] b c d e f.{0,$n}" file.txt )
321A b c d e f123
   B b c d e f   
321C b c d e f123
D b c d e f
$ echo $n
$
This time, as the output shows, the n variable isn’t available after the command execution. In other words, the variable only affects the grep command. This is because ( commands ) starts a subshell to execute “commands”. Thus, the current shell isn’t affected.
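Another way to keep the variable out of the current shell is a small function with local variables. The sketch below assumes a shell that supports the local keyword, such as Bash; grep_ctx is a hypothetical helper name, not a standard command, and the sample data is illustrative:

```shell
# grep_ctx is a hypothetical helper, not a standard command.
# Usage: grep_ctx N PATTERN FILE
grep_ctx() {
    # local: these variables are visible only inside the function.
    local n="$1" pattern="$2" file="$3"
    grep -oE ".{0,$n}$pattern.{0,$n}" "$file"
}

# Illustrative sample data (not the tutorial's file.txt).
sample=$(mktemp)
printf '%s\n' 'xx987654321A b c d e f123456789xx' '321C b c d e f123' > "$sample"

unset n
grep_ctx 3 '[ABCD] b c d e f' "$sample"

# Record whether n leaked into the calling shell.
n_after=${n-unset}
printf 'n after the call: %s\n' "$n_after"

rm -f "$sample"
```

If the pattern contains a top-level alternation such as “foo|bar”, it should be wrapped in parentheses before being passed in. Otherwise, each context interval would bind to only one of the alternatives.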
5. Conclusion
In this article, we’ve explored, through examples, how to limit the context of our grep search results to a specific number of characters around the matching pattern.
The key is to combine the -o and -E options with a pattern that wraps the search expression in .{0,N} on both sides, where N is the desired number of context characters.