1. Introduction
Searching text in files is a very common task. In Linux, there are multiple ways to find a specific text string or pattern within a file. We can achieve this using commands like grep and awk, or find paired with -exec grep to search across many files. Each of these commands offers unique features for text searching and pattern matching.
In this tutorial, we’ll learn different ways to find lines containing exactly N digits using the grep and perl commands. We’ll also learn to extract the lines containing N numbers from a given input file.
2. Sample Dataset and Toolset
To illustrate the use of grep to find lines containing N digits, we’ll use a sample dataset:
$ cat sample.txt
1 lazy dog
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
780 zippers 20 clippers 56
When 1002 zombies arrive
We save this sample dataset in a file named sample.txt. This will help us follow and test the commands used in the next sections.
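As a convenience, the dataset can be recreated with a here-document. This is only a sketch: it writes sample.txt into the current directory and overwrites any existing file with that name.

```shell
# Recreate the tutorial's sample dataset in the current directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything in the body.
cat > sample.txt <<'EOF'
1 lazy dog
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
780 zippers 20 clippers 56
When 1002 zombies arrive
EOF
```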
We can extract lines with a specific number of digits (N) from a file using either grep or perl. Both of these commands provide several options to search for patterns in files and directories, enabling us to customize the search criteria and perform advanced pattern matching.
3. Extract Lines With N Digits Using grep
We can use grep to extract lines containing a specific number of digits:
$ grep -E '^[^0-9]*([0-9][^0-9]*){N}$' sample.txt
In this command, grep searches files for patterns, -E enables extended regular expressions, '^[^0-9]*([0-9][^0-9]*){N}$' is the regular expression that matches lines with exactly N digits, and sample.txt is the file we want to search.
Let’s break down the regular expression used to extract lines with exactly N digits:
- ^ asserts the beginning of the line
- [^0-9]* matches zero or more non-digit characters
- ([0-9][^0-9]*){N} matches exactly N occurrences of a group, where each group is one digit followed by zero or more non-digit characters
- $ asserts the end of the line
We can replace N in the above command with any number, thus searching for lines in a file that contain a specific number of digits.
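As a minimal sketch of that substitution, the pattern can be built from a shell variable. The function name lines_with_n_digits is hypothetical (it is not part of grep), and the sketch assumes its first argument is a non-negative integer:

```shell
# Hypothetical helper: print the lines that contain exactly N digits.
# Usage: lines_with_n_digits N [FILE...]   (reads stdin when no file is given)
lines_with_n_digits() {
    n="$1"
    shift
    # Double quotes let the shell expand $n; \$ keeps the end-of-line anchor literal.
    grep -E "^[^0-9]*([0-9][^0-9]*){$n}\$" "$@"
}

# Feed two of the sample lines through the helper, asking for exactly 3 digits.
printf '%s\n' 'all 345 dogs' 'When 1002 zombies arrive' | lines_with_n_digits 3
```

With the two lines above as input, only "all 345 dogs" is printed, since "When 1002 zombies arrive" contains four digits.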
Let’s try to find all the lines containing exactly 3 digits from the sample.txt file:
$ grep -E '^[^0-9]*([0-9][^0-9]*){3}$' sample.txt
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
In this example, we extracted three lines containing exactly 3 digits each.
In addition, we can also extract lines with N or more digits. To do so, we remove the $ at the end of the regular expression. For example, we can extract lines with at least 3 digits:
$ grep -E '^[^0-9]*([0-9][^0-9]*){3}' sample.txt
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
780 zippers 20 clippers 56
When 1002 zombies arrive
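The at-least-N condition can also be expressed with an open-ended interval, {N,}, while keeping the $ anchor. The sketch below is an alternative spelling rather than the command used above; the helper name lines_with_at_least_n_digits is hypothetical, and N is assumed to be a non-negative integer:

```shell
# Hypothetical helper: print the lines that contain N or more digits.
# Usage: lines_with_at_least_n_digits N [FILE...]   (reads stdin when no file is given)
lines_with_at_least_n_digits() {
    n="$1"
    shift
    # {N,} means "N or more repetitions" in an extended regular expression.
    grep -E "^[^0-9]*([0-9][^0-9]*){$n,}\$" "$@"
}

# Three of the sample lines; only the last two have at least 3 digits.
printf '%s\n' '1 lazy dog' 'all 345 dogs' '780 zippers 20 clippers 56' | lines_with_at_least_n_digits 3
```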
Moreover, we can find and extract the lines with exactly N digits using other commands as well.
4. Extract Lines With N Digits Using Perl
We can use Perl’s regular expression mechanism to find lines containing N digits in a similar way.
For example, we can construct a Perl command to extract all lines with exactly 3 digits:
$ perl -ne 'print if s/\d/$&/g == 3' sample.txt
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
In the above example, perl invokes the Perl interpreter, -n iterates over the lines from the given input file, -e executes the specified Perl code on the command line, 'print if s/\d/$&/g == 3' is the Perl code to execute, and sample.txt is the file to search for patterns.
Let’s take a closer look at the Perl code used in the above command:
- print outputs the current line
- if makes print run only when the condition that follows is true
- s/ begins the substitution command
- \d matches exactly one digit
- $& represents the matched string, so no actual substitution takes place
- /g performs the substitution for all the occurrences in the line, i.e., global substitution
- == 3 compares the count of substitutions to the value 3, where == is the comparison operator
By adding a + after \d in the substitution command, we can find the lines containing N numbers rather than N digits:
$ perl -ne 'print if s/\d+/$&/g == 3' sample.txt
energetic dogs count: 2 3 5
car 3 cats 5 total 8
780 zippers 20 clippers 56
The updated substitution command uses the pattern \d+, which matches a run of one or more digits. Thus, the above example now extracts lines with 3 numbers, each having one or more digits.
Moreover, we can change the comparison operator to express a different condition. For instance, using <=, we get lines with 3 or fewer numbers:
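For comparison, counting numbers instead of digits can also be sketched with grep. The pattern below is an assumption of this sketch and does not appear in the commands above: it matches one run of digits, followed by N - 1 further runs that are each preceded by at least one non-digit character. The helper name lines_with_n_numbers is hypothetical, and N is assumed to be an integer of at least 1:

```shell
# Hypothetical helper: print the lines that contain exactly N numbers,
# where a number is a run of one or more consecutive digits.
# Usage: lines_with_n_numbers N [FILE...]   (reads stdin when no file is given)
lines_with_n_numbers() {
    n="$1"
    shift
    # First digit run, then (N - 1) groups of "non-digits followed by a digit run".
    grep -E "^[^0-9]*[0-9]+([^0-9]+[0-9]+){$((n - 1))}[^0-9]*\$" "$@"
}

# "all 345 dogs" holds a single number, so only the second line is printed.
printf '%s\n' 'all 345 dogs' '780 zippers 20 clippers 56' | lines_with_n_numbers 3
```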
$ perl -ne 'print if s/\d+/$&/g <= 3' sample.txt
1 lazy dog
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
780 zippers 20 clippers 56
When 1002 zombies arrive
In this example, we extracted lines with 3 or fewer numbers.
5. Extract Lines With Digit Counts Within a Range
Additionally, we can use grep to extract lines with digit counts within a specific range:
$ grep -E '^[^0-9]*([0-9][^0-9]*){N,M}$' sample.txt
In this command, N is the lower bound and M is the upper bound of a range. Both are included in the range. For example, we can extract lines with digit counts within a range of 1 to 3:
$ grep -E '^[^0-9]*([0-9][^0-9]*){1,3}$' sample.txt
1 lazy dog
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
In this example, we extracted lines with digit counts within a range of 1 to 3 using grep.
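The range form can be parameterized in the same spirit. The following is a minimal sketch with a hypothetical helper name, lines_with_digit_range, assuming both bounds are non-negative integers with the lower bound not greater than the upper bound:

```shell
# Hypothetical helper: print the lines whose digit count lies between MIN and MAX, inclusive.
# Usage: lines_with_digit_range MIN MAX [FILE...]   (reads stdin when no file is given)
lines_with_digit_range() {
    min="$1"
    max="$2"
    shift 2
    # {MIN,MAX} bounds the number of "digit plus trailing non-digits" groups.
    grep -E "^[^0-9]*([0-9][^0-9]*){$min,$max}\$" "$@"
}

# A line without digits and a line with four digits both fall outside the range 1 to 3.
printf '%s\n' 'no digits here' '1 lazy dog' 'When 1002 zombies arrive' | lines_with_digit_range 1 3
```

With this input, only "1 lazy dog" is printed.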
We can also construct a Perl command to do the same:
$ perl -ne 'print if (s/\d/$&/g) >= 1 && (s/\d/$&/g) <= 3' sample.txt
1 lazy dog
energetic dogs count: 2 3 5
all 345 dogs
car 3 cats 5 total 8
In the above example, we used two conditions with the logical AND operator (&&) to extract lines that match both the given conditions.
6. Conclusion
In this article, we learned how to extract lines with exactly N digits from a file using the grep and perl commands. We also learned how to match lines with N numbers, how to change the comparison operator in the perl command, and how to extract lines with digit counts within a specific range.