1. Overview

grep is one of the most widely used Linux utilities for searching text in files. However, when the matching lines begin with varying amounts of indentation, the leading whitespace can make the search results noisy and hard to scan.

In this tutorial, we’ll learn how to grep without the leading whitespace.

2. Understanding the Scenario

Let’s start by creating a few test files, namely file1, file2, and file3:

$ printf "\t\tfoobar in file1" > file1
$ printf "        foobar, now in file2" > file2
$ printf "\t    \tmore foobar in file3" > file3

Each sample file starts with a different mix of tabs and spaces, followed by text that contains the common pattern “foobar”.

Next, let’s use the grep command to search for “foobar” across the sample files:

$ grep foobar file*
file1:        foobar in file1
file2:        foobar, now in file2
file3:            more foobar in file3

We can see that each output line starts with the filename, separated from the content of the matching line by a colon (:). Additionally, the varying leading whitespace gives the line content an inconsistent appearance in the output.

In the following sections, we’ll use these sample files to learn several strategies for doing a grep without the leading whitespace.

3. Using sed

We can use the sed utility’s group-based substitution capabilities to eliminate the leading whitespace from the content.

Let’s go ahead and write a one-liner sed command that we can pipe to the original grep command:

$ grep foobar file* | sed -n -E -e 's/(.*:)([ \t]*)(.*)/\1 \3/p'
file1: foobar in file1
file2: foobar, now in file2
file3: more foobar in file3

Here, the regular expression splits each output line into three groups: \1 for the filename and its colon, \2 for the leading whitespace, and \3 for the rest of the content. Since the replacement uses only \1 and \3, the leading whitespace captured in \2 is dropped.
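
The (.*:) group is greedy, so if a matching line itself contains a colon, \1 extends up to the last colon and the whitespace right after the filename stays in place. As a minimal sketch, assuming filenames without colons and a sed that supports the -E option, we can anchor on the first colon instead. The function name strip_ws_sed and the sample line are hypothetical:

```shell
# Hypothetical helper: keep everything up to the first colon, drop the
# whitespace that follows it, and leave the rest of the line untouched.
strip_ws_sed() {
    sed -E 's/^([^:]*:)[[:space:]]*/\1 /'
}

# Sample line whose content contains a second colon
printf 'notes.txt:\t  key: foobar\n' | strip_ws_sed
```

We can then pipe the original search into it as grep foobar file* | strip_ws_sed.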

4. Using awk

awk is another excellent text-processing utility that we can use to eliminate the leading whitespace. Since awk interprets each line as a record made up of fields, we can use the colon (:) as a field delimiter to separate the filename from the content in our scenario.

Let’s start by taking a look at the gsub() function for replacing a regex pattern in the target string with a replacement text:

gsub(regexp, replacement, target)
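
As a quick illustration of this signature, the following sketch strips the leading whitespace from a sample string and prints the value that gsub() returns, which is the number of substitutions it made. The function name demo_gsub and the sample text are made up for this example:

```shell
# Hypothetical demo: gsub() edits the target in place and returns the
# number of replacements it made.
demo_gsub() {
    awk '{
        text = $0
        n = gsub(/^[ \t]+/, "", text)   # text is modified in place
        print n ": " text
    }'
}

printf '\t  foobar  baz\n' | demo_gsub
```

Because the pattern is anchored with ^, it can match at most once per string, so the count is either 0 or 1. When the target argument is omitted, awk applies the substitution to the whole record ($0).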

Next, let’s write a one-liner awk command that uses the gsub() function to remove the leading whitespace from the content-specific field ($2):

$ grep foobar file* | awk -F':' '{gsub(/^[ \t]*/,"",$2);print $1":", $2}'
file1: foobar in file1
file2: foobar, now in file2
file3: more foobar in file3

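In the print statement, we append a colon to the filename in $1 and follow it with the processed content in $2. The comma between the two arguments inserts the output field separator, which is a single space by default.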
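
Because -F':' splits on every colon, $2 only holds the content up to the next colon, so a matching line that contains a colon of its own gets truncated. A possible workaround, sketched here under the assumption that every input line has a colon and that the filename itself has none, is to split the record at the first colon with index() and substr(). The function name strip_ws_awk is hypothetical:

```shell
# Hypothetical helper: split each record at the FIRST colon only, then
# strip the leading spaces and tabs from the content part.
strip_ws_awk() {
    awk '{
        pos  = index($0, ":")          # position of the first colon
        name = substr($0, 1, pos - 1)  # text before it (the filename)
        rest = substr($0, pos + 1)     # text after it (the content)
        sub(/^[ \t]+/, "", rest)       # one anchored match, so sub() is enough
        print name ": " rest
    }'
}

printf 'file2:    foobar: now in file2\n' | strip_ws_awk
```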

5. Using Shell Parameter Expansion

Shell parameter expansion lets us remove or replace the parts of a variable’s value that match a given pattern, without calling an external command.

Let’s say that we capture each line of the grep output in a variable named entire_line. Then, we can use the colon (:) as a delimiter to extract the filename and the content, still with its leading whitespace, using parameter expansion:

filename="${entire_line/:*/}"; content="${entire_line/$filename:/}";
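
The same split can also be written with the suffix and prefix removal operators, %% and #, which cut at the first colon without using the value of $filename as a pattern. This is an alternative sketch, and the sample value of entire_line is made up for illustration:

```shell
# Sample grep output line (hypothetical)
entire_line='file2:    foobar, now in file2'

filename=${entire_line%%:*}   # remove the longest suffix that starts at a colon
content=${entire_line#*:}     # remove the shortest prefix that ends at a colon

printf '[%s] [%s]\n' "$filename" "$content"
```

The brackets in the output make it easy to see that content still carries its leading spaces at this point.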

Next, with the extglob shell option enabled, we can use the Bash extended glob pattern +([[:space:]]) to remove all the leading whitespace from the content variable:

content=${content##+([[:space:]])};

Finally, let’s take a look at the entire code that executes these operations for each output line using a while loop:

$ shopt -s extglob
$ grep "foobar" file* |
> while IFS= read -r entire_line;
> do
>     filename="${entire_line/:*/}";
>     content="${entire_line/$filename:/}";
>
>     content=${content##+([[:space:]])};
>     echo "$filename: $content";
> done
file1: foobar in file1
file2: foobar, now in file2
file3: more foobar in file3

We must note that the extglob option has to be enabled with the shopt command, since Bash keeps extended globbing off by default.
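
For shells where extglob isn’t available, a possible variation trims the whitespace with standard expansions only. The sketch below swaps the +([[:space:]]) pattern for a two-step expansion and reads each line with IFS= read -r so that backslashes and spacing are preserved. The function name strip_ws_loop and the variable leading are hypothetical:

```shell
# Hypothetical helper: the same loop without extglob.
strip_ws_loop() {
    while IFS= read -r entire_line; do
        filename=${entire_line%%:*}         # text before the first colon
        content=${entire_line#*:}           # text after the first colon
        leading=${content%%[![:space:]]*}   # the run of leading whitespace
        content=${content#"$leading"}       # strip exactly that run
        printf '%s: %s\n' "$filename" "$content"
    done
}

printf 'file3:\t    \tmore foobar in file3\n' | strip_ws_loop
```

As with the earlier commands, we can feed it the real search results with grep foobar file* | strip_ws_loop.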

6. Conclusion

In this article, we used popular tools such as sed and awk to strip the leading whitespace from grep output. Additionally, we solved the same use case with shell parameter expansion in Bash.