Remove Blank Lines From a File | Baeldung (Chinese edition)

1. Overview

When we handle text files in Linux, we often need to remove blank lines from a file to make it easier to read or to process further.

In this tutorial, we’ll discuss some common scenarios for removing blank lines from a file through practical examples.

2. The Problems

In this tutorial, a blank line is a line that is either empty or contains only whitespace characters.

Let’s say we have a plain text file:

$ cat with_blank.txt
    

   
This is the first non-blank line.

Some data comes here

-----

1
  2
    3
 
-----   
Data End.

This is the last non-blank line.

As the output above shows, with_blank.txt contains blank lines, including three leading and three trailing ones. Some of them contain whitespace characters, which aren't visible in the output.
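Because whitespace-only lines are invisible in the listing, recreating the file makes it easier to follow along. The sketch below is a hypothetical reconstruction: the exact spaces on each blank line, including the three trailing ones, are an assumption.

```shell
# Recreate with_blank.txt; each argument becomes one line of the file.
# The exact whitespace on the blank lines is an assumption.
printf '%s\n' \
  '    ' \
  '' \
  '   ' \
  'This is the first non-blank line.' \
  '' \
  'Some data comes here' \
  '' \
  '-----' \
  '' \
  '1' \
  '  2' \
  '    3' \
  ' ' \
  '-----   ' \
  'Data End.' \
  '' \
  'This is the last non-blank line.' \
  '  ' \
  '' \
  '   ' > with_blank.txt

# Show every line with its end marked by $
cat -A with_blank.txt
```

cat -A is a GNU coreutils option that marks each line end with $, so the whitespace-only lines become visible.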

Usually, we want to remove blank lines from a file in one of three ways:

Remove all blank lines in the file
Remove leading blank lines only: delete lines from the beginning of the file up to the first non-blank line
Remove trailing blank lines only: delete only the lines after the last non-blank line in the file

In this tutorial, we’ll address these with the grep, sed, awk, and tac commands.

3. The Pattern to Match a Blank Line

To remove blank lines, we first need to identify them. A regular expression is the obvious approach, and the most portable option is a POSIX basic regular expression (BRE):

^[[:space:]]*$

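[:space:] is a POSIX character class. In the POSIX (C) locale, it matches the space, tab, newline, carriage return, form feed, and vertical tab characters, that is, the same set as [ \t\n\r\f\v] in Perl-style regex notation.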

Since character classes are used so often, many tools also provide shorthand classes. For example, \s stands for [[:space:]], while \S stands for its negation, [^[:space:]].

Many programming languages and text processing tools, including Java, Perl, Python, GNU grep, GNU sed, and GNU awk, support these shorthand character classes. We’ll see examples in later sections.

With the shorthand character classes, the blank-line regex becomes even more compact:

^\s*$
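A quick way to check that the POSIX class and the shorthand describe the same lines is to count matches on a tiny input. This sketch assumes GNU grep (for \s) and uses a throwaway file, patterns.txt, whose name is arbitrary:

```shell
# Five lines: empty, three spaces, a tab, plain text, and text padded with spaces
printf '%s\n' '' '   ' "$(printf '\t')" 'text' '  text  ' > patterns.txt

# Both commands count the same three blank lines
grep -c '^[[:space:]]*$' patterns.txt   # POSIX class, prints 3
grep -c '^\s*$' patterns.txt            # GNU shorthand, prints 3
```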

Next, let’s see how to solve the problems of removing blank lines.

4. Remove All Blank Lines From a File

Removing all blank lines is easier than removing only the leading or trailing ones. That’s because once we find a blank line, we don’t need to check whether it should be kept: we simply remove it.

Our goal is to get the output:

This is the first non-blank line.
Some data comes here
-----
1
  2
    3
-----   
Data End.
This is the last non-blank line.

Let’s see how we can solve this problem.

4.1. Using grep

The grep utility is built for searching text, while removing lines is a file-editing operation, so at first glance grep seems like the wrong tool for the problem.

However, we can use grep’s -v option to print only the lines that don’t match the blank-line pattern. Alternatively, we can tell grep to output only the lines that contain a non-blank character.

Let’s take a look at how the grep command solves the problem:

$ grep -v '^[[:space:]]*$' with_blank.txt

If our grep implementation supports shorthand character classes, as the widely used GNU grep does, the command becomes even shorter:

$ grep '\S' with_blank.txt
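When grep doesn't support \S, the POSIX bracket expression [^[:space:]] (any character that isn't whitespace) does the same job. The sketch below, assuming GNU grep for the \S variant and using illustrative file names, checks that all three commands keep the same lines:

```shell
# Sample input: blank lines (empty, spaces, a tab) mixed with content
printf '%s\n' '' 'a' '  ' ' b ' "$(printf '\t')" 'c' > demo.txt

grep -v '^[[:space:]]*$' demo.txt > out1.txt   # invert the blank-line match
grep '[^[:space:]]' demo.txt > out2.txt        # POSIX: has a non-space char
grep '\S' demo.txt > out3.txt                  # GNU shorthand for the same

# cmp is silent and succeeds when two files are identical
cmp out1.txt out2.txt && cmp out2.txt out3.txt && cat out1.txt
```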

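grep can’t edit a file in place, and we can’t simply redirect its output to with_blank.txt either, because the shell truncates the file before grep reads it. Instead, we save the output in a temporary file and then mv it over the original input file: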

$ grep '\S' with_blank.txt > tmp.txt && mv tmp.txt with_blank.txt
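A fixed name like tmp.txt could clobber an existing file. A slightly more defensive variant of the write-back step, sketched here with a hypothetical notes.txt, uses mktemp for a unique temporary file and copies the result back with cat, which keeps the original file's permissions (mktemp creates files readable only by their owner):

```shell
# Example input file (hypothetical name)
printf '%s\n' 'first' '' '   ' 'second' > notes.txt

# Filter into a unique temp file, then overwrite the original's contents.
# Each step runs only if the previous one succeeded.
tmp=$(mktemp) &&
  grep -v '^[[:space:]]*$' notes.txt > "$tmp" &&
  cat "$tmp" > notes.txt
rm -f "$tmp"

cat notes.txt
```

If the file has no non-blank lines at all, grep selects nothing and exits with status 1, so the && chain leaves the original untouched.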

4.2. Using sed

The sed command has the d command, which deletes the current pattern space without printing it and starts the next cycle.

We can solve the problem straightforwardly by deleting a line if it matches the blank line pattern:

$ sed '/^[[:space:]]*$/d' with_blank.txt

We can also approach it the other way round: delete every line that does not contain a non-blank character (!d).

If our sed supports \S as a non-blank character class, as does GNU sed, the command can be as simple as:

$ sed '/\S/!d' with_blank.txt

Many sed implementations support “in place” editing so that we can save the changes back to the input file.

For example, with GNU sed, we can use the -i option:

$ sed -i '/^[[:space:]]*$/d' with_blank.txt
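The -i option isn't portable as written: GNU sed treats a backup suffix as optional, while BSD and macOS sed require an argument after -i. Attaching a suffix directly, as in -i.bak, is accepted by both and also keeps a backup of the original. A sketch with a hypothetical config.txt:

```shell
printf '%s\n' 'alpha' '' ' ' 'beta' > config.txt

# Edit in place; the untouched original is kept as config.txt.bak
sed -i.bak '/^[[:space:]]*$/d' config.txt

cat config.txt
wc -l < config.txt.bak
```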

4.3. Using awk

Using the awk command, we can remove blank lines in different ways.

Let’s start with a straightforward solution:

$ awk '!/^[[:space:]]*$/' with_blank.txt

In the above solution, if a line doesn’t match our blank line pattern, we print it.

This is a compact form. Written out in full, it becomes:

$ awk '{ if($0 !~ /^[[:space:]]*$/) print $0 }' with_blank.txt

Let’s understand why it can be written in that short form:

When a regex is used on its own, without the ~ or !~ operator, awk matches it against the current line ($0), so if($0 !~ /pattern/) can be written as if(!/pattern/)
We can also write ‘{if(condition){action}}’ as ‘condition{action}’; thus, we have ‘!/^[[:space:]]*$/{print $0}’
When a pattern has no action, awk applies the default action, print $0, to every line for which the pattern is true; therefore, we can omit the {print $0} and have ‘!/^[[:space:]]*$/’
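Each rewriting step above keeps the behavior unchanged. The sketch below runs all four forms on the same illustrative input and compares the results; it uses [[:space:]] rather than \S, so it doesn't rely on gawk-specific syntax:

```shell
printf '%s\n' '' 'one' '   ' '  two' "$(printf '\t')" > awk_demo.txt

# From the fully spelled-out version to the shortest one
awk '{ if ($0 !~ /^[[:space:]]*$/) print $0 }' awk_demo.txt > v1.txt
awk '{ if (!/^[[:space:]]*$/) print $0 }'      awk_demo.txt > v2.txt
awk '!/^[[:space:]]*$/ { print $0 }'           awk_demo.txt > v3.txt
awk '!/^[[:space:]]*$/'                        awk_demo.txt > v4.txt

# All four outputs should be byte-for-byte identical
cmp v1.txt v2.txt && cmp v2.txt v3.txt && cmp v3.txt v4.txt && cat v4.txt
```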

Another way to solve the problem is to check if a line contains a non-whitespace character:

$ awk '/\S/' with_blank.txt

In addition to regex checks, we can also check awk‘s built-in NF variable to determine if a line is blank or not:

$ awk 'NF' with_blank.txt

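The NF variable holds the number of fields in the current input line. In awk, the default field separator (FS) is a single space, which has a special meaning: fields are separated by runs of spaces and tabs, and leading and trailing spaces and tabs are ignored.

Therefore, if a line contains only spaces and tabs, it has no fields at all; in other words, NF == 0. Depending on the awk implementation, other whitespace characters, such as a carriage return left over from Windows-style line endings, may not count as field separators, in which case such a line is treated as non-blank by this check.

In awk, a non-zero number evaluates to true, so the program ‘NF’ prints all non-blank lines.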
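Printing NF next to each line makes the field-splitting rule visible. In this sketch, the file name nf_demo.txt is arbitrary, and the brackets only serve to show surrounding spaces:

```shell
# Empty, spaces only, space-tab-space, two fields, one padded field
printf '%s\n' '' '   ' "$(printf ' \t ')" 'a b' '  c  ' > nf_demo.txt

# Show the field count of every line
awk '{ printf "NF=%d [%s]\n", NF, $0 }' nf_demo.txt

# NF alone as the pattern keeps only lines with at least one field
awk 'NF' nf_demo.txt
```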

5. Remove Leading Blank Lines Only

If we want to only remove the leading blank lines, the main problem is to know where the first non-blank line starts.

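The grep command alone can’t solve this problem, because it checks each line independently and can’t tell whether a non-blank line has already appeared. However, we can still do it with the more powerful sed and awk utilities.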

A working solution to the problem should print:

This is the first non-blank line.

Some data comes here

-----

1
  2
    3

-----   
Data End.

This is the last non-blank line.

5.1. Using sed

There are several ways to solve the problem using the sed command. Let’s look at two approaches using sed‘s address range.

The first solution works only on the leading part of the file, up to the first non-blank line:

$ sed '1,/\S/{/\S/!d}' with_blank.txt

Let’s understand what is going on:

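1,/\S/ is an address range. The selection starts at the first line and ends at (and includes) the first non-blank line. Note that sed checks the end regex only from the line after the range begins, so if line 1 is already non-blank, the range runs on to the next non-blank line, and the blank lines in between are deleted too
{/\S/!d} is the action we apply to each line in that range. The !d isn’t new to us: we use it again here to keep the non-blank line in the range and delete the rest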

We can also apply the !d action on the range from the first non-blank line to the end of the file to solve the problem:

$ sed '/\S/,$!d' with_blank.txt
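One subtlety of the 1,/\S/ range is that sed only tests the end regex starting from the line after the range begins. The sketch below, which assumes GNU sed and uses illustrative file names, shows what happens when line 1 is already non-blank, and how GNU sed's 0,/regexp/ address (which also tests the regex on line 1) avoids it:

```shell
# Input whose first line is already non-blank
printf '%s\n' 'a' '' 'b' > no_leading.txt

# 1,/\S/ spans lines 1-3 here, so the inner blank line is wrongly deleted
sed '1,/\S/{/\S/!d}' no_leading.txt > range1.txt

# 0,/\S/ (GNU extension) ends the range on line 1 itself, so nothing is removed
sed '0,/\S/{/\S/!d}' no_leading.txt > range0.txt

# With real leading blank lines, the 0,/\S/ form still removes them
printf '%s\n' '' '  ' 'x' '' 'y' | sed '0,/\S/{/\S/!d}' > range0_leading.txt

cat range1.txt range0.txt range0_leading.txt
```

The second form, /\S/,$!d, doesn't have this problem, because its range starts at the first non-blank line wherever that is.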

5.2. Using awk

First, let’s see how a simple awk solution would look:

$ awk '/\S/{p=1}p' with_blank.txt

The awk one-liner has two parts.

The first part is /\S/{p=1}: if a record is a non-blank line, we set the variable p to 1.

The second part is simply p: if the variable p holds a non-zero value, the current line is printed.

In awk, an uninitialized variable behaves as an empty string in a string context and as 0 in a numeric context.

Therefore, p changes from 0 to 1 when the first non-blank line arrives and keeps that value until the end of the file.

In this way, the awk command prints from the first non-blank line to the end of the input file.
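Since /\S/ needs an awk that understands the shorthand (such as gawk), a variant of the same flag technique can use NF as the non-blank test instead, which standard awks support. A sketch on illustrative input:

```shell
# The same flag technique as /\S/{p=1}p, with NF as the non-blank test
printf '%s\n' '' '   ' 'first' '' 'second' '  ' |
  awk 'NF { p = 1 } p' > leading_removed.txt

cat leading_removed.txt
```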

6. Remove Trailing Blank Lines Only

Usually, text processing tools process a file’s lines in order from the beginning to the end of the file, and it’s not easy to look back at a line we’ve already processed.

Therefore, our main challenge is finding the last non-blank line in a file: when we read a blank line, we can’t yet know whether a non-blank line will follow it.

The output printed by a working solution looks like:

    

   
This is the first non-blank line.

Some data comes here

-----

1
  2
    3

-----   
Data End.

This is the last non-blank line.

6.1. Using tac

The tac command is part of the GNU Coreutils package, so it’s available by default on most Linux distributions.

Whereas the cat command prints a file in its natural order, the tac command prints a file in reverse order. (Note that tac is just cat spelled backward!)

An example shows its ability clearly:

$ cat file
1
2
3
4
5

$ tac file
5
4
3
2
1

We can solve our problem of removing trailing blank lines by using the tac command twice:

tac input | <COMMAND TO REMOVE LEADING BLANK LINES> | tac

For example:

$ tac with_blank.txt | sed '/\S/,$!d' | tac

The tac command indeed simplifies the problem. However, we have to start three processes and process the content of the input file three times.

This overhead can become noticeable, especially when we have to handle huge input files.
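The leading-blank-line step in the middle of the tac pipeline can be any of the earlier solutions. For instance, here's a sketch that plugs in the awk flag technique, with NF as the non-blank test and illustrative input:

```shell
# Reverse the lines, drop the now-leading blank lines, then restore the order
printf '%s\n' '  ' 'top' '' 'bottom' '' '   ' |
  tac | awk 'NF { p = 1 } p' | tac > trailing_removed.txt

cat trailing_removed.txt
```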

6.2. Using sed

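The sed solution is not as straightforward as the solution with tac. However, it starts a single process and reads the input file only once. Note that following a label with a semicolon, as in :a;, is a GNU sed extension; other implementations may require the label to end with a newline: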

$ sed ':a; /^[[:space:]]*$/ { $d; N; ba; }' with_blank.txt

First, let’s go through the commands in the sed one-liner:

:a; – define a label named “a”
/^[[:space:]]*$/ { actions } – if the current pattern space matches the blank-line pattern, execute the actions in the braces
$d; – if the current line is the last line of the input, delete the pattern space without printing it
N; – read the next line from the input file and append it, after a newline, to the pattern space
ba; – branch back to label a

Next, let’s see how this clever solution works.

Once a blank line is read, the one-liner keeps appending the following lines to the pattern space by branching back to label a in a loop.

Since the [[:space:]] class includes the newline character, /^[[:space:]]*$/ still matches when the pattern space holds several blank lines joined by newlines.

However, once a non-blank line is appended, the pattern space no longer matches /^[[:space:]]*$/. So, the loop ends, sed reaches the end of the script and prints the whole pattern space (the blank lines together with the non-blank line), and the next cycle starts with a fresh pattern space.

If the input file has trailing blank lines, they all end up in the pattern space. Once the last line has been appended, $d deletes the pattern space without printing it. Thus, the consecutive trailing blank lines don’t appear in the output.
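Tracing the one-liner on a small input shows that blank lines between non-blank lines survive, since they're printed together with the next non-blank line, while the trailing run is dropped. A sketch assuming GNU sed, with an arbitrary output file name:

```shell
# Blank lines between non-blank lines are kept; only the trailing run goes
printf '%s\n' ' ' 'a' '' ' ' 'b' '' '  ' |
  sed ':a; /^[[:space:]]*$/ { $d; N; ba; }' > sed_trailing.txt

# cat -A (GNU) marks line ends with $, so whitespace-only lines stay visible
cat -A sed_trailing.txt

# A file with nothing but blank lines produces no output at all
printf '%s\n' '' '   ' '' | sed ':a; /^[[:space:]]*$/ { $d; N; ba; }' | wc -l
```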

6.3. Using awk

With awk, there are two ways to identify the last non-blank line of a file.

Let’s take a look at the first approach:

$ awk '{a[NR]=$0; if(/\S/)mark=NR} END{for(i=1;i<=mark;i++)print a[i]}' with_blank.txt

The awk code above reads the input file only once. Let’s break it down:

Read and save each line in an array: a[]
Save the last non-blank line number in a variable called mark
After reading all lines, go through the array once again and print lines until the line we saved in mark
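Storing the whole file in a[] means memory use grows with the input size. A different technique, not the one shown above, buffers only the current run of blank lines: the buffer is printed when a non-blank line arrives, and whatever is still buffered at the end of the input is the trailing run, which is simply never printed. A sketch, using NF as the blank-line test and an illustrative output file:

```shell
printf '%s\n' '' 'x' ' ' '' 'y' '  ' '' |
  awk '
    NF {                  # non-blank line: flush pending blank lines, then print it
      printf "%s", buf
      buf = ""
      print
      next
    }
    { buf = buf $0 "\n" } # blank line: hold it until we know it is not trailing
  ' > buffered.txt

cat buffered.txt
```

Memory use is then bounded by the longest run of consecutive blank lines rather than by the file size.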

As an alternative, we can also solve the problem by reading the input file twice:

$ awk 'NR==FNR && /\S/{mark=NR; next} FNR<=mark' with_blank.txt with_blank.txt

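The first read (where NR==FNR holds, since NR counts lines across all input files while FNR restarts at 1 for each file) finds the line number of the last non-blank line and saves it in a variable called mark. Blank lines in this pass aren’t caught by next, but they aren’t printed either, because at that point mark is always smaller than their line number.

Then, the second read prints each line whose line number within the file (FNR) is less than or equal to mark.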
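The one-liner works, but it relies on blank lines from the first pass never satisfying FNR<=mark. Putting next in a block that applies to every first-pass line makes the two phases explicit. A sketch of that restructured version, with NF standing in for /\S/ and an illustrative file name:

```shell
printf '%s\n' '' 'x' '' 'y' ' ' '' > twopass.txt

# Pass 1 (NR == FNR): record the last non-blank line number, print nothing.
# Pass 2: print every line up to and including that line.
awk 'NR == FNR { if (NF) mark = FNR; next }
     FNR <= mark' twopass.txt twopass.txt > twopass_out.txt

cat twopass_out.txt
```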

7. Conclusion

Removing blank lines from a file is a common operation when we handle text files in Linux.

In this article, we’ve discussed three different blank-line-removal scenarios:

All blank lines
Leading blank lines only
Trailing blank lines only

These three scenarios cover most use cases, and we’ve seen how to handle each of them with grep, sed, awk, and tac.
