1. Overview
Both the awk command and the sed command are powerful Linux command-line text processing utilities. We know that the sed command has a handy -i option to edit the input file “in-place”. In other words, it can save the changes back to the input file.
In this tutorial, we’ll explore how to do “in-place” editing with the awk command through examples.
2. The Example of an Input File
Before we look at the awk commands, let’s create an input file named scores.txt:
$ cat scores.txt
Kai 77
Eric 97.5
Amanda 97
Jerry 60
Tom 80
As the output above shows, the text file holds the names of students and their scores on an exam.
We want to use the awk command to append the average score at the end of the file.
Next, let’s see how to edit the file in-place.
3. Using the inplace Extension of GNU awk
GNU awk is a widely used awk implementation. Since release 4.1.0, GNU awk ships with the inplace extension to emulate the -i (in-place) option of GNU sed.
3.1. Using the -i inplace Option
The syntax to use the inplace extension of the gawk command is pretty straightforward:
gawk -i inplace '... awk code ...' input_files
The -i option above is for including an awk source library. In this case, we want to include the inplace extension. Therefore, we have the argument: -i inplace.
Now, let’s calculate and append the average score to our scores.txt:
$ gawk -i inplace '{sum+=$2} 1; ENDFILE {printf "----\nAVG: %.2f\n",sum/NR}' scores.txt
$ cat scores.txt
Kai 77
Eric 97.5
Amanda 97
Jerry 60
Tom 80
----
AVG: 82.30
Great! The file has been updated as we expected.
3.2. END vs. ENDFILE
The awk one-liner in the previous section is pretty straightforward. It sums all the scores and calculates the average value at the end.
However, if we read the one-liner carefully, we can notice that we used an ENDFILE block instead of the usual END block.
We might ask, would it be the same if we used the END block? Let’s do a little test to find out.
In the next test, we want to add a header “The beginning of the file” and a footer “The end of the file” to the scores.txt file.
Firstly, let’s output the header and the footer in the BEGIN and END blocks, respectively:
$ gawk -i inplace 'BEGIN { print "The beginning of the file" }
{ print }
END { print "The end of the file" }' scores.txt
The beginning of the file
The end of the file
As the output above shows, the header and the footer are printed to the terminal. However, the records in the file didn’t show up. That’s not a concern by itself: we’ve included the inplace extension, so the records go to the input file instead of the terminal, and the file should be updated anyway. Let’s check our input file scores.txt:
$ cat scores.txt
Kai 77
Eric 97.5
Amanda 97
Jerry 60
Tom 80
Oops! The header and the footer are not added to the file! Why has that happened?
To explain why this happened, we need to understand how the inplace extension works.
The inplace extension redirects the output to the current input file only while that file is being processed. However, the BEGIN block runs before any file is processed, and the END block runs after all files have been processed completely.
Therefore, anything we print in the BEGIN and the END blocks goes to the standard output instead of the file. To solve this problem, we should use two particular patterns of GNU awk: BEGINFILE and ENDFILE:
$ gawk -i inplace 'BEGINFILE { print "The beginning of the file" }
{ print }
ENDFILE { print "The end of the file" }' scores.txt
Even though there’s no output after we execute the command above, the header and the footer have been added to the scores.txt file:
$ cat scores.txt
The beginning of the file
Kai 77
Eric 97.5
Amanda 97
Jerry 60
Tom 80
The end of the file
3.3. Editing Multiple Files In-Place
The inplace extension works with multiple input files as well. Let’s address it through an example.
Firstly, let’s create three small files:
$ head *.txt
==> java.txt <==
Java!
==> kotlin.txt <==
Kotlin!
==> linux.txt <==
Linux!
Secondly, we’ll write a short awk one-liner to change the text in each file:
$ gawk -i inplace '$0 = "I Love " $0' java.txt kotlin.txt linux.txt
Finally, let’s recheck the three files:
$ head *.txt
==> java.txt <==
I Love Java!
==> kotlin.txt <==
I Love Kotlin!
==> linux.txt <==
I Love Linux!
Cool! All files are changed in-place.
In our awk code, we don’t have to handle each file separately or manually control the redirection. Instead, we just change the text as the awk command reads each line from the files. The inplace extension takes care of which file is currently being processed and writes any changes back to the file automatically.
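One caveat applies when the averaging one-liner from section 3.1 meets several input files: NR keeps counting across files, and sum is never reset, so each file would get a cumulative average. A minimal sketch of a per-file variant follows. It assumes gawk 4.1.0 or later with the inplace extension behaving as shown above; the function name append_avg_inplace is hypothetical, and FNR is awk's per-file record counter:

```shell
# Hypothetical helper: append each file's own average to that file.
# Assumes gawk >= 4.1.0 with the inplace extension available.
append_avg_inplace() {
    gawk -i inplace '
        BEGINFILE { sum = 0 }        # reset the running total for every file
        { sum += $2; print }         # accumulate the score and copy the record
        ENDFILE {                    # FNR still holds the record count of this file
            if (FNR > 0) printf "----\nAVG: %.2f\n", sum / FNR
        }
    ' "$@"
}
```

For example, calling append_avg_inplace class_a.txt class_b.txt (hypothetical file names) would add a separate AVG line to each file.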
4. Using a Temporary File
If we don’t have GNU awk, or our gawk version is lower than 4.1.0, we can’t use the convenient inplace extension. However, we can still save the changes back to the input file using a temp file:
awk '... code ...' input_file > tmp_file && mv tmp_file input_file
Since we’ve used the && operator, the mv command runs only if the awk command succeeds. So, a failed awk command leaves the input file untouched.
Let’s add the average score to our scores.txt via a temp file:
$ awk '{sum+=$2} 1; END {printf "----\nAVG: %.2f\n",sum/NR}' scores.txt > score.tmp && mv score.tmp scores.txt
$ cat scores.txt
Kai 77
Eric 97.5
Amanda 97
Jerry 60
Tom 80
----
AVG: 82.30
As the example shows, it’s also pretty handy to do an in-place edit using a temp file. However, it’s worth mentioning that one such command handles only a single input file. To edit multiple files in-place, we need to repeat it for each file, for instance in a shell loop.
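The temp-file technique can be repeated per file with a small shell loop. The following is a minimal sketch rather than a definitive implementation: the function name awk_inplace is hypothetical, and it assumes the mktemp utility is available (it's common on Linux, but POSIX doesn't require it):

```shell
# Hypothetical helper: awk_inplace 'awk program' file...
# Emulates in-place editing with any POSIX awk, using one temp file per input file.
awk_inplace() {
    prog=$1
    shift
    for file in "$@"; do
        # Create the temp file next to the input so that mv stays on one filesystem.
        tmp=$(mktemp "$file.XXXXXX") || return 1
        if awk "$prog" "$file" > "$tmp"; then
            mv "$tmp" "$file"        # replace the input only after awk succeeded
        else
            rm -f "$tmp"             # awk failed: keep the original untouched
            return 1
        fi
    done
}

# Demo in a scratch directory with the three files from section 3.3.
cd "$(mktemp -d)" || exit 1
echo 'Java!' > java.txt; echo 'Kotlin!' > kotlin.txt; echo 'Linux!' > linux.txt
awk_inplace '{ print "I Love " $0 }' java.txt kotlin.txt linux.txt
cat java.txt kotlin.txt linux.txt
```

Note that mv replaces the input with the temp file, so the result carries the temp file's permissions, and mktemp creates files that only the owner can read and write. Writing the result back with cat "$tmp" > "$file" would keep the original file's metadata, at the cost of no longer replacing the file in one step.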
5. Some Common Pitfalls
So far, we’ve seen two different ways to use the awk command to do in-place editing.
Now, we’ll first show a couple of in-place editing methods that aren’t recommended. Then, we’ll understand why they’re not advisable solutions so that we can avoid using them in the real world.
5.1. The echo Command and Command Substitution
Command substitution is a convenient technique in shell scripts to capture the output of a command. Sometimes, we may see it used with the echo command to redirect the changes back to the input file:
echo "$(awk '... code ...' input_file)" > input_file
This method looks straightforward, and it works as long as the awk command succeeds. However, it destroys our input data if the awk command fails.
Let’s append the average score to the scores.txt using this method:
$ echo "$(awk '{sum+ =$2} 1; END {printf "----\nAVG: %.2f\n",sum/NR}' scores.txt)" > scores.txt
awk: cmd. line:1: {sum+ =$2} 1; END {printf "----\nAVG: %.2f\n",sum/NR}
awk: cmd. line:1: ^ syntax error
In the example above, we typed a space between the sum+ and the = by mistake. Unsurprisingly, the awk command complained and printed out the error message.
We may want to fix the error and re-run the command. But before we do that, let’s check the input file:
$ cat scores.txt
$
Oops! The data in our input file is gone!
This happened because the failed awk command printed nothing to stdout, so the command substitution expanded to an empty string. The echo command ran regardless of the failure, and the shell redirected its output to the input file, overwriting the original data.
Therefore, we shouldn’t use this method to edit input files.
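The root cause is that echo runs whether or not awk succeeded. As a hedged illustration of that point, and not as a replacement for the approaches in sections 3 and 4, the sketch below captures the output first and writes it back only if awk exits successfully. The variable name out and the scratch directory are assumptions made for the demo:

```shell
# Demo setup: a scratch copy of the scores file.
cd "$(mktemp -d)" || exit 1
printf 'Kai 77\nEric 97.5\nAmanda 97\nJerry 60\nTom 80\n' > scores.txt

# Same typo as above ("+ =" instead of "+="), so awk fails with a syntax error.
if out=$(awk '{sum+ =$2} 1; END {printf "----\nAVG: %.2f\n",sum/NR}' scores.txt); then
    printf '%s\n' "$out" > scores.txt   # reached only when awk succeeded
else
    echo "awk failed; scores.txt left untouched" >&2
fi

wc -l < scores.txt   # still 5 lines
```

Even with this guard, command substitution strips trailing newlines and holds the whole output in memory, so the temp-file approach remains the more robust choice.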
5.2. The Redirect Pitfall
Using IO redirection can solve many problems neatly. Sometimes, we may see code that attempts to save the change back to the input file by playing with the redirection:
command < input_file > input_file
This method looks smart and compact. However, it doesn’t work. Worse, it’s dangerous: whatever the command is, the input_file gets truncated.
This is because, before a command is executed, its input and output may be redirected using a special notation interpreted by the shell. That is to say, the shell performs the redirections before handing control over to the command. Thus, the “>” redirection empties the input_file first. By the time the command gets executed and tries to read from the input_file, the file is already empty.
Let’s do a little test using the cat command:
$ cat < scores.txt > scores.txt
$ wc -c scores.txt
0 scores.txt
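As a safety net against this kind of accident, POSIX shells offer the noclobber option (set -C, or set -o noclobber), which makes the “>” operator refuse to overwrite an existing file. A minimal sketch, with a scratch directory assumed for the demo:

```shell
# Demo setup: a scratch copy of the file.
cd "$(mktemp -d)" || exit 1
printf 'Kai 77\nEric 97.5\n' > scores.txt

set -C      # noclobber: ">" will not overwrite an existing file
cat < scores.txt > scores.txt || echo "redirection refused" >&2
set +C      # restore the default behavior

wc -c < scores.txt   # non-zero: the content survived
```

When overwriting is really intended, the “>|” operator bypasses noclobber for that single redirection.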
6. Conclusion
In this article, we discussed how to do in-place editing with the awk command. We’ve addressed several approaches through examples.
Moreover, we’ve talked about some common pitfalls that we should beware of in practice.