1. Overview
Linux provides many tools for working with text and files. They let us process input in common ways, such as searching for a pattern, printing specific fields, or replacing strings.
In this tutorial, we’ll use one of the most powerful of these tools, AWK. We’ll first explain the fundamentals of AWK, then apply them to print the last element of each row in a file.
2. What Is AWK?
AWK is a command-line utility for text processing and data extraction. It’s also considered a scripting language, as it provides additional functionality like loops, conditions, and variables. AWK works on an input stream that might come from a file, standard input, or piping from another command. We can then manipulate this input and output the data in a specific format or search through it for a specific pattern.
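As a small sketch of these three input sources, the snippet below feeds the same two lines to awk from a file, from redirected standard input, and through a pipe. The file name sample.txt and its contents are made up for this illustration, and the `{print}` program simply echoes each line it reads.

```shell
# Create a throwaway sample file (hypothetical name and contents).
dir=$(mktemp -d)
printf 'alpha beta\ngamma delta\n' > "$dir/sample.txt"

# 1) Input from a file named on the command line.
from_file=$(awk '{print}' "$dir/sample.txt")

# 2) Input from standard input, via redirection.
from_stdin=$(awk '{print}' < "$dir/sample.txt")

# 3) Input piped from another command.
from_pipe=$(cat "$dir/sample.txt" | awk '{print}')

# All three variables hold the same two lines.
printf '%s\n' "$from_file" "$from_stdin" "$from_pipe"

# Clean up the temporary directory.
rm -r "$dir"
```

In all three cases, awk sees the same stream of lines, so the program itself doesn't need to change.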
One of AWK’s major advantages is that it combines ease of use with efficiency. We can learn the basics and become productive very quickly, and those same basics cover most common text and pattern processing needs.
Most Linux distributions have AWK installed by default. To make sure we have AWK on our system, we can simply type awk in our terminal:
$ awk
Usage: mawk [Options] [Program] [file ...]
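If we see a usage message like this one, awk is installed on our machine. The exact text depends on the implementation: here it comes from mawk, and other implementations, such as gawk, print a different message.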
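If we’d rather check from a script than read a usage message, one possible approach is the shell’s `command -v` builtin, which reports where an executable is found. This is only a sketch of an alternative check, not something the rest of the tutorial relies on:

```shell
# command -v succeeds and prints the path if awk is found on PATH.
if command -v awk >/dev/null 2>&1; then
    echo "awk found at: $(command -v awk)"
else
    echo "awk is not installed"
fi
```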
3. Understanding Records and Fields
As we mentioned, AWK receives an input stream of text and starts working on it. Instead of operating on this input as a whole, AWK views its content as a sequence of records and each record, in turn, as a sequence of fields.
To divide the text into records and fields, AWK uses what are called separators. A separator is a character, or more generally a pattern, that marks the boundary between two records or two fields. In other words, AWK scans through the text, and each time it finds a separator, it treats the text before it as a complete record or field.
The default separator for records is the newline character, and the default separator for fields is whitespace, meaning any run of spaces or tabs. This means that AWK treats each line as a record, and each whitespace-delimited column in the line as a field.
Let’s check this with an example:
$ cat students.txt
name id grade
John 100 third
Mark 200 fourth
Jennifer 300 fifth
Here, we used the cat command to view a file called students.txt. Inside this file, we have some data structured into rows and columns. According to the default behavior of AWK, each line in this file is a record, and each column is a field. So, here we have four records, each with three fields.
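To see the default separators at work, here’s a minimal sketch that uses inline sample lines instead of a file. It relies on `{print $2}`, which prints the second field (the syntax is explained in the next section), and on awk’s standard `-F` option for choosing a different field separator. The comma-separated line is a made-up variant of our data, not part of students.txt:

```shell
# Default separators: the line is one record, and runs of spaces or tabs
# split it into the fields "John", "100", and "third".
printf 'John   100\tthird\n' | awk '{print $2}'      # prints 100

# Hypothetical comma-separated variant: -F sets the field separator.
printf 'John,100,third\n' | awk -F',' '{print $2}'   # prints 100
```

Both commands print the same field, since only the rule for splitting the record differs.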
4. Print Specific Fields in a Record With AWK
The general format of using AWK is:
$ awk 'pattern { action }'
The pattern is an expression that AWK evaluates to true or false for each record. When it’s true, AWK executes the action for that record, and when it’s false, AWK skips the action. If we don’t provide a pattern, it defaults to true, and the action is executed for every record.
Now, let’s demonstrate this with a simple example:
$ awk '{print $1}' students.txt
name
John
Mark
Jennifer
So, here we used our students.txt file as the input to the awk command. We specified a print action, which tells AWK to output something from each record. Since we didn’t specify a pattern, it defaults to true, and the action runs for every record.
The dollar sign ($) followed by a number refers to the field at that position in the current record. Fields are numbered starting from 1. Thus, in the above command, AWK prints the first field of each record, and that’s what we can see in the output.
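The example above has no pattern, so here’s a short sketch of one that does. The student data is inlined in a shell variable so the snippet runs on its own, and the comparison on the third field is just an illustrative choice of pattern:

```shell
# Same rows as students.txt, inlined so the snippet is self-contained.
students='name id grade
John 100 third
Mark 200 fourth
Jennifer 300 fifth'

# Pattern: the third field equals "fourth".
# Action: print the first field of each matching record.
printf '%s\n' "$students" | awk '$3 == "fourth" {print $1}'   # prints Mark
```

Only the record for which the pattern is true triggers the action, so the other three lines produce no output.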
5. Using the NF Variable in AWK
AWK provides built-in variables that we can use in our scripts. One of the most commonly used is NF, which evaluates to the number of fields in the current record. As AWK moves through the lines of the input, NF always holds the number of fields in the line being processed.
Let’s check this using our students.txt file:
$ awk '{print NF}' students.txt
3
3
3
3
Here, we used the print action again, but this time, we output the NF variable, which is the number of fields. So, AWK splits each line into fields and prints how many it found.
Let’s modify some lines in our file and run our one-liner again:
$ cat students.txt
name id grade 1 2
John 100 third 3
Mark 200 fourth 4 5 6
Jennifer 300 fifth
We just added some random numbers to modify the number of fields in some lines. Let’s now print the NF variable again:
$ awk '{print NF}' students.txt
5
4
6
3
We can see that the output has changed to match the number of fields we modified.
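Since NF is an ordinary value, we can also use it in a pattern. The sketch below inlines the modified rows and keeps only the records with more than four fields, and the threshold of four is an arbitrary choice for illustration:

```shell
# The modified rows from above, inlined so the snippet is self-contained.
students='name id grade 1 2
John 100 third 3
Mark 200 fourth 4 5 6
Jennifer 300 fifth'

# Pattern: NF > 4 selects records with at least five fields.
# Action: print the field count, a colon, and the first field.
printf '%s\n' "$students" | awk 'NF > 4 {print NF ": " $1}'
# prints:
# 5: name
# 6: Mark
```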
6. Print the Last Element of Each Row in a File
Combining what we’ve learned so far, we can print the last element of each row in a file with a simple trick. Instead of writing a fixed field number after the dollar sign, we can use the NF variable, which evaluates to the number of fields in each record and therefore to the position of its last field.
Let’s revert our students.txt file again to be more readable and check this with an example:
$ cat students.txt
name id grade
John 100 third
Mark 200 fourth
Jennifer 300 fifth
$ awk '{print $NF}' students.txt
grade
third
fourth
fifth
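In this example, we’ve used the NF variable in our print action to select the last element in each line. AWK first evaluates NF to the number of fields in the record, so $NF refers to the field at that position, which is the last one. Since every record here has three fields, $NF is equivalent to $3.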
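The benefit of $NF over a fixed field number shows up when rows have different lengths. As a quick sketch, here’s the same one-liner applied to the uneven rows from the previous section, inlined so the snippet runs on its own:

```shell
# Rows with 5, 4, 6, and 3 fields respectively.
rows='name id grade 1 2
John 100 third 3
Mark 200 fourth 4 5 6
Jennifer 300 fifth'

# $NF picks the last field of each record, whatever its length.
printf '%s\n' "$rows" | awk '{print $NF}'
# prints:
# 2
# 3
# 6
# fifth
```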
Following the same method, we can specify a different position for our element to print:
$ awk '{print $(NF-1)}' students.txt
id
100
200
300
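Here, we used $(NF-1) as our element position. AWK evaluates NF-1 to the position just before the last one and prints the field at that position. The parentheses matter: without them, $NF-1 would subtract 1 from the value of the last field instead.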
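The following sketch contrasts the two spellings on a made-up line of numbers, then adds an NF > 1 pattern as one possible guard for short records. On a one-field record, $(NF-1) is $0, the whole line, and on an empty line the index becomes negative, which common awk implementations reject with an error:

```shell
# $NF-1 parses as ($NF) - 1: the last field's numeric value minus one.
printf '10 20 30\n' | awk '{print $NF-1}'      # prints 29

# $(NF-1) selects the second-to-last field.
printf '10 20 30\n' | awk '{print $(NF-1)}'    # prints 20

# Guard: only evaluate $(NF-1) when the record has at least two fields.
# The one-field line and the empty line are skipped.
printf 'only\n\na b\n' | awk 'NF > 1 {print $(NF-1)}'   # prints a
```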
7. Conclusion
In this tutorial, we’ve covered the basics of using AWK for processing text. AWK receives an input stream and divides it into records and fields, which makes it easy to search for patterns and manipulate specific parts of the text.
AWK also offers built-in variables that we can use without declaring them. One commonly used variable is NF, which evaluates to the number of fields in a record. We can use $NF to select the last element of each record in the input, and then print it or apply another action to it.