1. Overview
AWK is widely used as a data extraction and reporting utility on Linux systems. In this tutorial, we’ll learn how to write dynamic awk scripts by passing parameters.
2. Scenario Setup
Before we learn how to pass parameters to an awk script, we need some sample data. For this, let’s create the employees.db file that contains comma-separated values:
$ cat employees.db
Name,Salary
Alice,25000
Alex,35000
Raymond,15000
Leo,7900
Notice that the first record contains the field names, Name and Salary.
Next, we must understand that awk scripts differ from Bash scripts in how they receive parameters. Bash scripts use $1, $2, and so on as positional arguments, whereas awk interprets these as built-in field variables that refer to the fields of the current record.
Further, let’s see this in action with the help of a one-line awk command based on the pattern-action paradigm:
$ awk -F',' 'NR>1 {print $1,$2}' employees.db
Alice 25000
Alex 35000
Raymond 15000
Leo 7900
We can see that using $1 and $2 variables helped us display the employee’s name and salary fields in a space-separated format. Moreover, we could show records from the second line onward using the built-in variable NR, which denotes the current record number.
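To make the distinction between the two meanings of $1 concrete, here’s a minimal sketch, assuming a POSIX shell and a hypothetical wrapper function named show_field that isn’t part of the original example. Inside the single-quoted awk program, $1 is always awk’s first field, while the $1 outside the quotes is the shell function’s own first argument:

```shell
# Recreate the sample data in a scratch directory
workdir=$(mktemp -d)
db="$workdir/employees.db"
printf '%s\n' 'Name,Salary' 'Alice,25000' 'Alex,35000' 'Raymond,15000' 'Leo,7900' > "$db"

# show_field is a hypothetical helper: the double-quoted "$1" is the shell
# argument (a file name), the single-quoted $1 is the first field in awk
show_field() {
    awk -F',' 'NR>1 {print $1}' "$1"
}

names=$(show_field "$db")
printf '%s\n' "$names"
```

The single quotes keep the shell from expanding $1 inside the program, so awk receives it untouched.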
In the following sections, we’ll use the employees.db file as input data to learn how to pass parameters to an awk script.
3. Using Command-Line Named Arguments
Let’s say we want to write an awk script to show the names and salaries of employees in a space-separated format. Additionally, we want our script to accept a parameter to exclude an employee by name from the original list.
First, let’s look at the awk usage with the -v option to pass parameters as variables:
$ awk -v <variable_name>=<value> -f <awk_script> <file>
Next, let’s assume we’ll pass a variable named exclude specifying the employee name we need to exclude from the report. Based on this assumption, let’s write the filter_employees.awk script:
$ cat filter_employees.awk
BEGIN {
    FS=","
}
NR>1 {
    if ($1 != exclude) {
        print $1,$2
    }
}
Notice that we’ve added a condition that compares the first field ($1) with the value of the exclude variable and prints the record only when they differ.
Finally, let’s execute the script and see it in action:
$ awk -v exclude="Leo" -f filter_employees.awk employees.db
Alice 25000
Alex 35000
Raymond 15000
$ awk -v exclude="Alice" -f filter_employees.awk employees.db
Alex 35000
Raymond 15000
Leo 7900
It looks like we’ve got this right, as the output doesn’t contain the employee whose name is specified with the exclude variable while executing the script.
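The -v option can be repeated to pass several values at once. The following is a minimal sketch, assuming a second, hypothetical variable named min_salary that isn’t part of the original script. It keeps every employee other than the excluded one whose salary is at least that threshold:

```shell
# Recreate the sample data in a scratch directory
workdir=$(mktemp -d)
db="$workdir/employees.db"
printf '%s\n' 'Name,Salary' 'Alice,25000' 'Alex,35000' 'Raymond,15000' 'Leo,7900' > "$db"

# Each -v assignment is made before the program starts, so both variables
# are visible in every rule. min_salary is a hypothetical parameter.
report=$(awk -F',' -v exclude="Alice" -v min_salary=10000 \
    'NR>1 && $1 != exclude && $2 >= min_salary {print $1,$2}' "$db")
printf '%s\n' "$report"
```

With the sample data, this should print the records for Alex and Raymond only, since Alice is excluded by name and Leo falls below the threshold.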
4. Using Command-Line Positional Arguments
Let’s say we get a new requirement to extend our awk script to exclude multiple employees. Earlier, we excluded a single employee using a named command-line argument, but one variable holding one name doesn’t extend naturally to an arbitrary list of names. So, in this section, we’ll solve this use case using the built-in variable ARGC and the positional argument array ARGV.
Firstly, let’s write a one-line awk command to understand the meaning of ARGC and ARGV variables:
$ awk -e 'BEGIN{ for(i=0;i<ARGC;i++) print "ARGV["i"]="ARGV[i]}' \
exclude_1=Leo exclude_2=Alice exclude_3=Raymond \
employees.db
ARGV[0]=awk
ARGV[1]=exclude_1=Leo
ARGV[2]=exclude_2=Alice
ARGV[3]=exclude_3=Raymond
ARGV[4]=employees.db
Notice that our awk program contains only a BEGIN block, which iterates over the ARGV array within the bounds defined by the ARGC variable. While ARGC stores the total count of arguments, ARGV stores the actual values.
Furthermore, we can see that the awk program text and options such as -e (which is specific to GNU awk) are excluded from ARGV. As a result, “awk” goes in ARGV[0], the exclude-specific parameters go in ARGV[1], ARGV[2], and ARGV[3], while the filename goes in ARGV[4].
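One detail worth knowing here is that awk treats operands of the form name=value as variable assignments, and it applies each one only when it reaches that position in the argument list, which is after the BEGIN block has run. A minimal sketch of that timing, reusing the exclude_1 argument from above:

```shell
# Recreate the sample data in a scratch directory
workdir=$(mktemp -d)
db="$workdir/employees.db"
printf '%s\n' 'Name,Salary' 'Alice,25000' 'Alex,35000' 'Raymond,15000' 'Leo,7900' > "$db"

# exclude_1 is still unset while BEGIN runs; it is assigned just before
# awk opens the file that follows it on the command line
timing=$(awk 'BEGIN {print "BEGIN sees [" exclude_1 "]"}
              END   {print "END sees [" exclude_1 "]"}' exclude_1=Leo "$db")
printf '%s\n' "$timing"
```

This is why a script that needs such values inside its BEGIN block has to read them from ARGV instead of relying on the assigned variables.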
Next, let’s write a new script called filter_employees_v2.awk to exclude multiple employees by extending the filter_employees.awk script:
$ cat filter_employees_v2.awk
BEGIN {
    FS=","
    exclude_index=0
    for (i=1; i < ARGC; i++) {
        if(ARGV[i] ~ /exclude_[0-9]*=.*/) {
            split(ARGV[i], excludeArr, "=")
            EXCLUDE[++exclude_index]=excludeArr[2]
        }
    }
}
NR>1 {
    show="true"
    for (i in EXCLUDE) {
        exclude=EXCLUDE[i]
        if($1 == exclude) {
            show="false"
        }
    }
    if(show == "true") {
        print $1,$2
    }
}
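Notice that we used the BEGIN block to collect all the exclude-specific parameters into the EXCLUDE array. Because each of these arguments has the name=value form, awk treats it as a variable assignment rather than as an input file, so only employees.db is read as data. Additionally, we modified the main block to check whether the employee name in $1 matches any of the values in EXCLUDE.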
Finally, let’s run the filter_employees_v2.awk script by passing multiple employee names for exclusion:
$ awk -f filter_employees_v2.awk exclude_1=Leo exclude_2=Alice exclude_3=Raymond employees.db
Alex 35000
Perfect! The result meets our expectations.
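The name=value form isn’t the only option. As an alternative sketch (not part of the original script), plain names can be passed as positional arguments, as long as the BEGIN block collects them and blanks out their ARGV entries so that awk doesn’t try to open them as input files. The sketch assumes that the last argument is always the data file:

```shell
# Recreate the sample data in a scratch directory
workdir=$(mktemp -d)
db="$workdir/employees.db"
printf '%s\n' 'Name,Salary' 'Alice,25000' 'Alex,35000' 'Raymond,15000' 'Leo,7900' > "$db"

filtered=$(awk 'BEGIN {
        FS = ","
        # Every argument except the last one is a name to exclude
        for (i = 1; i < ARGC - 1; i++) {
            EXCLUDE[ARGV[i]] = 1
            ARGV[i] = ""    # an empty ARGV entry is skipped as an input file
        }
    }
    NR>1 && !($1 in EXCLUDE) {print $1,$2}' Leo Alice Raymond "$db")
printf '%s\n' "$filtered"
```

Storing the names as array keys also lets the main rule test membership with the in operator instead of looping over the array.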
5. Using Environment Variables
Yet another way to pass parameters to an awk script is through environment variables.
First, let’s write a one-line awk command to understand how to pass and access the environment variables:
$ export EXCLUDE_EMPLOYEES="Leo,Alice" && awk -e 'BEGIN {print ENVIRON["EXCLUDE_EMPLOYEES"]}' employees.db
Leo,Alice
We must note that awk can only see variables that are part of its environment, which is why we export EXCLUDE_EMPLOYEES. Additionally, we access it through the associative array called ENVIRON.
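Exporting isn’t the only way to place a variable in awk’s environment. A minimal sketch, relying on standard shell behavior: an assignment written as a prefix of the command is passed to that one command only and leaves the current shell untouched:

```shell
# Make sure the variable is not already set in this shell
unset EXCLUDE_EMPLOYEES

# The prefix assignment applies to this single awk invocation
seen=$(EXCLUDE_EMPLOYEES="Leo,Alice" awk 'BEGIN {print ENVIRON["EXCLUDE_EMPLOYEES"]}')
printf '%s\n' "$seen"

# The variable was never set in the shell itself
printf 'value in shell: [%s]\n' "${EXCLUDE_EMPLOYEES:-}"
```

This form is handy when a value should apply to one run only, without lingering in the session.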
Next, let’s write a new awk script named filter_employees_v3.awk to accept an environment variable called EXCLUDE_EMPLOYEES containing comma-separated names:
$ cat filter_employees_v3.awk
BEGIN {
    FS=","
    split(ENVIRON["EXCLUDE_EMPLOYEES"], EXCLUDE, ",")
}
NR>1 {
    show="true"
    for (i in EXCLUDE) {
        exclude=EXCLUDE[i]
        if($1 == exclude) {
            show="false"
        }
    }
    if(show == "true") {
        print $1,$2
    }
}
Notice that we used the BEGIN block to populate the EXCLUDE array by splitting the comma-separated values available through ENVIRON["EXCLUDE_EMPLOYEES"]. Moreover, the main block remains unchanged.
Finally, let’s execute the filter_employees_v3.awk script by passing multiple employee names for exclusion:
$ export EXCLUDE_EMPLOYEES="Leo,Alice" && awk -f filter_employees_v3.awk employees.db
Alex 35000
Raymond 15000
Great! It looks correct.
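The flag-and-loop check in the main block can also be expressed as a lookup. The following sketch is an alternative to the script above, not a replacement for it. It turns the split values into array keys so that the in operator can test membership directly; the names parts and SKIP are made up for this illustration:

```shell
# Recreate the sample data in a scratch directory
workdir=$(mktemp -d)
db="$workdir/employees.db"
printf '%s\n' 'Name,Salary' 'Alice,25000' 'Alex,35000' 'Raymond,15000' 'Leo,7900' > "$db"

result=$(EXCLUDE_EMPLOYEES="Leo,Alice" awk 'BEGIN {
        FS = ","
        # split() returns the number of pieces; store each one as an array key
        n = split(ENVIRON["EXCLUDE_EMPLOYEES"], parts, ",")
        for (i = 1; i <= n; i++) {
            SKIP[parts[i]] = 1
        }
    }
    NR>1 && !($1 in SKIP) {print $1,$2}' "$db")
printf '%s\n' "$result"
```

With the sample data, this should produce the same two records as filter_employees_v3.awk.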
6. Conclusion
In this tutorial, we learned how to make dynamic awk scripts by passing parameters to them. We also learned about a few options, such as -v, and some built-in variables, such as ARGC, ARGV, and ENVIRON, that are available in AWK.