How to Use Shell Variables in an AWK Script

1. Introduction

The AWK programming language, standardized by POSIX and implemented by the awk command, has long been a staple of UNIX and Linux systems. In fact, because it has proven so ubiquitous and reliable over the years, many shell scripts use it. However, passing data from the shell into AWK can pose challenges.

In this tutorial, we explore ways to use shell variables within an AWK script. First, we look at embedding values with the help of quotes. Next, we supply values through the input stream. After that, we predefine variables via two AWK mechanisms and then turn to command-line arguments. Finally, we use a special internal AWK variable to access the environment.

Importantly, none of the methods we look at allows the direct use or assignment of shell variables in AWK. They only transfer values from one context to the other, some more safely than others.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments, although some examples rely on Bash features such as here-strings and ANSI-C quoting.

2. Embed Shell Variable

Naturally, one way to use the value of a shell variable in AWK is to interpolate it directly into the AWK program text:

$ var='data'
$ awk 'BEGIN { print "shell_var='"$var"'" }'
shell_var=data

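Here, we end the single-quoted AWK program, insert the double-quoted shell expansion "$var", and reopen the single quotes, so the shell splices the value directly into the program text. Of course, this leads to escaping issues as soon as the value contains characters such as double quotes or backslashes, which AWK then interprets as part of its own syntax.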

Critically, embedding doesn't work with AWK script files out of the box, because the shell doesn't expand variables inside a file passed via -f. We'd have to perform complex and potentially dangerous replacements within the file beforehand.

Finally, splicing arbitrary user data into the program text opens the door to code injection, which can be as dangerous as using eval.
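To make the risk concrete, here's a minimal sketch in which a hypothetical untrusted value closes the AWK string literal early and appends its own statement. The value and the out variable are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical untrusted value: it closes the AWK string literal,
# adds its own print statement, then opens a new string literal
var='x" ; print "INJECTED" ; print "'

# Same embedding technique as above. After splicing, AWK actually runs:
#   BEGIN { print "shell_var=x" ; print "INJECTED" ; print "" }
out=$(awk 'BEGIN { print "shell_var='"$var"'" }')
printf '%s\n' "$out"
```

A harmless print is used here, but the injected code could just as well call AWK's system() function.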

3. Direct Value Input

Many shells, such as Bash, provide a here-string mechanism for supplying data to a command's standard input:

$ cat <<< 'data'
data

If we feed a here-string to awk, field splitting separates the values as usual:

$ var1='data1'
$ var2='data2'
$ awk '{ print "shell_var1=" $1 "\n" "shell_var2=" $2 }' <<< "$var1 $var2"
shell_var1=data1
shell_var2=data2

Of course, we can also assign the acquired values to internal variables with the same or a different name:

$ var='data'
$ awk '{ var=$1; print "var=" var }' <<< "$var"
var=data

Whether with a script or a script file, this method presents several challenges:

variable values are embedded in the regular data stream, meaning they have to be separated
depending on the FS field separator and the contents of the variables, the extracted data may be corrupt or incomplete
proper quoting should still be used in all cases, which may lead to complexity
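The second point is easy to reproduce. This sketch uses printf and a pipe instead of a here-string so that it also runs in a plain POSIX sh. The tab separator is just one possible choice and only helps if neither value contains a tab:

```shell
#!/bin/sh
var1='hello world'   # value with an embedded space
var2='data2'

# Default FS: the space inside var1 produces an extra field,
# so var1 is cut short and var2 is lost
split_default=$(printf '%s %s\n' "$var1" "$var2" |
  awk '{ print "shell_var1=" $1 "\n" "shell_var2=" $2 }')
printf '%s\n' "$split_default"

# Tab as FS: both values survive as long as neither contains a tab
split_tab=$(printf '%s\t%s\n' "$var1" "$var2" |
  awk -F '\t' '{ print "shell_var1=" $1 "\n" "shell_var2=" $2 }')
printf '%s\n' "$split_tab"
```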

To address the first problem, we can reserve the first input line for variable data and process it separately:

$ vars='5 3'
$ rows=$'row2\nrow3'
$ input="$vars"$'\n'"$rows"
$ awk 'NR==1 { var1=$1; var2=$2 } NR>1 { print $0, NR "*" var1 "=" NR*var1 }' <<< "$input"
row2 2*5=10
row3 3*5=15

First, we set the row of variable values. Next, we use ANSI-C quoting ($'...') to include a newline in the row data. Finally, we join both with another newline to get $input.

Then, we supply the properly quoted input to awk, where we perform different operations based on the record number NR. For the first line, we store the first field in var1 and the second in var2. This way, we can use the variables in operations on every following row.

Yet, this still leaves us with problems for values that contain whitespace, especially newlines. A newline splits a value across two input lines, so even if we change the FS field separator, we'd still have to reserve more rows and adjust the NR conditions accordingly.
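The newline problem is easy to trigger. In this sketch, var2 holds a hypothetical two-line value: awk keeps only its first line in var2, while the second line is processed as if it were a data row. The header line is skipped with next, a slight variation of the NR>1 condition above:

```shell
#!/bin/sh
var1='5'
# Hypothetical value with an embedded newline
var2='first
second'

# Line 1 is meant to hold only variable data; the rest are data rows
out=$(printf '%s %s\nrow2\n' "$var1" "$var2" |
  awk 'NR == 1 { var1 = $1; var2 = $2; next }
       { print NR ": " $0 " (var2=" var2 ")" }')
printf '%s\n' "$out"
```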

Because of all this, the method is rigid and error-prone.

4. Predefine Variables

AWK has two options for assigning values to internal variables before it processes its input. Notably, the shell and AWK variable names can differ.

With both options, AWK processes escape sequences in the assigned value:

$ awk -v var='\tval\nue' 'BEGIN { print "var=" var }'
var=       val
ue

As a consequence, when such a value serves as a dynamic regular expression, metacharacters like the | pipe symbol have to be escaped twice to match literally, e.g., \\|.
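For instance, when a -v value is used as a dynamic regular expression, the assignment consumes the first level of escaping and the regex engine the second. The pattern and input lines in this sketch are illustrative:

```shell
#!/bin/sh
# Two input lines: only the first contains a literal pipe.
# -v turns 'a\\|b' into the string a\|b, which as a dynamic regex
# matches the literal text "a|b" instead of meaning "a" or "b"
matches=$(printf 'a|b\nab\n' | awk -v re='a\\|b' '$0 ~ re')
printf '%s\n' "$matches"
```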

Let’s explore both options.

4.1. Using -v

The -v option of awk is followed by a space and an assignment consisting of a variable name, an = equals sign, and the variable value. Importantly, the value can be an interpolated shell variable:

$ var='data'
$ awk -v var="$var" 'BEGIN { print "shell_var=" var }'
shell_var=data

Each variable can then be used within the script as usual.

For multiple variables, we supply -v as many times as we need:

$ var1='data1'
$ var2='data2'
$ awk -v var1="$var1" -v var2="$var2" 'BEGIN { print "var1=" var1 "\n" "var2=" var2 }'
var1=data1
var2=data2
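Since -v assignments happen before the BEGIN block, the values are available to every rule of the program, not only BEGIN. A common pattern is a filter threshold. In this sketch, min and the sample rows are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical threshold coming from the shell
min=5

# Adding 0 makes min a number, so the comparison with the numeric-looking
# field $2 is numeric (as strings, "12" > "5" would be false)
names=$(printf 'a 3\nb 7\nc 12\n' | awk -v min="$min" '$2 > min + 0 { print $1 }')
printf '%s\n' "$names"
```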

Next, we discuss a related but separate method to pass shell values to AWK variables.

4.2. Direct Variables

Similar to the previous option, we can specify variable assignments as operands directly after the script or script file:

$ var=5
$ echo $'row1\nrow2' | awk 'BEGIN { print var } { print $0, NR "*" var "=" NR*var }' var="$var"

row1 1*5=5
row2 2*5=10

Critically, the first output line is empty. This happens because awk applies operand assignments only when it reaches them while processing its arguments, which is after the BEGIN block has run. Thus, var is still unset in the first print but outputs correctly afterward.

As long as we don’t need the values in a BEGIN block, this method is equivalent to using -v.
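One thing operand assignments can do that -v can't is change a value partway through the input: awk applies each assignment just before it reads the file that follows it. The temporary files and tag values in this sketch are illustrative:

```shell
#!/bin/sh
# Two throwaway input files
dir=$(mktemp -d)
printf 'a\n' > "$dir/first.txt"
printf 'b\n' > "$dir/second.txt"

# Each assignment takes effect when awk reaches it in the argument list,
# so tag holds a different value for each file
out=$(awk '{ print tag ": " $0 }' tag=one "$dir/first.txt" tag=two "$dir/second.txt")
printf '%s\n' "$out"

rm -r "$dir"
```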

5. Using ARGV

Of course, we can also pass the values of shell variables as regular command-line arguments:

$ var1='data1'
$ var2='data2'
$ awk 'BEGIN { print "var1=" ARGV[1] "\n" "var2=" ARGV[2] }' "$var1" "$var2"
var1=data1
var2=data2

To access command-line arguments in AWK, we use the special ARGV array variable. Index 0 usually holds the name of the interpreter, such as awk, while the following indices hold the arguments after the program text, in order.

This is a robust way of capturing any variable value without worrying about escaping, since awk doesn't process escape sequences in ARGV elements.
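The ARGV example works because the program consists of a BEGIN block only, so awk never reads any input. Once the program has main rules, awk treats the remaining arguments as input files. A minimal sketch of one way around that, assuming the values are the only operands, copies them and then lowers ARGC, the count of ARGV elements awk reads as operands, so that awk falls back to standard input:

```shell
#!/bin/sh
var1='data1'
var2='data2'

out=$(printf 'row\n' | awk '
  BEGIN {
    # Copy the values before awk gets a chance to treat them as file names
    v1 = ARGV[1]; v2 = ARGV[2]
    # Only ARGV[1] .. ARGV[ARGC-1] are read as operands, so ARGC = 1 leaves
    # none and awk reads standard input instead
    ARGC = 1
  }
  { print $0, v1, v2 }
' "$var1" "$var2")
printf '%s\n' "$out"
```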

Similarly, we can use another special variable to access environment values directly.

6. Using ENVIRON

The ENVIRON special variable provides access to the environment that awk inherits, as an array indexed by variable name:

$ export var='data'
$ awk 'BEGIN { print ENVIRON["var"] }'
data

Still, the one minor drawback is that the shell variable has to be exported:

$ export var1='data1'
$ export var2='data2'
$ var3='data3'
$ awk 'BEGIN { print "shell_var1=" ENVIRON["var1"] "\n" "shell_var2=" ENVIRON["var2"] "\n" "shell_var3=" ENVIRON["var3"] }'
shell_var1=data1
shell_var2=data2
shell_var3=

Consequently, the value of the non-exported $var3 variable can’t be extracted via ENVIRON.
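If exporting is undesirable, a standard shell feature helps: an assignment placed before a command adds the variable to that command's environment only. A minimal sketch:

```shell
#!/bin/sh
# var3 is deliberately not set or exported in the current shell
unset var3

# The prefix assignment places var3 in awk's environment only
out=$(var3='data3' awk 'BEGIN { print "shell_var3=" ENVIRON["var3"] }')
printf '%s\n' "$out"

# The shell itself still has no var3 afterward
printf 'var3 in shell: [%s]\n' "${var3-}"
```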

This is arguably the safest and most consistent way to pass data from the shell to AWK, since awk neither parses ENVIRON values as program code nor processes escape sequences in them.
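The difference shows up with backslashes. In this sketch, which uses a hypothetical Windows-style path as the value, -v turns the \t into a tab character, while ENVIRON hands the value over unchanged:

```shell
#!/bin/sh
# Hypothetical value containing a literal backslash sequence
path='C:\temp'

# -v processes escape sequences: \t becomes a TAB character
via_v=$(awk -v p="$path" 'BEGIN { print p }')

# ENVIRON returns the exported value byte for byte
export path
via_env=$(awk 'BEGIN { print ENVIRON["path"] }')

printf '%s\n' "$via_v" "$via_env"
```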

7. Summary

In this article, we looked at ways to pass values of shell variables to AWK for internal use.

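In conclusion, while there are multiple methods, only a few are actually viable in most situations due to their safety and simplicity: ENVIRON and ARGV pass values verbatim, and -v works well as long as we account for its escape processing.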
