1. Introduction
The AWK programming language, implemented by the awk interpreter, is a versatile way to parse and manipulate data. In particular, we can integrate awk commands within shell scripts, but we can also use shell features like variables within AWK scripts.
In this tutorial, we’ll talk about ways to employ shell variables as patterns, one of the two main parts of an AWK statement. First, we explore the general mechanism that awk follows when processing data. After that, we delve into the main subject of employing shell variables as patterns. Finally, we discuss potential pitfalls.
We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. It should work in most POSIX-compliant environments unless otherwise specified.
2. How AWK Processes Data
In short, awk statements comprise two parts:
- pattern – expression to match a record
- action – instructions to run for matched records
By default, records are lines, so a pattern is evaluated once per line. Besides a regular expression, a pattern can also be a condition or any general expression.
Let’s see a basic example of a pattern:
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/[23]/'
Line 2.
Line 3.
Here, we pipe the three-line output from printf to awk. Within AWK, we use a simple regular expression (regex) with a bracket expression as the pattern between // slashes. Notably, AWK regexes follow the extended regular expression (ERE) syntax. Since we only match lines that contain either 2 or 3, the final output excludes Line 1.
In fact, we see output at all only because print is the implicit action for any pattern that evaluates to true and has no explicit action. Because of this, we can rewrite the above example:
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/[23]/ { print; }'
Line 2.
Line 3.
In this case, the action is the print within braces. For our purposes, we mostly skip the explicit statement and stick with the first example above.
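Since a pattern can also be a condition rather than a regex, a minimal sketch of that form may help. Here, we assume the same three-line input as above and use the built-in NR record counter; the out variable name is only for illustration:

```shell
# A pattern doesn't have to be a regex: any expression that evaluates
# to true selects the record. NR holds the current record number,
# so this condition selects every line after the first one.
out=$(printf 'Line 1.\nLine 2.\nLine 3.\n' | awk 'NR > 1')
printf '%s\n' "$out"
```

As with the regex pattern, the implicit print action applies, so this should again show only Line 2. and Line 3.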
Now, let’s see how we can construct a pattern that includes shell variables.
3. Using Shell Variable as Pattern
While we can pass parameters to awk, only some ways to use shell variables work with patterns.
3.1. Embedding
As usual, we can employ quotes in a specific manner to ensure a given shell variable is interpreted directly within the text of an AWK script:
$ CHARS=23
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/['"$CHARS"']/'
Line 2.
Line 3.
In this case, to achieve the results from earlier, we first define the $CHARS shell variable with the value 23. After that, we close the single quotes that surround the AWK expression, insert the double-quoted "$CHARS" expansion at the given location, and reopen the single quotes. As a result, the shell substitutes the value before awk ever sees the script.
Moreover, we can replace the whole pattern this way:
$ PATTERN=[23]
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/ { print; }'
Line 2.
Line 3.
Since an embedded value becomes part of the AWK program text itself, it can break the syntax or even inject code. Because of this risk, let’s explore other ways as well.
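To illustrate the risk of embedding, here's a small sketch with a hypothetical, deliberately crafted value. It's only an assumption of what untrusted input might look like, but it shows how the value can change the program instead of just the regex:

```shell
# Hypothetical hostile value: it closes the regex early and adds "|| 1",
# so the assembled AWK program becomes: /x/ || 1 || /x/
PATTERN='x/ || 1 || /x'

# The condition is now always true, so every line is printed,
# although no line contains the letter x.
out=$(printf 'Line 1.\nLine 2.\nLine 3.\n' | awk '/'"$PATTERN"'/')
printf '%s\n' "$out"
```

In this sketch, the injected code is harmless, but the same mechanism could insert arbitrary AWK statements.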
3.2. Internal Variable
There are several methods to assign an external variable value to an internal variable:
- embedding: break the quote around the script, insert an interpolated shell variable, and resume the quote
- direct input: redirect via here-string or similar to pass shell variable values directly with the regular data
- ARGV: provide shell variable values as arguments to the AWK script
- ENVIRON: directly access exported shell variables within AWK
- predefine: use the -v option or var=value operands of awk to make shell variable values available
Whichever method we choose, after performing the assignment, we can employ any internal variable as a pattern.
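As a rough sketch of two of the methods above, we can try ENVIRON and ARGV with the same data. The out_env, out_argv, and pat names are just illustrative, and we assume an awk that skips empty ARGV elements as POSIX describes:

```shell
PATTERN='[23]'

# ENVIRON: the variable has to be in the environment of awk, so we
# pass it as a prefix assignment for this one command.
out_env=$(printf 'Line 1.\nLine 2.\nLine 3.\n' |
  PATTERN="$PATTERN" awk '$0 ~ ENVIRON["PATTERN"]')

# ARGV: we read the first argument in BEGIN, then blank it so that
# awk doesn't treat it as an input file and still reads stdin.
out_argv=$(printf 'Line 1.\nLine 2.\nLine 3.\n' |
  awk 'BEGIN { pat = ARGV[1]; ARGV[1] = "" } $0 ~ pat' "$PATTERN")

printf '%s\n' "$out_env" "$out_argv"
```

In both cases, the value never becomes part of the script text, so it can only act as a regex and not as code.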
Let’s use predefined variables:
$ PATTERN=[23]
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk -v pat="$PATTERN" '$0 ~ pat'
Line 2.
Line 3.
Here, we add three concepts to our earlier examples:
- predefining a variable with -v so it’s available within the AWK script
- matching via the ~ tilde operator
- employing the otherwise-implicit $0 current record variable
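In other words, we perform the same matching as before, but we use the ~ tilde operator against $0 instead of bare // forward slashes. Further, we store the regular expression in the internal pat variable, which we initialize from the external $PATTERN shell variable.
Importantly, some methods of passing shell variable values have limitations. For instance, a var=value assignment placed after the script among the file operands only takes effect when awk reaches that operand, so the variable is still unset within the BEGIN block. In contrast, -v assignments happen before BEGIN runs.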
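To make the BEGIN limitation more concrete, here's a minimal sketch that compares -v with an assignment operand placed after the script. The with_v and with_operand names are only for illustration:

```shell
# -v assigns the variable before the BEGIN block runs.
with_v=$(awk -v pat='[23]' 'BEGIN { print "pat=<" pat ">" }')

# An assignment operand after the script is only processed when awk
# reaches it among the file operands, so BEGIN sees an empty value.
with_operand=$(awk 'BEGIN { print "pat=<" pat ">" }' pat='[23]')

printf '%s\n%s\n' "$with_v" "$with_operand"
```

The first command should print pat=<[23]>, while the second should print pat=<>, since the script consists only of a BEGIN block and never gets to the operand.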
4. Pitfalls
Naturally, since patterns are often regular expressions, any value we use as a pattern should have the proper syntax and escaping.
This is especially important when it comes to embedded values:
$ PATTERN='2|'
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/'
awk: line 1: regular expression compile failed (missing operand)
2|
Here, the | pipe symbol trips up the interpreter, since the alternation operator has no operand on its right side. To avoid such situations, we might need to sanitize user input or generated variables to comply with the regex syntax rules.
In particular, we can either complete the regular expression or escape specific characters and sequences with a backslash:
$ PATTERN='2|3'
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/'
Line 2.
Line 3.
$ PATTERN='2\|'
$ printf 'Line 1.\nLine 2.\nLine 3.' | awk '/'"$PATTERN"'/'
$
Of course, the second regex produces no output, since the literal 2| sequence doesn’t appear anywhere in the data.
Overall, accounting for variable interpolation and potential conflicts with special regex characters is often vital.
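When a value is meant as a literal string rather than a regex, one option is to sidestep regex syntax altogether. The sketch below is an assumption on our part, not a required approach: it uses the standard index() function with ENVIRON, a hypothetical NEEDLE variable, and slightly modified data that actually contains 2|:

```shell
# Hypothetical literal search string with a regex metacharacter.
NEEDLE='2|'

# index() returns the position of a substring (0 if absent), so it
# works as a pattern without any regex interpretation or escaping.
out=$(printf 'Line 1.\nLine 2|\nLine 3.\n' |
  NEEDLE="$NEEDLE" awk 'index($0, ENVIRON["NEEDLE"])')
printf '%s\n' "$out"
```

Here, only the Line 2| record should match, and we don't need to escape the | pipe symbol at all.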
5. Conclusion
In this tutorial, we talked about using shell variables within and as AWK patterns.
Due to the flexibility of most shells and the syntax of awk, we can choose one of several ways to place a shell variable value directly within or as a pattern.