1. Overview
AWK is a powerful tool for manipulating text and data on the command line. However, printing special characters like quotes in AWK can be a bit tricky.
In this tutorial, we’ll explore how to print quote characters in AWK.
2. Introduction to the Problem
Two quote characters matter when we work with AWK: the single quote (') and the double quote ("). AWK uses the double quote to delimit string literals, while the shell uses the single quote to mark the boundaries of an AWK program passed on the command line.
Because both characters already have a role, printing them directly within a single-quoted AWK program leads to errors. In the first command below, AWK reports an unterminated string; in the second, the shell itself keeps waiting for a closing double quote:
$ awk 'BEGIN{print "}'
awk: cmd. line:1: BEGIN{print "}
awk: cmd. line:1: ^ unterminated string
$ awk 'BEGIN{print "'"}'
> (Double quote not closed)
$
However, it’s worth mentioning that we can directly print single quotes in an external AWK program file, for example:
$ cat single_quote.awk
BEGIN{ print "'" }
$ awk -f single_quote.awk
'
Since we often pass a single-quote-marked AWK program directly to the awk command, we’ll focus on this use case in this tutorial, and address the right ways to print quote characters in AWK.
As usual, an example may help us understand the problem. Let’s say we have a single-line, space-separated input:
a b c d e
Now, we aim to transform this input into a multi-line result. Further, we need to wrap odd lines in double quotes and even lines in single quotes. The expected result looks like:
"a"
'b'
"c"
'd'
"e"
Next, let’s figure out how to solve the problem.
3. Using Escape Sequences
In programming, escaping is a common way to treat characters with special meanings as literal ones. In Bash, we escape double- and single-quote characters differently:
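- Escaping a double quote with a backslash – \"
- Escaping a single quote – '\''
Both work when we pass the AWK program in single quotes on the command line: \" is also AWK's own escape inside a string literal, while '\'' closes the single-quoted program, adds an escaped single quote, and reopens the quoting. Next, let’s solve the problem by escaping quote characters: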
$ awk '{
for(i=1; i<=NF; i++){
if(i % 2)
print "\"" $i "\""
else
print "'\''" $i "'\''"
}
}' <<< "a b c d e"
"a"
'b'
"c"
'd'
"e"
In the code, we feed the input to the AWK script through a here-string. The for-loop and the if-else implementations are pretty straightforward. Also, as the output shows, we get the expected result.
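To see why the '\'' sequence works, it can help to print the program text exactly as the shell hands it to awk. The following is a minimal sketch; show_program is a hypothetical helper name used only for illustration:

```shell
# show_program is a hypothetical helper, not part of the original example.
# The single argument to printf is built from three adjacent pieces:
#   'print "'   a single-quoted piece that ends at its second single quote
#   \'          an escaped, literal single quote outside any quoting
#   '"'         a new single-quoted piece holding the closing double quote
show_program() {
    printf '%s\n' 'print "'\''"'
}

# Prints: print "'"
show_program

# The same quoting inside a real AWK program prints one single quote.
awk 'BEGIN{ print "'\''" }'
```

So AWK never sees the backslash at all: it receives a plain string literal that contains one single quote.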
However, the disadvantage of escaping is obvious: the code isn’t easy to read, and it can be error-prone when writing multiple escapes.
So next, let’s see if we can print quotes in AWK more clearly.
4. Using the Hexadecimal and Octal Escape Sequences
The ASCII code of a character can be written in three formats: decimal, hexadecimal, and octal. For instance, the ASCII code of the character ‘A’ can be represented as:
- Octal – 101 (escape sequence: \101)
- Hexadecimal – 41 (escape sequence: \x41)
- Decimal – 65
When we print an octal or hexadecimal escape sequence in AWK, it outputs the corresponding character represented by that sequence, for example:
$ awk 'BEGIN{print "\101 and \x41"}'
A and A
We can obtain the complete ASCII codes in these three formats by executing man ascii:
$ man ascii
...
The octal set:
...
... 042 " ... 047 '
...
The hexadecimal set:
...
... 22 " ... 27 '
...
We can find the quote characters’ octal and hexadecimal code values in the above man page:
- The single quote – Octal value: 047 (\047), hexadecimal value: 27 (\x27)
- The double quote – Octal value: 042 (\042), hexadecimal value: 22 (\x22)
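If the man page isn't at hand, the shell's printf can compute the same codes: when a numeric argument starts with a quote character, printf uses the code of the character that follows it. Here is a small sketch, where char_codes is a hypothetical helper name:

```shell
# char_codes is a hypothetical helper, not part of the original article.
# It prints the octal, hexadecimal, and decimal code of its first argument.
char_codes() {
    # A leading single quote in a numeric printf argument means
    # "use the code of the next character".
    printf '%03o %x %d\n' "'$1" "'$1" "'$1"
}

char_codes '"'   # prints: 042 22 34
char_codes "'"   # prints: 047 27 39
char_codes A     # prints: 101 41 65
```

The values match the ones listed in the man page excerpt above.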
Therefore, we can use octal or hexadecimal escape sequences to print quote characters:
$ awk '{
for(i=1; i<=NF; i++){
if(i % 2)
print "\042" $i "\042"
else
print "\047" $i "\047"
}
}' <<< "a b c d e"
"a"
'b'
"c"
'd'
"e"
$ awk '{
for(i=1; i<=NF; i++){
if(i % 2)
print "\x22" $i "\x22"
else
print "\x27" $i "\x27"
}
}' <<< "a b c d e"
"a"
'b'
"c"
'd'
"e"
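Using octal or hexadecimal escape sequences instead of escaping quote characters tends to make the code slightly more readable. However, we need to remember the octal or hexadecimal codes of the quote characters; otherwise, the meaning of these escape sequences becomes unclear. The octal form is also the more portable choice: POSIX AWK defines octal escapes but leaves \x undefined, although GNU AWK supports it as an extension.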
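One way to soften the readability problem is to assign the escape sequences to descriptively named variables once, in a BEGIN block, and use only the names afterward. The sketch below assumes a hypothetical wrapper function called quote_fields; the variable names dq and sq are likewise illustrative:

```shell
# quote_fields is a hypothetical wrapper, not part of the original article.
quote_fields() {
    awk 'BEGIN {
        dq = "\042"   # octal code of the double quote
        sq = "\047"   # octal code of the single quote
    }
    {
        for (i = 1; i <= NF; i++) {
            if (i % 2)
                print dq $i dq   # odd fields get double quotes
            else
                print sq $i sq   # even fields get single quotes
        }
    }'
}

printf 'a b c\n' | quote_fields
```

With this layout, the numeric codes appear in exactly one place, and the rest of the program reads like the version that uses plain names.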
5. Passing Quotes as Parameters
Additionally, we can store quote characters in parameters and pass the parameters to awk using the -v option:
$ awk -v sq="'" -v dq='"' '{
for(i=1; i<=NF; i++){
if(i % 2)
print dq $i dq
else
print sq $i sq
}
}' <<< "a b c d e"
"a"
'b'
"c"
'd'
"e"
This approach eliminates the need for escape characters within the code. Moreover, by using meaningful parameter names, it becomes straightforward to discern which parameter represents which quote character.
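One detail is worth keeping in mind when passing other special characters this way: awk processes escape sequences in -v assignments, so a backslash has to be doubled. A minimal sketch, with wrap_backslash and bs as hypothetical names chosen for illustration:

```shell
# wrap_backslash and the variable name bs are hypothetical, for illustration.
wrap_backslash() {
    # awk interprets escape sequences in -v values,
    # so the two backslashes below become one in the variable bs.
    awk -v bs='\\' '{ print bs $0 bs }'
}

# Prints: \a\
printf 'a\n' | wrap_backslash
```

Quote characters need no such doubling, which is why the sq and dq assignments above work as written.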
6. Conclusion
In this article, we’ve explored various methods to print quote characters in AWK. It’s important to note that these techniques aren’t limited to printing quotes alone. They can also be employed to output other special characters, such as the backslash (\).