1. Overview
The awk command is a powerful command-line text-processing tool in Linux.
In this tutorial, we’ll explore how to make awk output multiple records in the same line.
2. Introduction to the Problem
First of all, let’s see how awk outputs records by default. Let’s say we have an input file:
$ cat input.txt
1 Kotlin is sexy!
2 Java is a powerful programming language!
3 Rust is very fast!
4 Python is awesome!
As the example above shows, the input.txt file contains a few lines. Also, the second word in each line is a programming language name. Now, let’s extract and print the language names using the awk command.
We can treat the text as space-separated values. So, a compact awk one-liner that relies on the default field separator (FS) can extract the second field of each line:
$ awk '{ print $2 }' input.txt
Kotlin
Java
Rust
Python
We can see the awk command above does the job. However, each language name sits on its own line in the output.
Let’s say we would like to have all programming language names on the same line, joined by a separator, for example, a comma followed by a space (", "). Then the expected output looks like this:
Kotlin, Java, Rust, Python
This looks like an easy task. But there are a few aspects we need to pay attention to if we want to obtain the exact expected output.
So next, let’s see how to achieve that.
3. Concatenating Every Record and Printing in the END Block
First, let’s understand why each ‘print $2’ statement will print the second field in a new line.
The awk command’s print statement prints the given value followed by the output record separator (ORS). The default value of ORS is a newline character. Therefore, every time we execute print x, a newline character is appended to the output.
Now that we understand where the newline character comes from, one idea to solve the problem may come up: perform the print statement only once. That is to say, we concatenate the values we want to print by the predefined separator and execute the print statement in the END{ } block.
Next, let’s translate the logic into an awk command:
$ awk '{ result = result ( NR==1 ? "" : ", ") $2 } END{ print result }' input.txt
Kotlin, Java, Rust, Python
As the output above shows, it does the job. But this approach first gathers all required values in memory, so we must build a very long string if we apply this command to a huge file. Different awk implementations can have different string length limits. If the input file is large enough, our solution may fail with a “string too long” error.
So next, let’s see if we can output each record immediately without building the long string.
4. Setting the ORS Variable
We’ve understood that each print statement outputs values followed by a newline character because the default value of ORS is "\n". It’s worth mentioning that ORS is an awk built-in variable. Further, like FS, we can assign a different value to the ORS variable to change the print statement’s behavior.
Next, let’s set ORS=", " and see if our initial command works:
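To make the role of ORS visible, the following minimal sketch replaces it with a visible marker followed by a newline. The `<ORS>` marker text and the temporary copy of the sample file are assumptions made only for this illustration:

```shell
# Recreate the sample input in a temporary file (illustrative copy of input.txt)
input=$(mktemp)
cat > "$input" <<'EOF'
1 Kotlin is sexy!
2 Java is a powerful programming language!
3 Rust is very fast!
4 Python is awesome!
EOF

# Every print appends ORS, so the marker shows up after each printed value
visible=$(awk -v ORS="<ORS>\n" '{ print $2 }' "$input")
printf '%s\n' "$visible"
# Kotlin<ORS>
# Java<ORS>
# Rust<ORS>
# Python<ORS>

rm -f "$input"
```

Since the marker follows every value, including the last one, whatever we assign to ORS is emitted once per print call.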
$ awk -v ORS=", " '{ print $2 }' input.txt
Kotlin, Java, Rust, Python,
As we can see in the output, all language names are in the same line now. However, there are two differences compared to the expected result.
If we check the output carefully, we see the output has a trailing separator. Furthermore, another difference we cannot directly see in the output above is that the output has no trailing newline character.
Usually, most commands’ output ends with a newline character in Linux, for example:
$ awk 'BEGIN{ print "I am ending with a newline char" }' && echo "This is from echo."
I am ending with a newline char
This is from echo.
“This is from echo.” is printed by the echo command in the output above. It appears on a new line since awk outputs a trailing newline character.
We can use the same method to verify our awk command:
$ awk -v ORS=", " '{ print $2 }' input.txt && echo "This is from echo."
Kotlin, Java, Rust, Python, This is from echo.
This time it’s clear: apart from the trailing separator ", ", awk’s output and echo’s output are printed on the same line. So next, let’s see how to remove the trailing separator and add the ending "\n".
The awk command doesn’t output the ending newline character because we’ve overwritten the default ORS with ", ".
We can restore the ending "\n" by outputting a newline character in the END{ } block. However, removing the trailing separator is not easy, as it comes with every print statement.
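As a rough sketch of the first half of that idea, the END{ } block below emits a bare newline with printf, which does not append ORS. The temporary copy of the sample file and the `newline_count` variable are assumptions for illustration only:

```shell
# Recreate the sample input in a temporary file (illustrative copy of input.txt)
input=$(mktemp)
cat > "$input" <<'EOF'
1 Kotlin is sexy!
2 Java is a powerful programming language!
3 Rust is very fast!
4 Python is awesome!
EOF

# print "" in END would emit ", " again, so we use printf for the bare newline
awk -v ORS=", " '{ print $2 } END { printf "\n" }' "$input"
# Kotlin, Java, Rust, Python, 

# wc -l counts newline characters, confirming the line is now terminated
newline_count=$(awk -v ORS=", " '{ print $2 } END { printf "\n" }' "$input" | wc -l | tr -d ' ')
echo "newline characters: $newline_count"
# newline characters: 1

rm -f "$input"
```

The output now ends with a newline, but the trailing ", " is still there.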
Therefore, if we want the exact expected output, this approach needs support from other commands. For instance, we can use a sed command to post-process awk’s result by replacing the trailing separator with a newline character. Note that the command below relies on GNU sed, which interprets \n in the replacement as a newline:
$ awk -v ORS=", " '{ print $2 }' input.txt | sed 's/, $/\n/' && echo "This is from echo."
Kotlin, Java, Rust, Python
This is from echo.
So, setting ORS looks pretty straightforward, but its output differs from the expected result in subtle ways. In particular, if we choose a space as the separator, the trailing space is easy to miss. When we pass awk’s output to another program without realizing that, our whole process may produce unexpected results.
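One possible way to spot such a trailing space is sed’s l command, which prints each line in an unambiguous form and marks its end with a $ character. The sketch below assumes a single space as ORS and works on a temporary copy of the sample file:

```shell
# Recreate the sample input in a temporary file (illustrative copy of input.txt)
input=$(mktemp)
cat > "$input" <<'EOF'
1 Kotlin is sexy!
2 Java is a powerful programming language!
3 Rust is very fast!
4 Python is awesome!
EOF

# With ORS=" ", the trailing separator is invisible in a terminal;
# sed -n 'l' marks the end of the line with "$", exposing the space before it
joined=$(awk -v ORS=" " '{ print $2 }' "$input" | sed -n 'l')
printf '%s\n' "$joined"
# Kotlin Java Rust Python $

rm -f "$input"
```

The space between “Python” and the $ marker is the trailing separator that would otherwise go unnoticed.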
5. Using the printf Statement
Like in many programming languages, printf allows us to control the output format flexibly. So, for example, we can use the printf statement to solve the problem:
$ awk '{ printf "%s%s", NR==1 ? "" : ", ", $2 }' input.txt
Kotlin, Java, Rust, Python
As we can see, the output above looks good. However, we should note that, unlike print, printf doesn’t output ORS automatically. Therefore, the output above doesn’t have an ending newline character. We can verify it using the old trick:
$ awk '{ printf "%s%s", NR==1? "" : ", ", $2}' input.txt && echo "This is from echo."
Kotlin, Java, Rust, PythonThis is from echo.
Fixing it is not difficult. We can call print with an empty string in the END{ } block, which outputs only the default ORS ("\n"):
$ awk '{ printf "%s%s", NR==1? "" : ", ", $2} END{ print "" }' input.txt && echo "This is from echo."
Kotlin, Java, Rust, Python
This is from echo.
As we can see, compared to the other two approaches, using printf is the preferable solution to this problem.
In practice, we often see code that passes a variable to printf as the format string, like printf someVariable, instead of printf "%s", someVariable. Many even consider it a shortcut for print without the newline character. For example, we can rewrite our awk solution in this way:
$ awk '{ printf (NR==1? "" : ", ") $2} END{ print "" }' input.txt
Kotlin, Java, Rust, Python
As the command shows, it works too. However, this is considered a bad practice. We should never pass variables to printf directly as the format string, because printf interprets any % sequence in them as a format specifier, and we cannot predict what input our command may receive. A quick example illustrates the problem.
Let’s add one line to our input.txt:
$ cat input.txt
0 %scala is nice!
1 Kotlin is sexy!
2 Java is a powerful programming language!
3 Rust is very fast!
4 Python is awesome!
As the cat output shows, we’ve added a new line to the top of the file. Its second field contains '%', which happens to be the character that introduces a printf format specifier. Then our command fails:
$ awk '{ printf (NR==1? "" : ", ") $2} END{ print "" }' input.txt
awk: cmd. line:1: (FILENAME=input.txt FNR=1) fatal: not enough arguments to satisfy format string
`%scala'
^ ran out for this one
However, the printf with the format string version still works:
$ awk '{ printf "%s%s", (NR==1? "" : ", "), $2} END{ print "" }' input.txt
%scala, Kotlin, Java, Rust, Python
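If we need this more than once, the printf approach could be wrapped in a small shell function. The sketch below is only one possible shape: the function name join_field and its argument order are hypothetical, not something the commands above define:

```shell
# join_field COLUMN SEPARATOR [FILE...]  (hypothetical helper name)
# Prints the given column of every input line on one line, joined by SEPARATOR.
join_field() {
    col=$1
    sep=$2
    shift 2
    # Field values reach printf only as "%s" arguments, never as the format string
    awk -v col="$col" -v sep="$sep" \
        '{ printf "%s%s", (NR == 1 ? "" : sep), $col } END { print "" }' "$@"
}

# A field containing "%" is printed literally
printf '0 %%scala is nice!\n1 Kotlin is sexy!\n' | join_field 2 ", "
# %scala, Kotlin
```

Because awk interprets escape sequences in -v assignments, a separator that contains backslashes would need extra care with this sketch.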
6. Conclusion
In this article, we’ve learned how to ask awk to print records in the same line. We’ve discussed three approaches through examples.
printf allows us to control the output format flexibly. Therefore, it would be the preferable solution to this problem.