1. Overview
When we work with the Linux command line, it is a common operation to join multiple lines of input into a single line. Sometimes, we want to add customized delimiters to the merged line, too.
In this tutorial, we’ll take a look at several ways to do this.
2. The Problems
Let’s say we have a plain text input file:
$ cat input.txt
I came
I saw
I conquered!
The file has three lines, and there’s whitespace in each line.
And there are different ways we might like to join them:
- Without a delimiter: I cameI sawI conquered!
- With a delimiter of a single character (‘,’): I came,I saw,I conquered!
- With a delimiter of multiple characters (‘; ‘): I came; I saw; I conquered!
In this tutorial, we’ll attempt to address these with pure Bash and with the tr, paste, sed, and awk commands.
3. Pure Bash
Bash is the default shell in most modern Linux distros, and a Bash solution is not dependent on other utilities since it uses only built-in commands.
3.1. Join Without a Delimiter and With a Single Character Delimiter
A short Bash one-liner can join lines without a delimiter:
$ (readarray -t ARRAY < input.txt; IFS=''; echo "${ARRAY[*]}")
I cameI sawI conquered!
If we use the same command but assign a single character ',' to the IFS variable, the second problem is solved as well:
$ (readarray -t ARRAY < input.txt; IFS=','; echo "${ARRAY[*]}")
I came,I saw,I conquered!
Now, let’s understand how the command works. The one-liner above has three building blocks, and we’ll go through each of them:
readarray -t ARRAY < input.txt
readarray is a Bash built-in command introduced in Bash version 4. It reads lines from the standard input into an array variable, here ARRAY. The -t option removes the trailing newline from each line. Afterward, the variable ARRAY contains three elements.
Since our input data are in the input.txt file, we should redirect the file to the standard input using < input.txt.
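To see what the -t option changes, the following minimal sketch reads the same three lines twice, once with -t and once without, and compares the element lengths. The temporary file and the names tmp and RAW are assumptions made for illustration; they aren’t part of the one-liner above.

```shell
#!/usr/bin/env bash
# Illustrative sketch: requires Bash 4 or later for readarray.
tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

readarray -t ARRAY < "$tmp"   # -t strips the trailing newline of each line
readarray RAW < "$tmp"        # without -t, each element keeps its newline

echo "elements: ${#ARRAY[@]}"
echo "length of first element with -t: ${#ARRAY[0]}"
echo "length of first element without -t: ${#RAW[0]}"

rm -f "$tmp"
```

Without -t, the newlines kept in each element would end up in the joined output, which is why the one-liner needs the option.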
IFS=''
The IFS is a special shell variable and its name means Internal Field Separator. The default value of IFS is a space, a tab, and a newline.
Here, we assign IFS either an empty string or ',', depending on our requirements.
echo "${ARRAY[*]}"
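${ARRAY[*]} means all elements of the array variable ARRAY. When it is double-quoted, as it is here, the elements are joined into a single string, separated by the first character of the IFS variable (or by nothing if IFS is empty). The echo command then prints that string. In other words, we get our required output.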
There are still a couple of things we should notice.
We put all the commands in parentheses. This is because (…commands…) executes the commands in a subshell, so the IFS variable in the current shell isn’t affected.
Both ${ARRAY[*]} and ${ARRAY[@]} refer to all elements of an array. The difference between them is subtle: when double-quoted, "${ARRAY[*]}" expands to a single argument, while "${ARRAY[@]}" expands to a separate argument for each element. The IFS variable affects only the first form.
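The two points above can be checked with a small sketch. It counts how many arguments each expansion produces and confirms that changing IFS inside a subshell leaves the parent shell untouched. The helper count_args and the variables star, at, and joined are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

ARRAY=("I came" "I saw" "I conquered!")

# $( ... ) runs its commands in a subshell, so the IFS change stays local to it.
star=$(IFS=','; count_args "${ARRAY[*]}")   # elements joined into one argument
at=$(IFS=','; count_args "${ARRAY[@]}")     # one argument per element
joined=$(IFS=','; echo "${ARRAY[*]}")

echo "[*] gives $star argument(s), [@] gives $at argument(s)"
echo "$joined"

# IFS in the current shell still holds its default value.
printf 'IFS is now: %q\n' "$IFS"
```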
3.2. Join With a Multiple Character Delimiter
Using the IFS variable to control the array output is convenient. However, this way won’t work if we want to separate the elements by a delimiter of multiple characters.
There are several ways to solve the problem. Since we already have an array variable, let’s use it again:
$ readarray -t ARRAY < input.txt; printf -v TXT "%s; " "${ARRAY[@]}"; echo ${TXT%; }
I came; I saw; I conquered!
Let’s take a closer look at the command and understand how it works.
printf -v TXT "%s; " "${ARRAY[@]}"
After filling the ARRAY variable with the readarray command, we use the built-in printf command with the -v var option to save the formatted string in the variable TXT.
This time, we use ${ARRAY[@]} instead of ${ARRAY[*]}, because we want each element to be passed to printf as a separate argument. printf reuses the format "%s; " for every argument it receives.
Then $TXT has the value “I came; I saw; I conquered!; ”. The only task left is to remove the trailing delimiter “; ”.
echo ${TXT%; }
${var%substring} is a string manipulation trick. It deletes the shortest match of substring from the end of $var. So ${TXT%; } removes the trailing “; ”.
Thus, we got the required output.
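The same technique can be wrapped in a reusable function. The sketch below is one possible way to do it, not a definitive implementation: join_lines is a hypothetical helper name, and it assumes the delimiter contains no '%' or '\' characters, because the delimiter is embedded in the printf format string.

```shell
#!/usr/bin/env bash
# join_lines is a hypothetical helper: join_lines DELIMITER FILE
# Assumption: DELIMITER contains no '%' or '\', since it becomes part of
# the printf format string.
join_lines() {
    local delim=$1 file=$2 lines joined
    readarray -t lines < "$file"
    # printf reuses the format once per array element.
    printf -v joined "%s${delim}" "${lines[@]}"
    # Quoting "$delim" makes the suffix match literal rather than a glob pattern.
    printf '%s\n' "${joined%"$delim"}"
}

tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

result=$(join_lines '; ' "$tmp")
star=$(join_lines '*' "$tmp")
echo "$result"
echo "$star"

rm -f "$tmp"
```

Quoting the delimiter inside ${joined%"$delim"} matters for delimiters such as '*', which would otherwise be treated as a pattern and match the empty string, leaving the trailing delimiter in place.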
4. The tr Command
We can use the tr command to delete specific characters or translate characters from standard input (stdin).
Since the tr command only reads from stdin, when we want to use tr to handle a file, we should redirect the file to stdin.
4.1. Join Without a Delimiter
The tr command can solve this problem in a pretty straightforward way. If we remove all linebreaks from the file content, all lines will be joined together:
$ tr -d '\n' < input.txt
I cameI sawI conquered!
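One detail worth keeping in mind is that tr -d '\n' also deletes the final newline, so the output doesn’t end with a line break. A minimal sketch, using a temporary file as a stand-in for input.txt (the names tmp and bytes are assumptions for illustration):

```shell
tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

# tr removes every newline, including the last one; echo adds one back.
tr -d '\n' < "$tmp"; echo

# The file holds 26 bytes, 3 of which are newlines.
bytes=$(tr -d '\n' < "$tmp" | wc -c)
echo "bytes after tr: $((bytes))"

rm -f "$tmp"
```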
4.2. Join With a Single Character Delimiter
We might think that the problem could also be easily solved if we convert all linebreaks into commas (','). Let’s give it a try:
$ tr '\n' ',' < input.txt
I came,I saw,I conquered!,
Oops! There is a trailing comma in the output above. This is because the last line in the file ends with a newline, and tr translates every newline, including the final one. It has no option to treat the last one differently.
That is, the tr utility cannot solve this problem alone. We need the help of another utility.
For instance, we can pipe the output from the tr command to a sed command to change the trailing comma into a newline:
$ tr '\n' ',' < input.txt | sed 's/,$/\n/'
I came,I saw,I conquered!
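The \n in the sed replacement may not be understood by every sed implementation. As an alternative sketch, the trailing comma can be trimmed in the shell itself with a parameter expansion. The variable names tmp and joined are assumptions for illustration.

```shell
tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

joined=$(tr '\n' ',' < "$tmp")   # every newline becomes a comma, including the last
joined=${joined%,}               # drop one trailing comma, if present
printf '%s\n' "$joined"

rm -f "$tmp"
```

This keeps the pipeline to a single external command, at the cost of holding the joined text in a shell variable.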
4.3. Join With a Multiple Character Delimiter
The tr command cannot translate a single character into multiple characters. Therefore, it cannot join lines with a delimiter of multiple characters.
5. The paste Command
The paste utility is part of the GNU Coreutils package, so it’s available by default on virtually all Linux distros.
The paste command does just one thing: it merges lines of files. That’s exactly what we need to solve our problems.
5.1. Join Without a Delimiter and With a Single Character Delimiter
Let’s see how to solve the two problems using the paste command:
$ paste -sd '' input.txt
I cameI sawI conquered!
$ paste -sd ',' input.txt
I came,I saw,I conquered!
In the two commands above, we passed two options to the paste command: -s and -d.
The paste command can merge lines from multiple input files. By default, it merges them side by side: entries in the first column come from the first file, those in the second column come from the second file, and so on. The -s option makes it join all the lines of each file into a single line instead.
Also, we told the paste command to separate the merged lines with a given delimiter character by passing -d '' or -d ','.
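An empty -d '' argument works with GNU paste, but other implementations may reject it. POSIX defines \0 in the delimiter list as the empty string, so the following sketch shows a form that should be more portable, along with reading from standard input through the - operand. The variable names are assumptions for illustration.

```shell
tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

no_delim=$(paste -sd '\0' "$tmp")       # \0 means "empty string" in a delimiter list
from_stdin=$(paste -sd ',' - < "$tmp")  # '-' tells paste to read standard input

printf '%s\n' "$no_delim" "$from_stdin"

rm -f "$tmp"
```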
5.2. Join With a Multiple Character Delimiter
Since the -d option controls the delimiter in the result, we might expect to solve the problem by passing a string of multiple characters to -d. Let’s see what happens:
$ paste -sd "@#" input.txt
I came@I saw#I conquered!
The test above shows that if we pass multiple characters to the -d option, the paste command doesn’t treat them as one multi-character delimiter. Instead, it uses each character in turn as a single-character delimiter. This is not what we want.
The paste command cannot join lines with a delimiter of multiple characters.
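Although the cycling behavior doesn’t help with our third problem, a short sketch makes it easier to see how the delimiter list is consumed. Here, a list of ',' followed by a newline joins the input lines in pairs. The four sample lines and the variable name pairs are assumptions for illustration.

```shell
# Delimiters are used in turn: ',' after line 1, newline after line 2,
# ',' after line 3, and so on.
pairs=$(printf 'a\nb\nc\nd\n' | paste -sd ',\n' -)
printf '%s\n' "$pairs"
```

With this input, the first and second lines end up on one output line and the third and fourth on the next.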
6. The sed Command
sed is a powerful command-line text-processing utility. We can solve the three problems using almost the same code:
$ sed ':a; N; $!ba; s/\n//g' input.txt
I cameI sawI conquered!
$ sed ':a; N; $!ba; s/\n/,/g' input.txt
I came,I saw,I conquered!
$ sed ':a; N; $!ba; s/\n/; /g' input.txt
I came; I saw; I conquered!
Simply put, the idea of this sed one-liner is to append every line to the pattern space and then, at the end, replace all line breaks with the given string.
Let’s understand how it works:
- :a; – we define a label called a
- N; – append the next line to sed’s pattern space
- $!ba; – if the current line is not the last line ($!), branch (b) back to the label a
- s/\n/REPLACEMENT/g – replace all line breaks with the given REPLACEMENT
Since sed’s s/../../g is a regex-based substitution, we can just give different replacements to solve our three problems.
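Some sed implementations read everything after the colon up to the end of the line as the label name, so the semicolon-separated form may not work everywhere. As a sketch, the same script can be split into separate -e expressions. Note that this approach keeps the whole file in the pattern space, so it’s best suited to inputs of moderate size. The variable names are assumptions for illustration.

```shell
tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

# Same logic as the one-liner: label a, append next line, loop until the
# last line, then replace every embedded newline.
joined=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/; /g' "$tmp")
printf '%s\n' "$joined"

rm -f "$tmp"
```

If the replacement text contains '/' or '&', it needs escaping, because both characters are special in an s command.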
7. The awk Command
awk is another great command-line text-processing tool.
There are different ways to solve our problems using awk. In this section, we show one of them:
$ awk -v d="" '{s=(NR==1?s:s d)$0}END{print s}' input.txt
I cameI sawI conquered!
$ awk -v d="," '{s=(NR==1?s:s d)$0}END{print s}' input.txt
I came,I saw,I conquered!
$ awk -v d="; " '{s=(NR==1?s:s d)$0}END{print s}' input.txt
I came; I saw; I conquered!
As we can see, we only need to set the variable d to our required delimiter, and the same awk code gives us the expected result.
Let’s take a closer look at the code to understand how it works:
- -v d="…" – we create a variable d so that we can avoid hardcoding the delimiter string in the code
- s=(NR==1?s:s d)$0 – we append each line ($0) to the variable s
- NR==1?s:s d – we handle the delimiter here, since we don’t want to add a delimiter as a prefix to the first line. It’s the compact form of if(NR>1) s=s d
- END{print s} – after all lines have been processed, we print the variable s in the END block
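For comparison, here is a sketch of two other ways to write the same logic. The first spells out the if(NR>1) form mentioned above as separate pattern-action rules. The second prints each line as it is read instead of accumulating everything in s, which may be preferable for large inputs. The variable names explicit and streamed are assumptions for illustration.

```shell
tmp=$(mktemp)                                   # hypothetical scratch copy of input.txt
printf 'I came\nI saw\nI conquered!\n' > "$tmp"

# Expanded form: add the delimiter before every line except the first.
explicit=$(awk -v d='; ' 'NR > 1 { s = s d } { s = s $0 } END { print s }' "$tmp")

# Streaming form: print as we go, then finish with a single newline.
streamed=$(awk -v d='; ' '{ printf "%s%s", (NR > 1 ? d : ""), $0 } END { print "" }' "$tmp")

printf '%s\n' "$explicit" "$streamed"

rm -f "$tmp"
```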
8. Conclusion
In this article, we’ve discussed how to solve the problem of joining lines in a file. We’ve used different command-line tools to solve the problems in three different scenarios.
We found that some commands cannot handle all three scenarios:
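| Method    | No Delimiter | Single Character Delimiter | Multiple Character Delimiter |
|-----------|--------------|----------------------------|------------------------------|
| Pure Bash | ✓            | ✓                          | ✓                            |
| tr        | ✓            | ✗                          | ✗                            |
| paste     | ✓            | ✓                          | ✗                            |
| sed       | ✓            | ✓                          | ✓                            |
| awk       | ✓            | ✓                          | ✓                            |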