1. Overview
The while loop with the read command is a well-known and efficient way to read a file line by line. However, sometimes we need a way to accomplish more complicated tasks.
In this tutorial, we’ll learn how to read corresponding lines from two files.
2. Problem Statement
Let’s assume that we have two text files, fileA.txt and fileB.txt:
$ cat fileA.txt
File A - line #1
File A - line #2
File A - line #3
$ cat fileB.txt
File B - line #1
File B - line #2
File B - line #3
Next, we want to perform some operation on corresponding lines from both files. As a stand-in for that operation, let’s introduce a mock-up function, two_lines_operation, which only prints its arguments:
two_lines_operation ()
{
echo "Doing something with lines from two files:"
printf 'fileA.txt line: %s\n' "${1}"
printf 'fileB.txt line: %s\n' "${2}"
printf '\n'
}
So, our desired output looks like this:
Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1
Doing something with lines from two files:
fileA.txt line: File A - line #2
fileB.txt line: File B - line #2
Doing something with lines from two files:
fileA.txt line: File A - line #3
fileB.txt line: File B - line #3
Finally, we’ll put this function in a library file named mockup, which our scripts will source.
3. The Nested Loop Approach
As a first attempt, let’s use nested while read loops while keeping track of the line numbers in both files.
Let’s take a look at the nreader script:
#!/bin/bash
source mockup # library with function to work with lines
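countA=0
while IFS= read -r lineA # -r keeps backslashes, empty IFS keeps leading and trailing whitespace
do
countB=0
while IFS= read -r lineB
do
if [ "$countA" -eq "$countB" ]
then
two_lines_operation "$lineA" "$lineB"
break
fi
countB=$((countB + 1)) # shell arithmetic, no external expr call
done < fileB.txt
countA=$((countA + 1))
done < fileA.txt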
We count the lines in both files and process a line from the second file only when its number matches the current line number in the first file. Let’s check the result:
$ ./nreader
Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1
Doing something with lines from two files:
fileA.txt line: File A - line #2
fileB.txt line: File B - line #2
Doing something with lines from two files:
fileA.txt line: File A - line #3
fileB.txt line: File B - line #3
We got the output we wanted, but the construction is far from optimal. For every line of fileA.txt, we reopen fileB.txt and read it from the beginning just to pick out one matching line, so the amount of work grows quadratically with the number of lines.
4. Reading Simultaneously From Both Files
Let’s extend the well-known while read loop to operate with two files. So, we’re going to read two files simultaneously within the same loop pass.
4.1. Bash 4.1 and Higher
Since Bash 4.1, we can let the shell allocate a free file descriptor and store its number in a variable, so we don’t have to pick descriptor numbers ourselves. Let’s apply it in the rreader script:
#!/bin/bash
source mockup
# open input files
exec {fdA}<fileA.txt
exec {fdB}<fileB.txt
while read -r -u "$fdA" lineA && read -r -u "$fdB" lineB
do
two_lines_operation "$lineA" "$lineB"
done
exec {fdA}>&- {fdB}>&- # close input files
Let’s notice that exec {fd}<file_name opens the file and stores the number of the newly allocated file descriptor in the variable fd. Afterward, we need to close the descriptor explicitly with exec {fd}>&-.
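To see the allocation at work, here’s a minimal sketch. The scratch file and the variable names fd, first_line, and second_line are assumptions made only for this illustration. It opens a file through an automatically allocated descriptor, reads two lines from it, and closes it:

```bash
#!/bin/bash
# Sketch only: the scratch file stands in for a real input file.
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"

exec {fd}<"$tmp"                    # Bash picks a free descriptor and stores its number in fd
IFS= read -r -u "$fd" first_line    # consecutive reads continue where the previous one stopped
IFS= read -r -u "$fd" second_line
exec {fd}>&-                        # close the descriptor; the variable keeps the old number

rm -f "$tmp"
printf 'descriptor %s gave: %s, %s\n' "$fd" "$first_line" "$second_line"
```

Bash allocates such descriptors starting from 10, so they don’t collide with the low numbers that scripts usually assign by hand.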
Now, let’s check the output:
$ ./rreader
Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1
Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped
4.2. Earlier Bash Versions
When working with earlier Bash versions, we need to provide valid file descriptors ourselves. Moreover, this approach is still quite widespread. So, let’s rewrite the rreader script:
#!/bin/bash
source mockup
while read -r -u 3 lineA && read -r -u 4 lineB
do
two_lines_operation "$lineA" "$lineB"
done 3<"fileA.txt" 4<"fileB.txt"
We assign file descriptor 3 to fileA.txt and 4 to fileB.txt. We should not use 0, 1, and 2 because they are reserved for stdin, stdout, and stderr, respectively.
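Because the two read commands are joined with &&, the loop stops as soon as either file runs out of lines. If the files may differ in length and we still want every line, one possible variation is sketched below. The scratch files, the gotA and gotB flags, and the pairs array are hypothetical names, and the sketch assumes that every line ends with a newline:

```bash
#!/bin/bash
# Sketch only: two scratch files of different lengths.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/long.txt"
printf 'b1\n' > "$dir/short.txt"

pairs=()
while true
do
    lineA='' lineB=''
    gotA=1 gotB=1
    IFS= read -r -u 3 lineA || gotA=0   # remember whether each read delivered a line
    IFS= read -r -u 4 lineB || gotB=0
    # stop only when both files are exhausted
    if [ "$gotA" -eq 0 ] && [ "$gotB" -eq 0 ]; then break; fi
    pairs+=("$lineA|$lineB")            # a missing line shows up as an empty string
done 3<"$dir/long.txt" 4<"$dir/short.txt"

rm -rf "$dir"
printf '%s\n' "${pairs[@]}"
```

Here, the lines of the longer file are paired with empty strings once the shorter file is exhausted.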
5. Working With Arrays
Let’s notice that the solutions shown so far processed the files line by line, without reading a whole file’s content into memory. Now, let’s read the files’ lines into arrays and process them later.
Let’s study the areader script. First, let’s define empty arrays to store the lines. With this, we can simply append each line at the end of the array. Then, we fill the arrays in the read loop.
#!/bin/bash
source mockup
exec {fdA}<fileA.txt
exec {fdB}<fileB.txt
# define empty arrays
arrA=()
arrB=()
while read -r -u "$fdA" lineA && read -r -u "$fdB" lineB
do
arrA+=("$lineA") # append line at the end of array
arrB+=("$lineB")
done
exec {fdA}>&- {fdB}>&-
# check the result
for i in "${!arrA[@]}"
do
two_lines_operation "${arrA[$i]}" "${arrB[$i]}"
done
Let’s notice that when passing the arrays’ elements to the function, we loop over the indices of the first array, ${!arrA[@]}. This is safe here because the read loop appends to both arrays in the same pass, so they always have equal lengths.
Next, let’s check the output:
$ ./areader
Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1
Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped
6. Using mapfile
The mapfile command reads lines from standard input into an array. Therefore, we don’t need to create an explicit loop.
The command has been a builtin since Bash version 4, and readarray is its synonym.
6.1. Reading in Whole Files
Let’s read two files into corresponding arrays with the mareader script:
#!/bin/bash
source mockup
mapfile -t arrA < fileA.txt
mapfile -t arrB < fileB.txt
for i in "${!arrA[@]}"
do
two_lines_operation "${arrA[$i]}" "${arrB[$i]}"
done
Let’s notice the -t option, which strips the trailing newline from each line read.
Now let’s check the usual output:
$ ./mareader
Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1
Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped
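Since each mapfile call fills its array independently, the two arrays may end up with different lengths when the files do. The following sketch shows one way to guard against that by pairing lines only up to the length of the shorter array. The scratch files and the count and pairs variables are assumptions for illustration:

```bash
#!/bin/bash
# Sketch only: two scratch files with three and two lines.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

mapfile -t arrA < "$dir/a.txt"
mapfile -t arrB < "$dir/b.txt"
rm -rf "$dir"

# use the smaller of the two array sizes as the loop limit
count=${#arrA[@]}
if [ "${#arrB[@]}" -lt "$count" ]; then count=${#arrB[@]}; fi

pairs=()
for ((i = 0; i < count; i++))
do
    pairs+=("${arrA[i]} <-> ${arrB[i]}")
done
printf '%s\n' "${pairs[@]}"
```

Depending on the task, we might instead prefer to report an error when the sizes differ.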
6.2. Reading in a Part of the File
Let’s read only a part of the file at once. With the -s option, we skip the given number of initial lines. Then, with the -n option, we read at most the specified number of lines. As an example, we’re going to read only the second line of each file:
#!/bin/bash
source mockup
mapfile -t -n1 -s1 arrA < fileA.txt
mapfile -t -n1 -s1 arrB < fileB.txt
for i in "${!arrA[@]}"
do
two_lines_operation "${arrA[$i]}" "${arrB[$i]}"
done
Let’s run the script:
$ ./mareader
Doing something with lines from two files:
fileA.txt line: File A - line #2
fileB.txt line: File B - line #2
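The same two options can be combined into a small helper that fetches an arbitrary line. This is only a sketch: nth_line is a hypothetical function name, and the scratch file exists just for the demonstration:

```bash
#!/bin/bash
# Hypothetical helper: print line number $2 (counted from 1) of file $1.
nth_line () {
    local file=$1 n=$2
    local -a buf
    # skip n-1 lines, then read at most one line
    mapfile -t -s "$((n - 1))" -n 1 buf < "$file"
    # buf stays empty when the file has fewer than n lines
    if [ "${#buf[@]}" -eq 1 ]
    then
        printf '%s\n' "${buf[0]}"
    else
        return 1
    fi
}

tmp=$(mktemp)
printf 'first\nsecond\nthird\n' > "$tmp"
second=$(nth_line "$tmp" 2)
missing=$(nth_line "$tmp" 9) || missing="<none>"
rm -f "$tmp"
printf '%s %s\n' "$second" "$missing"
```

Returning a non-zero status for a missing line lets the caller distinguish it from an empty line.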
7. Playing Around With paste
The paste command reads several input files at once and joins their corresponding lines into single output lines. However, to perform other operations, we need access to each original line.
Therefore, let’s feed the while read loop with paste’s output. Because the tab is the default separator for the paste command, we’re going to use it as IFS too in the preader script:
#!/bin/bash
source mockup
while IFS=$'\t' read -r lineA lineB
do
two_lines_operation "$lineA" "$lineB"
done < <(paste fileA.txt fileB.txt)
Let’s notice the use of the process substitution <(paste fileA.txt fileB.txt). Briefly, with this construct, read sees paste’s output as if it were a file. Let’s pay attention to the space between the two < characters: the first one is an input redirection, while the second one belongs to the process substitution.
Now, let’s check the output:
$ ./preader
Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1
Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped
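This script fails if a line in fileA.txt itself contains a tab, because read splits the joined line at the first tab. An empty line in fileA.txt causes trouble too: read treats the leading tab as whitespace and skips it, so the line from fileB.txt ends up in lineA. Of course, we can use any other separator for both paste and IFS, choosing one that isn’t expected in fileA.txt.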
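As a sketch of that idea, the example below uses the ASCII unit separator (byte 0x1f) for both paste and IFS. The choice of this character is an assumption: it only works if the byte never occurs in the input. The scratch files and the sep and pairs variables are likewise made up for the illustration:

```bash
#!/bin/bash
# Sketch only: the first scratch file contains a tab and an empty line.
dir=$(mktemp -d)
printf 'a\tone\n\na three\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"

sep=$'\x1f'   # ASCII unit separator, assumed absent from both files
pairs=()
while IFS=$sep read -r lineA lineB
do
    pairs+=("[$lineA][$lineB]")
done < <(paste -d "$sep" "$dir/a.txt" "$dir/b.txt")

rm -rf "$dir"
printf '%s\n' "${pairs[@]}"
```

Since the separator isn’t a whitespace character, read keeps the empty field in front of it, so an empty line from the first file stays in lineA.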
8. Miscellaneous Commands
Now let’s look through solutions that use external commands to select lines. So, we’re going to read a line from the first file and then retrieve the corresponding one from the second file.
Let’s notice that this approach is similar to the nested loop example and is likewise less efficient than the other solutions presented so far, because it rescans fileB.txt for every line of fileA.txt. However, for small files, it’s an acceptable way.
8.1. The head and tail Solution
Let’s get the nth line from the file with the construct head -n N | tail -n 1:
#!/bin/bash
source mockup
nr=0
while IFS= read -r lineA
do
nr=$((nr + 1))
two_lines_operation "$lineA" "$(head -n "$nr" fileB.txt | tail -n 1)"
done < fileA.txt
8.2. The sed Solution
Next, let’s use sed, the stream editor, to find the nth line and quit immediately:
#!/bin/bash
source mockup
nr=0
while IFS= read -r lineA
do
nr=$((nr + 1))
two_lines_operation "$lineA" "$(sed "${nr}q;d" fileB.txt)" # delete every line until line nr, then print it and quit
done < fileA.txt
8.3. The awk Solution
Finally, let’s use awk to print the line with the given number and exit right away:
#!/bin/bash
source mockup
nr=0
while IFS= read -r lineA
do
nr=$((nr + 1))
two_lines_operation "$lineA" "$(awk -v n="$nr" 'NR == n {print; exit}' fileB.txt)" # pass nr to awk as the variable n
done < fileA.txt
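The three constructs agree as long as the requested line exists, but they differ once the number exceeds the file’s length. The sketch below compares them side by side. The scratch file and all variable names are assumptions for illustration:

```bash
#!/bin/bash
# Sketch only: a three-line scratch file.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

n=2   # a line that exists
head_tail_hit=$(head -n "$n" "$tmp" | tail -n 1)
sed_hit=$(sed "${n}q;d" "$tmp")
awk_hit=$(awk -v n="$n" 'NR == n {print; exit}' "$tmp")

n=5   # past the end of the file
head_tail_miss=$(head -n "$n" "$tmp" | tail -n 1)   # falls back to the last line
sed_miss=$(sed "${n}q;d" "$tmp")                    # prints nothing
awk_miss=$(awk -v n="$n" 'NR == n {print; exit}' "$tmp")

rm -f "$tmp"
printf 'hit:  %s %s %s\n' "$head_tail_hit" "$sed_hit" "$awk_hit"
printf 'miss: [%s] [%s] [%s]\n' "$head_tail_miss" "$sed_miss" "$awk_miss"
```

So, when fileB.txt may be shorter than fileA.txt, the head and tail pipeline silently repeats the last line, while the sed and awk variants return an empty string.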
9. Conclusion
In this article, we learned how to read corresponding lines from two input files. We started from a naive, nested loop solution. Then, we used the while read loop to access both files simultaneously, within a single pass of the loop.
Then, we split the problem into two parts. First, we read the files into Bash arrays, and afterward, we processed the arrays’ elements. We populated the arrays either directly with read or with the mapfile command.
Next, we approached the problem with the paste command.
Finally, we looked at miscellaneous commands, namely head with tail, sed, and awk, which help fetch a line with a given number.