1. Introduction
We often want to compare two text files, or two versions of the same text file, to identify the differences between them. Linux provides the diff and comm commands for this purpose.
In this tutorial, we’ll learn to use the diff and comm commands to compare two files.
2. Example Files
Let’s use the cat command to see the contents of the example text files:
$ cat file1
The Tortoise and the Hare
The Boy Who Cried Wolf
Androcles and the Lion
$ cat file2
The Tortoise and the Hare
The Boy Who Cried Wolf
Androcles and the Lion
The Town Mouse and The Country Mouse
The Crow and the Pitcher
Note that file2 has two extra lines at the end compared to file1.
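To follow along, we can recreate the two example files ourselves. The following is a minimal sketch; the scratch directory created with mktemp is our own assumption, used only to avoid overwriting existing files:

```shell
# Work in a scratch directory (an assumption, so that no existing files are overwritten)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1 holds the three common lines
printf '%s\n' \
  'The Tortoise and the Hare' \
  'The Boy Who Cried Wolf' \
  'Androcles and the Lion' > file1

# file2 starts as a copy of file1 and gains two lines at the end
cp file1 file2
printf '%s\n' \
  'The Town Mouse and The Country Mouse' \
  'The Crow and the Pitcher' >> file2

# Show the line count of each file
wc -l file1 file2
```

After running it, file1 should contain three lines and file2 five.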
3. Comparing Files
3.1. Using the diff Command
Let’s apply the diff command with the -u (unified format) option to compare file1 and file2 from the previous section:
$ diff -u file1 file2
--- file1 2021-08-31 23:25:50.000000000 +0100
+++ file2 2021-08-31 23:27:05.000000000 +0100
@@ -1,3 +1,5 @@
 The Tortoise and the Hare
 The Boy Who Cried Wolf
 Androcles and the Lion
+The Town Mouse and The Country Mouse
+The Crow and the Pitcher
In the above output, we see a unified diff of the two files. As we know, file2 has two extra lines at the end, and the output marks those lines with a ‘+’ prefix. Since file1 appears before file2 in the command, diff describes the changes needed to turn file1 into file2.
Now, let’s reverse the order of the files in the diff command to see how the output changes:
$ diff -u file2 file1
--- file2 2021-08-31 23:27:05.000000000 +0100
+++ file1 2021-08-31 23:25:50.000000000 +0100
@@ -1,5 +1,3 @@
 The Tortoise and the Hare
 The Boy Who Cried Wolf
 Androcles and the Lion
-The Town Mouse and The Country Mouse
-The Crow and the Pitcher
As we see in the output, from the perspective of file2, file1 has two fewer lines, as indicated by the ‘-’ in front of them.
The relationship between the order of the files and the diff output can be confusing. An easy way to read the result is to keep the order of the files in mind. For example, if we run diff like this:
diff -u FILE1 FILE2
then, in the result:
- ‘+’ in front of a line means the line is only in FILE2
- ‘-’ in front of a line means the line is only in FILE1
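To see both markers in a single diff, here is a small sketch with two hypothetical files, old.txt and new.txt, that differ in one line. The file names and contents are our own illustration. We drop the first two lines of the output with tail because they contain timestamps that vary from run to run:

```shell
workdir=$(mktemp -d)

# Hypothetical inputs: the second line differs between the two files
printf '%s\n' apple banana cherry > "$workdir/old.txt"
printf '%s\n' apple blueberry cherry > "$workdir/new.txt"

# tail -n +3 skips the two header lines (--- and +++), which carry timestamps
changes=$(diff -u "$workdir/old.txt" "$workdir/new.txt" | tail -n +3)
printf '%s\n' "$changes"
# Prints:
# @@ -1,3 +1,3 @@
#  apple
# -banana
# +blueberry
#  cherry

rm -rf "$workdir"
```

Here banana is only in old.txt (the first file), so it gets a ‘-’, while blueberry is only in new.txt (the second file), so it gets a ‘+’. Unchanged lines start with a space.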
3.2. Using the comm Command
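Let’s compare the files using the comm command. The comm command compares two files line by line and expects both of them to be sorted. Our example files aren’t sorted alphabetically, so depending on the comm implementation, a warning about the input order may appear alongside the output shown here. For general use, we should sort the files first.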
$ comm [OPTION] FILE1 FILE2
The comm command has three column-suppressing options: -1, -2, and -3. With no options, it produces three-column output: column 1 contains lines unique to FILE1, column 2 contains lines unique to FILE2, and column 3 contains lines common to both files. Each of the options -1, -2, and -3 suppresses the corresponding column, and we can combine them.
For example, let’s run the comm command to see the unique lines in file1:
$ comm -23 file1 file2
There are no unique lines in file1, so the output above is empty. The -23 option keeps column 1, which holds the lines unique to file1, and suppresses columns 2 and 3.
Similarly, let’s find the lines unique to file2. Here, -13 suppresses columns 1 and 3:
$ comm -13 file1 file2
The Town Mouse and The Country Mouse
The Crow and the Pitcher
As we know, file2 has two more lines than file1. We see those two lines as output.
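The examples above always suppress two columns, so we never see the full three-column layout. The sketch below shows it with two hypothetical files, a.txt and b.txt (our own illustration). Because comm is documented to work on sorted input, we sort each file first:

```shell
workdir=$(mktemp -d)

# Hypothetical inputs, deliberately written in unsorted order
printf '%s\n' cherry apple banana > "$workdir/a.txt"
printf '%s\n' date cherry banana > "$workdir/b.txt"

# comm expects sorted input, so sort each file into a new file
sort "$workdir/a.txt" > "$workdir/a.sorted"
sort "$workdir/b.txt" > "$workdir/b.sorted"

# No options: all three columns are printed, separated by tab indentation
columns=$(comm "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$columns"

rm -rf "$workdir"
```

In the output, apple appears with no indentation (column 1, only in a.txt), date is indented by one tab (column 2, only in b.txt), and banana and cherry are indented by two tabs (column 3, in both files).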
4. Comparing Files – Finding Only Additions
Suppose we want to see only the lines that were added between two files. Let’s apply the diff command for this use case, using the files we created above:
$ diff -u file2 file1 | sed -n '/^+[^+]/ s/^+//p'
Here, we pipe the output of diff into sed, the stream editor, to show only the added lines. The sed expression selects lines that start with a single ‘+’ (which skips the ‘+++’ header line), removes that leading ‘+’, and prints the result. Since file1 has no additions from the perspective of file2, the output is empty.
Similarly, let’s try the same use case with the comm command:
$ comm -23 file1 file2
Here, since there are no unique lines in file1, we get empty output.
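Both commands above print nothing because file1 adds nothing to file2. To see the filters produce output, here is a sketch with two hypothetical sorted files, old.txt and new.txt (our own illustration), where new.txt gains two lines:

```shell
workdir=$(mktemp -d)

# Hypothetical inputs, already in sorted order so that comm can use them directly
printf '%s\n' apple banana > "$workdir/old.txt"
printf '%s\n' apple banana cherry date > "$workdir/new.txt"

# Additions according to diff: lines marked with a single leading '+'
added_by_diff=$(diff -u "$workdir/old.txt" "$workdir/new.txt" | sed -n '/^+[^+]/ s/^+//p')

# Additions according to comm: lines unique to the second file (column 2)
added_by_comm=$(comm -13 "$workdir/old.txt" "$workdir/new.txt")

printf 'diff: %s\n' $added_by_diff
printf 'comm: %s\n' $added_by_comm
# Prints:
# diff: cherry
# diff: date
# comm: cherry
# comm: date

rm -rf "$workdir"
```

Note that with old.txt as the first argument, the added lines are the ones unique to the second file, so we use -13 with comm here rather than -23.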
5. Comparing Files – Finding Only Removals
Now, let’s look at the opposite use case and show only the removed lines:
$ diff -u file2 file1 | sed -n '/^-[^-]/ s/^-//p'
The Town Mouse and The Country Mouse
The Crow and the Pitcher
Similar to the last section, we select the lines of the diff output that start with a single ‘-’, remove that leading ‘-’, and print the result. As we know, file1 has two fewer lines from the perspective of file2, and we see them in the output.
We can find the removed lines using the comm command as well:
$ comm -13 file1 file2
The Town Mouse and The Country Mouse
The Crow and the Pitcher
Since columns 1 and 3 are suppressed in the above command, we see only the lines unique to file2.
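One limitation of the sed pattern used above is worth knowing: because it requires the character after the leading ‘-’ to be something other than ‘-’, it skips removed lines whose own text begins with ‘-’, as well as removed empty lines. The sketch below shows this with two hypothetical files and one possible alternative, which is our own suggestion rather than part of the approach above. It drops the two header lines by position and then strips a single leading ‘-’:

```shell
workdir=$(mktemp -d)

# Hypothetical inputs: old.txt contains a line that itself starts with '-'
printf '%s\n' 'intro' '- bullet point' 'outro' > "$workdir/old.txt"
printf '%s\n' 'intro' 'outro' > "$workdir/new.txt"

# Filter from this section: diff shows the removed line as "-- bullet point",
# which /^-[^-]/ does not match, so nothing is printed
strict=$(diff -u "$workdir/old.txt" "$workdir/new.txt" | sed -n '/^-[^-]/ s/^-//p')

# Alternative sketch: skip the two header lines, then strip one leading '-'
relaxed=$(diff -u "$workdir/old.txt" "$workdir/new.txt" | tail -n +3 | sed -n 's/^-//p')

printf 'strict:  [%s]\n' "$strict"
printf 'relaxed: [%s]\n' "$relaxed"
# Prints:
# strict:  []
# relaxed: [- bullet point]

rm -rf "$workdir"
```

The alternative assumes that the unified diff output starts with exactly two header lines, which holds when diff -u compares two individual files.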
6. Conclusion
In this article, we learned to use the diff and the comm commands in Linux to compare two text files with an example. We also saw how we could narrow this comparison to find only additions and only removals.