1. Overview
There are times when we need to compare the contents of two files. Perhaps we have two files that look like duplicates, and we need to make sure they contain the same data before deleting one.
In this tutorial, we’ll explore the comm command. This tool allows us to compare two files without scrolling through them line by line.
2. What Is comm?
The comm command takes two file names, compares the files line by line, and prints up to three columns depending on where each line occurs. The first column holds the lines unique to file1, the second holds the lines unique to file2, and the third holds the lines that appear in both files. In the output, the second column is indented by one tab and the third by two.
The diff command also compares files, but it is more complex. With comm, the output is simpler and the command is easier to use, which makes it well suited for use in scripts.
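To make the three columns concrete, here is a minimal sketch. The file names left and right and their contents are made up for illustration, and LC_ALL=C is set only to pin the sort order so the example behaves the same on any system:

```shell
# Work in a throwaway directory so no existing files are touched.
tmpdir=$(mktemp -d)

# Two small, already sorted files (hypothetical names and contents).
printf '%s\n' apple banana cherry > "$tmpdir/left"
printf '%s\n' banana cherry date > "$tmpdir/right"

# LC_ALL=C pins the collation order that comm uses for the comparison.
columns=$(LC_ALL=C comm "$tmpdir/left" "$tmpdir/right")
printf '%s\n' "$columns"

rm -r "$tmpdir"
```

Here apple is printed without indentation (only in the first file), date with one tab (only in the second file), and banana and cherry with two tabs (in both files).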
Before we run the comm command, we must ensure that both files are sorted. Otherwise, comm reports that the input is not in sorted order, and we can’t rely on the comparison. We’ll look at ways to sort the data later in this tutorial.
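If we’re unsure whether a file is already sorted, one option is to ask sort to check it with the -c option, which only verifies the order and doesn’t print the sorted data. The following is a small sketch under assumptions: the file name pets and its contents are hypothetical, and LC_ALL=C is used only to make the result predictable:

```shell
tmpdir=$(mktemp -d)

# A deliberately unsorted sample file (hypothetical name and contents).
printf '%s\n' pony cat dog > "$tmpdir/pets"

# sort -c exits with status 0 if the input is sorted, non-zero otherwise.
if LC_ALL=C sort -c "$tmpdir/pets" 2>/dev/null; then
    status="sorted"
else
    status="not sorted"
fi
echo "$status"

rm -r "$tmpdir"
```

A script could use this check to decide whether a file needs sorting before it is handed to comm.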
3. Input and Output
If the files are already sorted and need no other preparation, we can just run the comm command followed by the two file names we’d like to compare:
$ comm file1 file2
3.1. Test Files
Let’s create these two files (file1 and file2) so that we can compare the content of both using the comm command:
$ cat <<EOF> file1
> Cat
> Dog
> Pony
> fish
> EOF
$ cat <<EOF> file2
> fish
> dog
> hamster
> Pig
> EOF
These commands create file1 and file2 intentionally out of alphabetical order so that we can see how to sort these values later.
3.2. Sorting Files
For the comm command to work properly, both files must be sorted using the same sort order (the same locale) that comm itself uses.
If we’ve sorted both files properly, the command runs as expected, without error. However, if we don’t sort the files, we’ll get a message in the output at the first unsorted value, stating that file N is not in sorted order.
The remaining contents of the files are still printed, but comm doesn’t compare them as expected:
$ comm file1 file2
Cat
Dog
        fish
comm: file 2 is not in sorted order
        dog
        hamster
        Pig
Pony
comm: file 1 is not in sorted order
fish
If a file isn’t already sorted, we can use the sort command to help us.
Since sort writes the sorted lines to standard output, we can use process substitution, the <() syntax, to pass the output of each sort command to comm in place of a file name.
The comm command then compares the sorted lists rather than the unsorted files:
$ comm <(sort file1) <(sort file2)
Cat
        dog
Dog
                fish
        hamster
        Pig
Pony
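The requirement that sort and comm agree on the sort order can be made explicit by setting the same locale for both. The sketch below is one way to do that, not the only one: it assumes the C locale (plain byte order) is acceptable, recreates the two test files in a temporary directory, and writes the sorted copies to files with a hypothetical .sorted suffix, so it also runs in shells without process substitution:

```shell
tmpdir=$(mktemp -d)

# Recreate the tutorial's test files inside the temporary directory.
printf '%s\n' Cat Dog Pony fish > "$tmpdir/file1"
printf '%s\n' fish dog hamster Pig > "$tmpdir/file2"

# Sort both files under the C locale, where uppercase letters
# sort before lowercase ones.
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# Run comm under the same locale so it agrees on what "sorted" means.
result=$(LC_ALL=C comm "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

The lines come out in a different order than in the locale-aware run above, because the C locale places all uppercase letters before lowercase ones, but the column assignment of each line is the same.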
3.3. Specifying Comparisons to Hide
We can specify which columns we don’t want to show in the output. To do this, we pass an option consisting of a minus (-) followed by the numbers of the columns we’d like to hide.
For example, if we only want to show column 3 (the “both” column), we can hide columns 1 and 2:
$ comm -12 <(sort file1) <(sort file2)
fish
We can now see clearly that the only entry both files share is “fish”.
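The same approach works for the other columns. As a rough sketch, the snippet below hides different column combinations to list the lines found only in one file, and uses -3 to check whether the files differ at all. The variable names and the .sorted file names are illustrative choices, and the C locale is assumed for both sort and comm:

```shell
tmpdir=$(mktemp -d)

# Sorted copies of the tutorial's test data (C locale assumed throughout).
printf '%s\n' Cat Dog Pony fish | LC_ALL=C sort > "$tmpdir/file1.sorted"
printf '%s\n' fish dog hamster Pig | LC_ALL=C sort > "$tmpdir/file2.sorted"

# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_first=$(LC_ALL=C comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

# -13 hides columns 1 and 3, leaving lines found only in the second file.
only_second=$(LC_ALL=C comm -13 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

# -3 hides the shared lines; empty output means both files hold the same lines.
differences=$(LC_ALL=C comm -3 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
if [ -z "$differences" ]; then
    echo "files contain the same lines"
else
    echo "files differ"
fi

rm -r "$tmpdir"
```

A check like the last one could be a starting point for confirming that two files hold the same lines before deleting one of them. Note that it compares sorted lines, so it ignores the original line order.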
3.4. Ignoring Case Sensitivity
Looking at the earlier sorted comparison, we saw that “Dog” from file1 and “dog” from file2 were each reported as unique to their own file.
The comm command is case sensitive. If two values aren’t exactly the same, each one shows up as a unique value in its own file’s column.
If there are case differences between the two files, we can convert both to lowercase before comparing them by using the tr command inline:
$ comm <(sort <(tr '[:upper:]' '[:lower:]' < file1)) <(sort <(tr '[:upper:]' '[:lower:]' < file2))
cat
                dog
                fish
        hamster
        pig
pony
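When the same normalization is needed for several files, it can be wrapped in a small shell function. The sketch below is only one possible arrangement: lower_sorted is a hypothetical helper name, the .norm files are illustrative, and the C locale is assumed for sorting and comparing. It combines the lowercase conversion with -12 to list only the entries the files share regardless of case:

```shell
# lower_sorted is a hypothetical helper: lowercase a file, then sort it.
lower_sorted() {
    tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C sort
}

tmpdir=$(mktemp -d)
printf '%s\n' Cat Dog Pony fish > "$tmpdir/file1"
printf '%s\n' fish dog hamster Pig > "$tmpdir/file2"

# Normalize both files the same way before comparing them.
lower_sorted "$tmpdir/file1" > "$tmpdir/file1.norm"
lower_sorted "$tmpdir/file2" > "$tmpdir/file2.norm"

# -12 hides the unique columns, leaving only the shared lines.
shared=$(LC_ALL=C comm -12 "$tmpdir/file1.norm" "$tmpdir/file2.norm")
printf '%s\n' "$shared"

rm -r "$tmpdir"
```

With the case differences removed, both dog and fish are reported as shared.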
4. Conclusion
In this short tutorial, we learned what the comm command is used for and how to use it.
Further information on additional options can be found at comm’s man page.