1. Overview

As system administrators, we commonly encounter backup failures for a variety of reasons, some unforeseen and some unavoidable. To recover, we may need to list the files or folders that failed during the backup and then rerun the backup for them.

In this tutorial, we’ll look at how to identify differences in specific file types between two directories.

2. Using the diff Command

Generally, we use the diff command to identify the differences between two files. We can also use it to compare directories. Here, we’ll explore how to use diff to compare the contents of two directories.

2.1. Example Directories and Files

Now, let’s take a look at the directories and files that we’ll use in our explanations throughout the article:

$ tree
├── custom_patterns.txt
├── dir1
│   ├── 1.jpg
│   ├── 2.jpg
...
... output truncated ...
...
│   ├── output.xml
│   ├── sub-dir1
│   │   └── 1.png
│   ├── tbundle.gz
└── dir2
    ├── 2.jpg
...
... output truncated ...
... 
    └── ybundle.gz
5 directories, 23 files

For simplicity, we’ll start with a quick diff comparison of the two directories:

$ diff dir1 dir2
Only in dir1: 1.jpg
Only in dir2: family.jpg
diff dir1/make.xml dir2/make.xml
1d0
< new line added here.
Common subdirectories: dir1/sub-dir1 and dir2/sub-dir1
Only in dir2: sub-dir2
Only in dir1: tbundle.gz
Only in dir2: yahoo.jpg
Only in dir2: ybundle.gz

Furthermore, the -s option makes diff additionally report the files that are identical in both directories, alongside the file-level and content-level differences it reports by default.

Since we’ll focus on the file- and folder-level differences throughout this article, let’s see this option in action:

$ diff -s dir1 dir2
Only in dir1: 1.jpg
Files dir1/2.jpg and dir2/2.jpg are identical
Files dir1/3.jpg and dir2/3.jpg are identical
Only in dir2: family.jpg
...
... output truncated ...
...
Common subdirectories: dir1/sub-dir1 and dir2/sub-dir1
Only in dir2: sub-dir2
Only in dir2: ybundle.gz

In short, the output illustrates that:

  • The files 1.jpg and tbundle.gz are available only in dir1 and not in dir2, while family.jpg, yahoo.jpg, and ybundle.gz are available only in dir2 and not in dir1.
  • The sub-directory sub-dir1 is common to both directories, whereas sub-dir2 is available only in dir2.
  • The contents of make.xml, which is present in both directories, differ. We can, nevertheless, suppress these content-level differences by filtering the output with grep patterns.
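
As a minimal sketch of suppressing content-level differences, we can also ask diff for a brief report with its -q option and then filter that report with grep. The temporary directory and the file contents below are assumptions made only so that the example is self-contained:

```shell
# Build a small throwaway tree (hypothetical contents) for illustration
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir2"
printf 'new line added here.\nshared\n' > "$demo/dir1/make.xml"
printf 'shared\n' > "$demo/dir2/make.xml"
touch "$demo/dir1/1.jpg" "$demo/dir2/family.jpg"

# -q reports only whether files differ, not the changed lines themselves.
# diff exits with status 1 when it finds differences, hence the "|| true".
brief=$(cd "$demo" && diff -q dir1 dir2) || true
printf '%s\n' "$brief"

# Filtering with grep keeps just the file-level "Only in" lines
only=$(printf '%s\n' "$brief" | grep '^Only in')
printf '%s\n' "$only"

rm -rf "$demo"
```

With -q, a file such as make.xml is still flagged as differing, but the changed lines themselves are no longer printed.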

2.2. Pattern-Based Filtering

When we use the diff command on its own, it shows the entire set of differences between the directories. Suppose we need a more focused comparison that’s limited to certain file types. In this case, we can use the -x option to exclude unwanted file types from the comparison:

$ diff -x '*.xml' -x '*.jpg' dir1 dir2
Only in dir1: tbundle.gz
Only in dir2: ybundle.gz

We can give the exclusion pattern list inline with the command or through a file. For the latter, we use the -X option followed by the file path that contains all the patterns to exclude.

Let’s add two patterns, for xml and gz files, to the file custom_patterns.txt:

$ cat custom_patterns.txt
*.xml
*.gz

Now, we can run a diff command that reports the file-level and content-level differences between the directories while excluding the file name patterns listed in custom_patterns.txt:

$ diff -X custom_patterns.txt dir1 dir2
Only in dir1: 1.jpg
Only in dir2: family.jpg
Only in dir2: yahoo.jpg

Hence, diff identified the differences among the remaining file types by excluding all *.xml and *.gz files.

Now, let’s suppose we have an application directory that has programs, binaries, and log files, among other types. With the help of this option, we can quickly identify the differences between the program files and their contents by excluding the binaries and log files.
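
The scenario above could be sketched as follows. The directory names app_v1 and app_v2, the file names, and the .bin extension for binaries are all hypothetical. We also add the -r option, which isn’t used elsewhere in this article, so that diff descends into sub-directories:

```shell
# Hypothetical application trees, built in a temp directory for illustration
work=$(mktemp -d)
mkdir -p "$work/app_v1/bin" "$work/app_v2/bin"
printf 'echo v1\n' > "$work/app_v1/run.sh"
printf 'echo v2\n' > "$work/app_v2/run.sh"
printf 'old\n' > "$work/app_v1/app.log"
printf 'new\n' > "$work/app_v2/app.log"
printf 'A' > "$work/app_v1/bin/tool.bin"
printf 'B' > "$work/app_v2/bin/tool.bin"

# -r recurses into sub-directories; each -x skips file names matching its pattern.
# diff exits with status 1 when it finds differences, hence the "|| true".
result=$(cd "$work" && diff -r -x '*.log' -x '*.bin' app_v1 app_v2) || true
printf '%s\n' "$result"

rm -rf "$work"
```

In this sketch, only the change in run.sh should be reported, since the log file and the binary are skipped even though their contents differ.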

Alternatively, we might want to include only some specific files available in the directories. This can be easily achieved with the help of the grep command:

$ diff dir1 dir2 | grep '\.gz$'
Only in dir1: tbundle.gz
Only in dir2: ybundle.gz
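
In a grep pattern, an unescaped dot matches any character, so it can help to escape the dot and anchor the extension at the end of the line. The sketch below illustrates this on sample lines taken from the listings above, plus one hypothetical entry, logzilla.txt, that we add only to show a false match:

```shell
# Sample diff output; the logzilla.txt line is a hypothetical addition
sample='Only in dir1: 1.jpg
Only in dir2: family.jpg
Only in dir2: logzilla.txt
Only in dir1: tbundle.gz
Only in dir2: ybundle.gz'

# Unescaped dot: ".gz" also matches the "ogz" inside logzilla.txt
loose=$(printf '%s\n' "$sample" | grep '.gz')

# Escaped dot plus end-of-line anchor: only names ending in .gz match
strict=$(printf '%s\n' "$sample" | grep '\.gz$')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

The same idea extends to several extensions at once with grep -E, for instance with a pattern such as '\.(gz|jpg)$'.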

3. Using the find Command

find is a command-line utility that locates files and directories under a given path. It also offers flexible search criteria, such as name patterns.

The -exec option of the find utility executes another command on each file or folder it finds. For example, let’s run diff on each .jpg file in dir1 against the file at the same relative path in dir2:

$ find . -name "*.jpg" -exec diff {} ../dir2/{} \;
diff: ../dir2/./1.jpg: No such file or directory

$ diff ../dir1/ ../dir2 | grep jpg
Only in ../dir1/: 1.jpg
Only in ../dir2: family.jpg
Only in ../dir2: yahoo.jpg

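Here, we’re executing the find command from within dir1. Hence, its output only reports the files that exist in dir1 but are missing from dir2, such as 1.jpg, and not vice versa. The diff and grep combination, in contrast, reports the differences in both directions.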
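
If we want find to cover both directions, one possible approach is to list the matching files in each directory and compare the two lists with comm. This is only a sketch under stated assumptions: the temporary tree and its file names are hypothetical, and it compares file names only, not contents:

```shell
# Throwaway directories with hypothetical contents, loosely mirroring the example
base=$(mktemp -d)
mkdir "$base/dir1" "$base/dir2"
touch "$base/dir1/1.jpg" "$base/dir1/2.jpg" "$base/dir1/notes.xml"
touch "$base/dir2/2.jpg" "$base/dir2/family.jpg" "$base/dir2/yahoo.jpg"

# List the relative paths of the .jpg files in each directory, sorted for comm
(cd "$base/dir1" && find . -type f -name '*.jpg' | sort) > "$base/list1"
(cd "$base/dir2" && find . -type f -name '*.jpg' | sort) > "$base/list2"

# comm -23 keeps lines unique to the first list, comm -13 those unique to the second
only_dir1=$(comm -23 "$base/list1" "$base/list2")
only_dir2=$(comm -13 "$base/list1" "$base/list2")

printf '%s\n' "$only_dir1" | sed 's/^/Only in dir1: /'
printf '%s\n' "$only_dir2" | sed 's/^/Only in dir2: /'

rm -rf "$base"
```

Because find recurses by default, the same sketch would also pick up matching files in sub-directories.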

4. Conclusion

In this article, we explored the basics of the Linux diff utility and how to use it in conjunction with the find command. Along the way, we saw how to identify the differences of specific file types between two directories.

These tools are essential on Linux systems and are an often overlooked pillar of Bash programming.