1. Introduction
Sometimes in Linux, we wish to compare text. The diff command is a common solution for this. It provides a few different ways of outputting text differences.
In this short tutorial, we’ll look at how to use diff to compare files and strings.
2. diff Command
We can use the diff command to compare the contents of files. With a little help from the shell, it can also compare two strings.
2.1. How diff Works
diff works by finding a longest common subsequence (LCS) of lines: the largest set of lines that appear in both inputs in the same order, though not necessarily next to each other. Everything outside that subsequence is reported as a difference. The textbook dynamic-programming solution to the LCS problem takes O(N×M) time, where N and M are the numbers of lines in the two inputs.
The diff command reports three kinds of differences: lines that have been removed, lines that have been added, and lines that have been changed. Removed lines appear in the first input but not the second, while added lines are the opposite: present in the second input but not the first. A change is a removal and an addition at the same position.
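To make the removed and added cases concrete, here's a minimal sketch. The file names and contents are made up for illustration: b exists only in the first input and d only in the second, so a and c form the common subsequence.

```shell
# Hypothetical sample data: "b" is only in the first input, "d" only in the second.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/first.txt"
printf 'a\nc\nd\n' > "$workdir/second.txt"

# diff exits with status 1 when the inputs differ, so "|| true" keeps a
# script that runs under "set -e" from stopping here.
changes=$(diff "$workdir/first.txt" "$workdir/second.txt" || true)
printf '%s\n' "$changes"
# 2d1
# < b
# 3a3
# > d

rm -rf "$workdir"
```

In the output, 2d1 marks line 2 of the first input as deleted, and 3a3 marks line 3 of the second input as added after line 3 of the first.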
2.2. Installation
The diff command is part of the diffutils package, which most Linux distributions install by default. If it's missing on a Debian-based system, we can add it with the apt package manager:
$ sudo apt update
$ sudo apt install diffutils
This provides the Linux system with the diff command if it’s found to be missing.
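Before installing anything, we can check whether diff is already available. The following is a small sketch using the shell's command -v builtin. The variable name diff_status is just an illustrative choice.

```shell
# command -v prints the path of a command and succeeds only if it exists.
if command -v diff >/dev/null 2>&1; then
    diff_status="installed"
else
    diff_status="missing"
fi
echo "diff is $diff_status"
```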
3. Comparison of Files and Strings
First, let’s view the contents of two sample files using the cat command:
$ cat Info1.txt
box
table
chair
$ cat Info2.txt
box
table
monitor
Now we can use diff to compare the files:
$ diff Info1.txt Info2.txt
3c3
< chair
---
> monitor
Here, 3c3 means that line 3 of the first file must be changed to match line 3 of the second. Lines starting with < come from the first file or string, while lines starting with > come from the second. The --- line separates the two groups.
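In scripts, we often only need to know whether two files differ, not how. diff signals this through its exit status: 0 when the inputs are identical, 1 when they differ, and 2 when something went wrong. Here's a sketch that recreates the two sample files in a temporary directory. The variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf 'box\ntable\nchair\n'   > "$workdir/Info1.txt"
printf 'box\ntable\nmonitor\n' > "$workdir/Info2.txt"

# Capture the exit status without letting a non-zero value abort the script.
status=0
diff "$workdir/Info1.txt" "$workdir/Info2.txt" > /dev/null || status=$?

case $status in
    0) verdict="identical" ;;
    1) verdict="different" ;;
    *) verdict="not comparable (diff reported trouble)" ;;
esac
echo "The files are $verdict"

rm -rf "$workdir"
```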
3.1. Temporary Files
If we don’t already have our text for comparison in files, we can turn each string into a temporary file before using the diff command:
$ echo -e "example\nz\nx\ny" > str1.txt
$ echo -e "example\nz\nx\nv" > str2.txt
$ diff str1.txt str2.txt
The two files are created and then compared, and diff prints:
4c4
< y
---
> v
After comparing, we can remove these temporary files:
$ rm str1.txt str2.txt
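Fixed names like str1.txt can clash with files that already exist in the current directory. One way around this, sketched below, is to let mktemp create a private scratch directory. The variable names are our own.

```shell
# mktemp -d creates a uniquely named directory and prints its path.
tmpdir=$(mktemp -d)

# printf interprets \n in its format string, so each string becomes four lines.
printf 'example\nz\nx\ny\n' > "$tmpdir/str1.txt"
printf 'example\nz\nx\nv\n' > "$tmpdir/str2.txt"

# "|| true" because diff exits with 1 when it finds differences.
result=$(diff "$tmpdir/str1.txt" "$tmpdir/str2.txt" || true)
printf '%s\n' "$result"

# Remove the scratch directory and everything in it.
rm -rf "$tmpdir"
```

The comparison result is the same as with the hand-named files, and nothing is left behind afterwards.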
3.2. Here Strings
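In the Bash shell, a here string (<<<) feeds a string to a command as if it were the contents of a file. By default, it's attached to standard input, and a command has only one standard input, so two plain here strings can't give diff its two inputs. Instead, we can attach each here string to its own file descriptor and pass diff the matching /dev/fd paths:
$ diff /dev/fd/3 /dev/fd/4 3<<< $'example\nfirst\nstring' 4<<< $'example\nsecond\nstring'
2c2
< first
---
> second
In this command, 3<<< and 4<<< open the two strings on file descriptors 3 and 4, and diff reads them through /dev/fd/3 and /dev/fd/4. The $'...' quoting makes Bash turn each \n into a real line break.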
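When one side of the comparison is already a file, a single here string is enough, because diff treats the operand - as standard input. The sketch below assumes Bash and recreates a sample file in a temporary directory. The names are illustrative.

```shell
workdir=$(mktemp -d)
printf 'box\ntable\nchair\n' > "$workdir/Info1.txt"

# "-" tells diff to read its second input from standard input,
# which the here string supplies. $'...' expands \n to newlines.
here_diff=$(diff "$workdir/Info1.txt" - <<< $'box\ntable\nmonitor' || true)
printf '%s\n' "$here_diff"
# 3c3
# < chair
# ---
# > monitor

rm -rf "$workdir"
```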
3.3. Process Substitution
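Process substitution, written <(command) in Bash, lets the output of a command stand in wherever a file name is expected. First, let’s view the contents of both dir1 and dir2 using the ls command: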
$ ls dir1
str1.txt
str2.txt
$ ls dir2
str1.txt
str4.txt
Next, we use diff to compare the two ls listings:
$ diff <(ls dir1) <(ls dir2)
2c2
< str2.txt
---
> str4.txt
The same technique works for strings. We can use process substitution to turn the output of an echo command into an input for diff. The -e flag tells echo to interpret each \n as a line break:
$ diff <(echo -e "This\nis\na\ntext") <(echo -e "This\nis\na\nfile")
4c4
< text
---
> file
In both cases, diff reports the differing line: first between the two directory listings, then between the two strings.
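If we compare strings often, we can wrap the process substitution in a small Bash function. The name diff_strings is our own invention, not part of diffutils.

```shell
# diff_strings STRING1 STRING2
# Compares two strings line by line. Requires Bash for <( ... ).
diff_strings() {
    # printf '%s\n' prints the string as-is and adds the final newline.
    diff <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}

# Usage: $'...' turns \n into real line breaks before the call.
diff_strings $'This\nis\na\ntext' $'This\nis\na\nfile' || true
# 4c4
# < text
# ---
# > file
```

Like diff itself, the function returns 0 when the strings match and 1 when they differ, so it can be used directly in an if statement.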
4. Output Formats of the diff Command
We control how diff presents its results by adding command-line flags. We’ll see a few different output formats in this section.
4.1. Default Format
Let’s look at diff’s default output format:
$ diff <(echo -e "example\n1\n2\n3") <(echo -e "example\n1\n5\n3")
3c3
< 2
---
> 5
As we can see, the differences are shown, as with previous examples.
4.2. Unified Format
The unified format shows the changed lines together with the surrounding context lines and their positions in both inputs. It's the format commonly used by version control tools and patch files. Here’s an example:
$ diff -u <(echo -e "x\ny\nz") <(echo -e "x\ny\nv")
@@ -1,3 +1,3 @@
 x
 y
-z
+v
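The real output starts with two header lines (--- and +++) that name the two inputs. We've left them out here because, with process substitution, they only show temporary /dev/fd paths and timestamps. The @@ line is the hunk header: -1,3 means the hunk covers three lines of the first input starting at line 1, and +1,3 means the same for the second input. Within the hunk, lines starting with a space are unchanged context, lines starting with - exist only in the first string, and lines starting with + exist only in the second.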
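To get stable, readable header lines even with process substitution, GNU diff offers the --label option, which replaces the file name and timestamp in each header. The labels old and new below are arbitrary names chosen for this sketch.

```shell
# The first --label applies to the first input, the second to the second input.
unified=$(diff -u --label old --label new \
    <(printf 'x\ny\nz\n') <(printf 'x\ny\nv\n') || true)
printf '%s\n' "$unified"
# --- old
# +++ new
# @@ -1,3 +1,3 @@
#  x
#  y
# -z
# +v
```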
4.3. Side-By-Side Format
The side-by-side form can be a little easier to read:
$ diff -y <(echo -e "This\nis\nsoftware") <(echo -e "This\nis\nhardware")
This                            This
is                              is
software                      | hardware
The | symbol marks a line that differs between the two inputs. The left column is the first string and the right column is the second.
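For longer inputs, the side-by-side view can get wide and noisy. As a sketch, GNU diff's -W option limits the total output width, and --suppress-common-lines hides the lines that are the same on both sides. The width of 40 columns is an arbitrary choice.

```shell
# Only the differing line is printed; identical lines are suppressed.
side=$(diff -y -W 40 --suppress-common-lines \
    <(printf 'This\nis\nsoftware\n') <(printf 'This\nis\nhardware\n') || true)
printf '%s\n' "$side"
```

Here, only the software/hardware line remains, still marked with the | symbol.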
4.4. The ed Script Format
Lastly, the -e flag makes diff produce a script for the ed editor. When run by ed on the first input, the script turns it into the second. The output contains only the changes, with no context lines or lines from the first input:
$ diff -e <(echo -e "L\ni\nn\nu\nx\na") <(echo -e "L\ni\nn\nu\nx\nb")
6c
b
.
It shows us three essential pieces of information: the number of the line (6), the command c for change, and the replacement text (b). The line containing a single dot ends the replacement text.
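To see the script in action, we can pipe it into ed. This is only a sketch: the file names are made up, and since ed isn't installed on every system, the script checks for it before trying to apply the change.

```shell
workdir=$(mktemp -d)
printf 'L\ni\nn\nu\nx\na\n' > "$workdir/old.txt"
printf 'L\ni\nn\nu\nx\nb\n' > "$workdir/new.txt"

# diff exits with 1 when the inputs differ, so "|| true" keeps the script going.
ed_script=$(diff -e "$workdir/old.txt" "$workdir/new.txt" || true)
printf '%s\n' "$ed_script"

if command -v ed >/dev/null 2>&1; then
    # Append "w" (write) so ed saves the change, then run the script on old.txt.
    printf '%s\nw\n' "$ed_script" | ed -s "$workdir/old.txt"
    # cmp -s succeeds silently when the two files are byte-for-byte equal.
    cmp -s "$workdir/old.txt" "$workdir/new.txt" && echo "old.txt now matches new.txt"
fi

rm -rf "$workdir"
```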
5. Conclusion
In this article, we discussed a few ways of using the diff command to compare strings.
First, we saw how the diff command works and how to install it. Then we discussed the different ways of passing strings to it: temporary files, here strings, and process substitution, with process substitution being the most convenient.
We also looked at the output formats of the diff command, each of which suits a different purpose: the default format for a quick check, the unified format for patches and version control, the side-by-side format for reading, and the ed script format for automated edits.
Together, these options make diff a versatile tool for comparing strings as well as files.