1. Overview
In this tutorial, we’re going to learn how to compare two files word by word on the Linux command line. Linux already has a command, diff, that compares two files. However, diff works line by line: if a single word in a line changes, it reports the whole line as changed instead of pointing at that word.
Here, we are going to use another command, wdiff, that shows word differences between two files.
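To see that limitation in practice, here is a minimal sketch. It assumes a POSIX shell and two hypothetical one-line files, old.txt and new.txt. Plain diff flags the whole line as changed. Putting each word on its own line with tr first lets diff point at the single changed word. This only approximates the idea of a word-level comparison. wdiff’s own output format is shown below.

```shell
#!/bin/sh
# Sketch: line-level vs. word-level comparison with plain diff.
# old.txt and new.txt are hypothetical sample files.
printf 'the quick brown fox\n' > old.txt
printf 'the quick yellow fox\n' > new.txt

# Line-level: diff reports the whole of line 1 as changed (1c1).
diff old.txt new.txt

# Word-level approximation: one word per line, then diff the word lists.
tr -s ' ' '\n' < old.txt > old.words
tr -s ' ' '\n' < new.txt > new.words
diff old.words new.words
```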
2. Using wdiff
wdiff usually doesn’t come pre-installed on Linux distributions, so we need to install it. On Debian-based distributions such as Ubuntu, we can use apt:
$ sudo apt install wdiff
Once the installation finishes, we can verify it by checking the version:
$ wdiff --version
wdiff (GNU wdiff) 1.2.2
Copyright (C) 1992, 1997, 1998, 1999, 2009, 2010, 2011, 2012 Free Software
Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Franc,ois Pinard.
Now let’s assume we have two text files, first.txt and second.txt. We can use cat to output their content:
$ cat first.txt
the quick brown fox jumps over the lazy dog
the woman snores and her husband wakes up
the sky is blue
$ cat second.txt
The quick yellow fox jumps over the sleeping dog
The man snores and his wife wakes up
It's raining
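To follow along, we can recreate the two sample files with here-documents. This is just one convenient way to create them. Quoting the 'EOF' delimiter keeps the shell from expanding anything inside the text.

```shell
#!/bin/sh
# Recreate the tutorial's two sample files.
# Quoting 'EOF' stops the shell from expanding anything in the text.
cat > first.txt <<'EOF'
the quick brown fox jumps over the lazy dog
the woman snores and her husband wakes up
the sky is blue
EOF

cat > second.txt <<'EOF'
The quick yellow fox jumps over the sleeping dog
The man snores and his wife wakes up
It's raining
EOF
```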
Now we run wdiff to see the differences:
$ wdiff first.txt second.txt
[-the-]{+The+} quick [-brown-] {+yellow+} fox jumps over the [-lazy-] {+sleeping+} dog
[-the woman-]
{+The man+} snores and [-her husband-] {+his wife+} wakes up
[-the sky is blue-]
{+It's raining+}
wdiff shows which words we need to change in the first file to make it match the second one. Words wrapped in [-word-] have to be removed from the first file, and words wrapped in {+word+} have to be added to it.
3. Filtering the Output
If we add the option -1 (or its long form, --no-deleted), the output omits the words we need to remove from the first file:
$ wdiff -1 first.txt second.txt
{+The+} quick {+yellow+} fox jumps over the {+sleeping+} dog
{+The man+} snores and {+his wife+} wakes up
{+It's raining+}
Notice that the output no longer contains any [-word-] markers.
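The filters can also be combined. The following sketch uses two hypothetical one-line files, old.txt and new.txt. It pairs --no-deleted (-1) with --no-common (-3) so that only the inserted words remain. It then uses grep to drop the separator lines that -3 prints.

```shell
#!/bin/sh
# Sketch: list only the words that were inserted.
# old.txt and new.txt are hypothetical sample files.
printf 'the quick brown fox jumps over the lazy dog\n' > old.txt
printf 'the quick yellow fox jumps over the sleeping dog\n' > new.txt

# --no-deleted (-1) hides deleted words, --no-common (-3) hides shared ones;
# grep -v then removes the '=' separator lines (and any empty lines).
wdiff --no-deleted --no-common old.txt new.txt | grep -v '^=*$'
```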
If we add -2 (or --no-inserted), the output omits the words we need to add to the first file:
$ wdiff -2 first.txt second.txt
[-the-] quick [-brown-] fox jumps over the [-lazy-] dog
[-the woman-] snores and [-her husband-] wakes up
[-the sky is blue-]
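Similarly, a small pipeline can turn the output of -2 combined with -3 into a plain list of deleted words. This is a sketch with hypothetical files old.txt and new.txt. The grep and sed step assumes that the deleted text itself contains no ] character.

```shell
#!/bin/sh
# Sketch: print the deleted words as plain text, one group per line.
# old.txt and new.txt are hypothetical sample files.
printf 'the quick brown fox jumps over the lazy dog\n' > old.txt
printf 'the quick yellow fox jumps over the sleeping dog\n' > new.txt

# -2 hides inserted words and -3 hides common ones, so only [-...-] groups remain.
# grep -o extracts each group; sed strips the leading [- and trailing -].
wdiff -2 -3 old.txt new.txt | grep -o '\[-[^]]*-]' | sed 's/^\[-//; s/-]$//'
```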
Finally, if we add -3 (or --no-common), the output omits the words the two files have in common, and wdiff separates each group of differences with a line of equals signs:
$ wdiff -3 first.txt second.txt
======================================================================
[-the-]{+The+}
======================================================================
[-brown-] {+yellow+}
======================================================================
[-lazy-] {+sleeping+}
======================================================================
[-the woman-]
{+The man+}
======================================================================
[-her husband-] {+his wife+}
======================================================================
[-the sky is blue-]
{+It's raining+}
======================================================================
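When we only need a summary, the -s (--statistics) option reports how many words are common, deleted, inserted, or changed in each file. Combining it with -1, -2, and -3 suppresses the word output itself. This is a sketch with hypothetical files old.txt and new.txt.

```shell
#!/bin/sh
# Sketch: word statistics instead of the marked-up text.
# old.txt and new.txt are hypothetical sample files.
printf 'the quick brown fox\n' > old.txt
printf 'the quick yellow fox\n' > new.txt

# -s (--statistics) adds a per-file summary of common, deleted,
# inserted, and changed words; -1 -2 -3 suppress the word output itself.
wdiff -s -1 -2 -3 old.txt new.txt
```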
4. Ignoring Case
wdiff is case-sensitive by default. To make the comparison case-insensitive, we can add the option --ignore-case (or its short form, -i):
$ wdiff --ignore-case first.txt second.txt
The quick [-brown-] {+yellow+} fox jumps over the [-lazy-] {+sleeping+} dog
The [-woman-] {+man+} snores and [-her husband-] {+his wife+} wakes up
[-the sky is blue-]
{+It's raining+}
We can see that words that differ only in capitalization, such as the and The, are no longer reported as changes.
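To compare both modes side by side, here is a sketch with two hypothetical files whose first words differ only in case:

```shell
#!/bin/sh
# Sketch: case-sensitive vs. case-insensitive comparison.
# old.txt and new.txt are hypothetical sample files.
printf 'the quick brown fox\n' > old.txt
printf 'The quick yellow fox\n' > new.txt

# Default (case-sensitive): "the" -> "The" is reported as a change.
wdiff old.txt new.txt

# -i (--ignore-case): only "brown" -> "yellow" is reported.
wdiff -i old.txt new.txt
```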
5. Colorizing the Output
The plain-text markers in wdiff’s output can be hard to scan, especially in longer files. The colordiff package can colorize the output of wdiff and make it easier to read. colordiff usually isn’t installed by default, so we need to install it:
$ sudo apt install colordiff
After the installation finishes, we can pipe the output of wdiff to colordiff:
$ wdiff first.txt second.txt | colordiff
In the colorized output, the words we need to remove from the first file are printed in red, and the words we need to add to it are printed in green.
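colordiff isn’t the only option. wdiff’s -w/-x (--start-delete/--end-delete) and -y/-z (--start-insert/--end-insert) options set the strings printed around deleted and inserted words. Passing ANSI color escape sequences to these options gives a similar red/green result. This sketch assumes a terminal that understands ANSI SGR codes, and old.txt and new.txt are hypothetical files.

```shell
#!/bin/sh
# Sketch: colorize wdiff output without colordiff.
# Assumption: the terminal understands ANSI SGR escape sequences.
red=$(printf '\033[31m')
green=$(printf '\033[32m')
reset=$(printf '\033[0m')

# old.txt and new.txt are hypothetical sample files.
printf 'the quick brown fox\n' > old.txt
printf 'the quick yellow fox\n' > new.txt

# -w/-x: strings printed before/after deleted words
# -y/-z: strings printed before/after inserted words
wdiff -w "$red" -x "$reset" -y "$green" -z "$reset" old.txt new.txt
```

The custom strings replace the default markers, so the [-word-] and {+word+} brackets no longer appear. If the output is redirected to a file, the escape sequences end up in that file too.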
6. Conclusion
To sum up, wdiff shows the word-level differences between two files. We can add options to filter its output or to make the comparison case-insensitive, and we can pipe its output to colordiff to make it easier to read and understand.