1. Introduction
When dealing with PDF documents, we often need a reliable way to compare their content. Simple text-based tools usually can’t compare PDF files directly, since a PDF isn’t a plain-text file. Instead, we can use tools that understand the structure of PDF documents.
In this tutorial, we’ll explore three distinct approaches for comparing PDF files line by line on Linux.
First, we’ll examine diffpdf for checking differences between PDF files line by line. After that, we’ll cover the Meld and diff tools combined with the pdftotext command to achieve the same objective. Finally, we’ll explore Draftable, a Web-based tool designed for comparing PDF files.
2. Sample Files
Before starting the PDF comparison, let’s see what the two sample PDFs look like:
Notably, both files consist of lines of text and an image.
3. Using diffpdf
To begin with, let’s use DiffPDF to compare PDF files.
DiffPDF is a graphical tool for visually comparing two PDF files. It shows both documents in one window and highlights the differences between them, so we can quickly navigate through the compared pages and review each change.
Since it’s rarely available by default on Linux, we can install diffpdf using the apt-get utility:
$ sudo apt-get install diffpdf
Now, to compare two files, we employ the diffpdf command:
$ diffpdf file1.pdf file2.pdf
Running this command launches the diffpdf graphical interface. It shows both PDF files side by side and highlights the differences between them:
We can compare the files by words, by characters, or by appearance (graphics) with the Compare setting in the right-hand Control panel. For example, Control > Compare > Appearance selects the graphical comparison.
After reviewing the differences, we can save the comparison output for future reference. From the graphical interface, we can choose various options to save the comparison result:
- save the current compared page
- save all compared pages
- save the comparison of file 1
- save the comparison of file 2
- save the comparison of both files
Although we launch diffpdf from the command line, the comparison itself happens in its graphical interface. Its word, character, and appearance modes make it a practical tool for reviewing changes between PDF documents.
4. Using pdftotext
Alternatively, we can extract the text of each PDF with pdftotext and then compare the extracted text with standard tools on the Linux command line.
We’ll focus on two approaches: Meld for a graphical comparison and the diff command for a textual comparison.
4.1. Using meld for Graphical Comparison
Meld and pdftotext usually aren’t available by default on Linux, so we first install them using apt-get:
$ sudo apt-get install meld poppler-utils
Notably, pdftotext is part of the poppler-utils package, a set of command-line utilities built on the Poppler PDF rendering library. We commonly use it to extract text from PDF files.
Once the installation is done, we pass the output of pdftotext to meld via process substitution:
$ meld <(pdftotext -layout file1.pdf /dev/stdout) <(pdftotext -layout file2.pdf /dev/stdout)
Let’s break down this command:
- -layout specifies that pdftotext should attempt to preserve the layout of the PDF file
- /dev/stdout instructs pdftotext to write the extracted text to standard output rather than to a regular file
The <() syntax is Bash process substitution. It runs the enclosed command and replaces the whole expression with a file name, such as /dev/fd/63, from which the command’s output can be read. In the above command, we use it twice, so that each run of pdftotext extracts the content of one of the two PDF files.
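A quick way to see what process substitution does is to print the substituted name and then read from it. This small Bash sketch doesn’t involve PDFs. The exact name varies between systems, and on systems without /dev/fd, Bash uses named pipes instead:

```shell
#!/usr/bin/env bash
# <(...) expands to a file name; print it to see what a command receives
# (typically something like /dev/fd/63 on Linux)
echo <(true)

# Reading from that file name yields the enclosed command's output
cat <(echo "extracted text")

# A program that takes two file names, like diff or meld, can read two
# command outputs at once; paste joins them line by line with a tab
paste <(printf 'a\nb\n') <(printf '1\n2\n')
```

The meld command above works the same way: it receives two such names and opens them as if they were regular text files.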
Putting it all together, this command extracts the text of file1.pdf and file2.pdf with pdftotext and uses process substitution to pass the extracted text to meld as if it were two files.
Meld then opens a graphical interface that displays the text of the two PDF files side by side, so we can visually compare them and spot any differences:
Notably, we can save this comparison from the File > Save As… option.
4.2. Using diff for Textual Comparison
Alternatively, we can use the diff command for a textual comparison of PDF files. This method is straightforward and doesn’t require a graphical tool such as Meld, so we can perform the whole comparison in the terminal.
To compare two PDF files, we simply run diff over the same two process substitution commands:
$ diff <(pdftotext -layout file1.pdf /dev/stdout) <(pdftotext -layout file2.pdf /dev/stdout)
1,32c1,10
< Curabitur bibendum ante urna, sed blandit libero egestas id. Pellentesque rhoncus elit in lacus
...
Again, the command compares the text extracted from file1.pdf and file2.pdf. The -layout option preserves the layout, and /dev/stdout makes each pdftotext run send its result to standard output.
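To avoid retyping the two process substitutions, we can wrap the comparison in a shell function. This is a minimal sketch under a few assumptions. The name pdftextdiff is hypothetical and isn’t part of any package. The function also uses the pdftotext convention that an output file name of - means standard output, which is equivalent to /dev/stdout here. Finally, a failing pdftotext inside <() doesn’t change diff’s exit status, so the function first checks that both files are readable:

```shell
#!/usr/bin/env bash
# pdftextdiff (hypothetical helper): compare the text of two PDFs.
# Returns 0 = same text, 1 = different text, 2 = usage or read error.
pdftextdiff() {
  if [ "$#" -ne 2 ]; then
    echo "usage: pdftextdiff FILE1.pdf FILE2.pdf" >&2
    return 2
  fi

  # A pdftotext failure inside <() would not affect diff's exit status,
  # so check that both input files are readable up front
  local f
  for f in "$1" "$2"; do
    if [ ! -r "$f" ]; then
      echo "pdftextdiff: cannot read '$f'" >&2
      return 2
    fi
  done

  # "-" as the output file makes pdftotext write to standard output
  diff <(pdftotext -layout "$1" -) <(pdftotext -layout "$2" -)
}
```

After defining the function, pdftextdiff file1.pdf file2.pdf prints the same kind of output as the diff command above. Because the function returns diff’s exit status, we can also use it in scripts:
$ pdftextdiff file1.pdf file2.pdf > /dev/null && echo "same text"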
The first line of the output, 1,32c1,10, is a change command in diff’s default (normal) output format:
- 1,32 specifies the range of lines in the first file that are being replaced
- c stands for change, meaning that the lines in the range before the c are replaced with the lines in the range after it
- 1,10 specifies the range of lines in the second file that replace the lines in the first file
- < Curabitur bibendum … is a line from the first file that’s being replaced; the replacement lines from the second file appear after a --- separator and start with >
In summary, this output indicates that lines 1 through 32 in the first file are replaced by lines 1 through 10 in the second file. The displayed content is the part of the first file that’s being replaced.
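To reproduce this notation on a small scale, we can feed diff two short made-up inputs that stand in for extracted PDF text. The sketch also runs diff -u, which prints the unified format instead. The --label option, as provided by GNU diff, replaces the /dev/fd/N names and timestamps in the unified header. The input text is invented for illustration:

```shell
#!/usr/bin/env bash
# Made-up stand-ins for extracted PDF text: three old lines, one new line
old=$'alpha\nbeta\ngamma'
new='delta'

# Normal format, expected output:
#   1,3c1
#   < alpha
#   < beta
#   < gamma
#   ---
#   > delta
diff <(echo "$old") <(echo "$new")

# Unified format: a "@@ -1,3 +1 @@" hunk header, then "-" for removed
# lines and "+" for added ones; --label sets the header file names
diff -u --label file1.txt --label file2.txt <(echo "$old") <(echo "$new")
```

Here, 1,3c1 reads the same way as 1,32c1,10: lines 1 to 3 of the first input change into line 1 of the second. Since diff exits with status 1 whenever its inputs differ, scripts running under set -e need to account for that.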
5. Using Draftable
Another approach to compare PDFs is to use Draftable.
Draftable is a Web-based platform that specializes in document comparison, including PDF files. It offers a user-friendly interface that highlights the differences between documents. Since it runs in the browser, we can use it on many systems without installing anything, which makes it a convenient option for quick comparisons.
To use this Web tool, we click the Try Draftable Online button from the official site:
Subsequently, we get the option to upload both documents. We click the Upload PDF button or drag and drop the PDF files directly onto the webpage.
Once we upload the required documents, we click Compare to view the comparison:
Draftable should display the comparison results in a side-by-side view, with differences between the documents highlighted:
We can navigate through the documents and review the highlighted changes to see where they occur.
If we need to save or export the comparison results for future reference, we click the Save icon:
Notably, we can pick among various options for saving the comparison result.
6. Conclusion
In this article, we learned several ways to compare PDF files line by line.
Initially, we examined diffpdf for the comparison. After that, we combined meld and diff with the pdftotext command. Finally, we discussed Draftable, a Web-based tool that shows the differences between PDF files side by side.