1. Introduction
PDF (Portable Document Format) files are widely used for sharing documents due to their platform independence and formatting capabilities. However, the size of PDF files can sometimes be large, making them difficult to share or store.
In this tutorial, we'll explore techniques for reducing PDF file sizes on Linux while keeping the content intact and the quality acceptable.
2. Benefits of PDF Optimization in Linux
Optimizing PDF files has several benefits, including:
- less storage space, enabling efficient data management
- faster and more reliable file transfers
- easier opening and viewing on low-resource devices
- faster upload and download times
- lower storage costs, decreased network bandwidth usage, and potentially reduced cloud storage subscription fees
On Linux, the ghostscript, qpdf, and exiftool tools are available for optimizing PDF files.
3. Using ghostscript
ghostscript provides various options to optimize PDF file sizes. Let’s take a closer look.
3.1. Installation
Before installing packages on Debian-based distributions, we refresh the package index with sudo apt update.
On Debian-based distributions, we install ghostscript using:
$ sudo apt install ghostscript poppler-utils
On Fedora/CentOS-based distributions, we run:
$ sudo dnf install ghostscript poppler-utils
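After the installation, we can quickly confirm that the binaries are on the PATH. The snippet below is a minimal sketch: report_tool is a helper name we made up for illustration, and we assume pdfinfo is the poppler-utils program of interest.

```shell
#!/bin/sh
# report_tool is a hypothetical helper: it prints whether a command
# is available on the PATH and returns non-zero if it isn't.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# gs comes from the ghostscript package, pdfinfo from poppler-utils.
for tool in gs pdfinfo; do
    report_tool "$tool" || true
done

# Print the Ghostscript version number if gs is installed.
if command -v gs >/dev/null 2>&1; then
    gs --version
fi
```

If either tool is reported as missing, the installation step for our distribution likely didn't complete.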
3.2. Optimization
To optimize a PDF using ghostscript, we use the gs command:
gs [switches] [input]
Here’s a breakdown of the different components of its syntax:
- [switches] are optional command-line options that modify the behavior of ghostscript. These options are preceded by a hyphen.
- [input] is the path to the input file(s) we want to optimize.
Let’s check out some switches in an example:
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Let’s break down what each of these switches means:
- -sDEVICE=pdfwrite specifies the output device to be used. In this case, pdfwrite indicates that the output will be in PDF format
- -dCompatibilityLevel sets the compatibility level for the optimized PDF. In the example, the compatibility level is set to 1.4, which corresponds to the PDF version 1.4
- -dPDFSETTINGS determines the quality and compression settings for the optimized PDF. Here, /screen optimizes for on-screen viewing, producing the lowest image resolution and usually the smallest files. Other possible settings include /ebook (medium resolution, for PDF files that will be viewed on e-book readers), /printer (for producing PDF files for high-quality printing), and /default (a general-purpose setting that may produce larger files)
- -dNOPAUSE prevents ghostscript from pausing between pages
- -dQUIET suppresses informational messages
- -dBATCH makes ghostscript exit after processing the input file instead of dropping into its interactive prompt
- -sOutputFile=output.pdf specifies the output file name
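Since the right -dPDFSETTINGS value depends on the document, it can help to try several presets and compare the results. The following is a rough sketch rather than a definitive workflow: the percent_saved helper and the output-PRESET.pdf file names are our own assumptions, and input.pdf is the file from the example above.

```shell
#!/bin/sh
# percent_saved is a hypothetical helper: the integer percentage by which
# the new size ($2) is smaller than the old size ($1). The result is
# negative if the file grew.
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

input=input.pdf   # assumed input file name

if command -v gs >/dev/null 2>&1 && [ -f "$input" ]; then
    old_size=$(wc -c < "$input")
    for preset in screen ebook printer; do
        out="output-$preset.pdf"
        # Same switches as in the example, with only the preset changing.
        gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
           -dPDFSETTINGS=/"$preset" \
           -dNOPAUSE -dQUIET -dBATCH \
           -sOutputFile="$out" "$input"
        new_size=$(wc -c < "$out")
        echo "$preset: $new_size bytes ($(percent_saved "$old_size" "$new_size")% saved)"
    done
else
    echo "gs or $input not found; nothing to do"
fi
```

A negative percentage means the preset made the file larger, which can happen with documents that are already well compressed.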
4. Using the qpdf Command
qpdf is another useful tool for optimizing PDF file sizes on Linux. Unlike ghostscript, it doesn't re-render or downsample the content; it rewrites the file structure and compresses streams losslessly.
4.1. Installation
As before, we install qpdf with the package manager for our Linux distribution.
On Debian-based distributions, we run:
$ sudo apt install qpdf
On Fedora/CentOS-based distributions, we run:
$ sudo dnf install qpdf
4.2. Optimization
The syntax for optimizing documents is:
qpdf [options] [input] [output]
Here’s a breakdown of the different components of the qpdf syntax:
- [options] refers to optional command-line options we use to modify the behavior of qpdf. These options are preceded by a hyphen or double hyphen.
- [input] is a path to the input PDF.
- [output] is the path where the output PDF will be saved.
The arguments can be in any order, but the input filename must precede the output filename.
For example, let’s compress the document.pdf file:
$ qpdf --compress-streams=y --object-streams=generate document.pdf qpdf_compressed.pdf
Let’s take a closer look at the arguments:
- --compress-streams=y instructs qpdf to compress stream data within the PDF file. Streams hold the bulk of the document's data, such as page content and images.
- --object-streams=generate specifies the handling of object streams in the PDF file. The generate option tells qpdf to pack eligible objects into new, compressed object streams during the optimization process, which further reduces the file size.
- document.pdf is the input file to be optimized.
- qpdf_compressed.pdf is the output or optimized file.
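After compressing, it's worth validating the result and comparing sizes. The sketch below reuses the file names from the example; size_of is a helper name we introduce for illustration, and qpdf --check is used to look for structural problems in the output.

```shell
#!/bin/sh
# size_of is a hypothetical helper: prints a file's size in bytes.
size_of() {
    echo $(( $(wc -c < "$1") ))
}

input=document.pdf            # file names from the example above
output=qpdf_compressed.pdf

if command -v qpdf >/dev/null 2>&1 && [ -f "$input" ]; then
    qpdf --compress-streams=y --object-streams=generate "$input" "$output"

    # --check reads the whole file and reports structural problems.
    qpdf --check "$output"

    echo "before: $(size_of "$input") bytes"
    echo "after:  $(size_of "$output") bytes"
else
    echo "qpdf or $input not found; nothing to do"
fi
```

Because qpdf works losslessly, the savings are usually smaller than with ghostscript, but the page content stays unchanged.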
5. Using the exiftool Command
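PDF files can contain metadata such as author names, creation dates, and other information that contributes to the file size. We can strip this metadata using exiftool. Because exiftool writes PDF changes as an incremental update, the old metadata stays recoverable and the file doesn't shrink until it's rewritten, for instance with qpdf.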
5.1. Installation
We first need to install exiftool.
On Debian-based distributions, we install exiftool by running:
$ sudo apt install libimage-exiftool-perl
On Fedora/CentOS-based distributions, we run:
$ sudo dnf install perl-Image-ExifTool
5.2. Optimization
The syntax for optimization is:
exiftool [options] [input]
- [options] are optional command-line options that modify the behavior of exiftool, preceded by a hyphen.
- [input] is the path to the document we want to optimize.
For example, to remove metadata from a PDF file document.pdf using exiftool, we execute:
$ exiftool -all:all= document.pdf
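In this example, -all:all= assigns an empty value to every tag in every group, which deletes all the metadata that exiftool can write.
Note that, unlike ghostscript and qpdf, exiftool doesn't create a new PDF file but modifies the same PDF document in place. By default, it keeps a backup of the original named document.pdf_original.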
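To see what the command actually changes, we can list the metadata first and compare file sizes afterward. This is a minimal sketch under a few assumptions: backup_name is a helper we made up, stripped.pdf is an arbitrary output name, and the final qpdf step only runs if qpdf is installed.

```shell
#!/bin/sh
# backup_name is a hypothetical helper: the name exiftool gives the
# backup copy it keeps by default (original name plus "_original").
backup_name() {
    echo "${1}_original"
}

input=document.pdf   # file name from the example above

if command -v exiftool >/dev/null 2>&1 && [ -f "$input" ]; then
    # List the metadata currently stored in the file.
    exiftool "$input"

    # Clear every writable tag; the untouched original is kept as a backup.
    exiftool -all:all= "$input"

    # Compare the edited file with the backup.
    ls -l "$input" "$(backup_name "$input")"

    # exiftool appends PDF edits as an incremental update, so the old
    # metadata is still inside the file. Rewriting the file with qpdf
    # drops the unreferenced data.
    if command -v qpdf >/dev/null 2>&1; then
        qpdf --linearize "$input" stripped.pdf
    fi
else
    echo "exiftool or $input not found; nothing to do"
fi
```

Once we're happy with the result, the backup file can be deleted to reclaim the space.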
6. Differences Between ghostscript, qpdf, and exiftool
Let’s summarize the differences between these three tools:

| Basis | ghostscript | qpdf | exiftool |
| --- | --- | --- | --- |
| Focus | PostScript and PDF files | PDF optimization and manipulation | metadata manipulation |
| Supported formats | PostScript (PS), Encapsulated PostScript (EPS), PDF | PDF | wide range of file formats, including PDF |
| Functionality | interpreter for PostScript and PDF page description languages | PDF file manipulation and optimization | metadata manipulation |
| Output | a new PDF | a new PDF file | the original PDF is optimized |
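Because the tools work at different levels, they can also be chained. The sketch below is one possible combination, not a recommended recipe: it runs ghostscript and then qpdf, and keeps whichever file is smallest. The smaller_of helper and the gs_output.pdf and final.pdf names are assumptions for illustration.

```shell
#!/bin/sh
# smaller_of is a hypothetical helper: prints the path of the smaller
# of two files (the first one if both have the same size).
smaller_of() {
    a=$(( $(wc -c < "$1") ))
    b=$(( $(wc -c < "$2") ))
    if [ "$b" -lt "$a" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

input=input.pdf   # assumed input file name

if command -v gs >/dev/null 2>&1 && command -v qpdf >/dev/null 2>&1 \
   && [ -f "$input" ]; then
    # Step 1: re-render with ghostscript (lossy for images).
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
       -dNOPAUSE -dQUIET -dBATCH -sOutputFile=gs_output.pdf "$input"

    # Step 2: lossless structural compression with qpdf.
    qpdf --compress-streams=y --object-streams=generate gs_output.pdf final.pdf

    # Keep the original if the pipeline didn't actually help.
    echo "smallest file: $(smaller_of "$input" final.pdf)"
else
    echo "gs, qpdf, or $input not found; nothing to do"
fi
```

Checking the result against the original matters because re-rendering an already optimized PDF can make it larger.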
7. Conclusion
In this article, we explored three popular tools for reducing the size of PDF files: ghostscript, qpdf, and exiftool. Optimizing the sizes of PDF files has many benefits, such as more efficient sharing and using less storage.