1. Overview
Portable Document Format (PDF) files are widely used for sharing and preserving documents because they look the same on every platform. However, we sometimes need to convert specific pages, or an entire PDF document, into an image format such as JPEG or PNG.
In this tutorial, we’ll explore different methods and tools available on Linux for converting PDF files to images.
2. Converting PDF to Image Using Poppler Utilities
Poppler is a PDF rendering library that comes with several command-line utilities, including pdftoppm, pdftocairo, and pdfimages. The first two render PDF pages as image files, while pdfimages extracts the images embedded in a PDF.
Most Linux distributions have poppler-utils pre-installed, but if not, we can install it using a package manager like apt:
$ sudo apt install poppler-utils
2.1. Using pdftoppm
pdftoppm is a command-line tool that converts PDF pages to Portable Pixmap (PPM) image files by default. It can also write other formats such as JPEG or PNG directly when we pass the -jpeg or -png option.
Let’s run the pdftoppm command to convert PDF to image:
$ pdftoppm -jpeg input.pdf output
The command instructs pdftoppm to convert each page of input.pdf into a JPEG image. It saves the resulting images under the prefix output, followed by a hyphen and the page number, for example output-1.jpg (the number may be zero-padded depending on the page count).
Next, let’s use pdftoppm with the -f option, which specifies the first page to convert, and the -l option, which specifies the last page to convert:
$ pdftoppm -jpeg -f 3 -l 5 input.pdf output
The above command converts pages 3, 4, and 5 of the input.pdf file to JPEG images.
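pdftoppm also accepts -r to set the resolution in DPI (150 by default) and -png to write PNG files. As a minimal sketch, the page range, format, and resolution can be wrapped in a small shell function. The names run and pdf_range_to_png, and the DRY_RUN variable, are our own assumptions for illustration and aren't part of Poppler:

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pdf_range_to_png (hypothetical helper): render pages FIRST..LAST as PNG files.
# Arguments: input PDF, output prefix, first page, last page, DPI (default 150).
pdf_range_to_png() {
    pdf=$1; prefix=$2; first=$3; last=$4; dpi=${5:-150}
    run pdftoppm -png -r "$dpi" -f "$first" -l "$last" "$pdf" "$prefix"
}
```

With DRY_RUN=1 set, calling pdf_range_to_png input.pdf output 3 5 300 only prints the pdftoppm command it would run, which lets us check the options before converting anything.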
2.2. Using pdftocairo
Another useful way to convert PDF to image is the pdftocairo command. Using the cairo output device of the Poppler PDF library, pdftocairo renders PDF pages to raster formats such as PNG, JPEG, and TIFF, as well as to vector formats such as SVG.
Let’s see how we can use pdftocairo to convert PDF to images:
$ pdftocairo -png input.pdf
The command converts each page of input.pdf to a PNG image. Since we didn’t give an output name, pdftocairo derives it from the input filename. We can also replace -png with -jpeg to get JPEG images instead.
Next, let’s convert a specific page range of the PDF to images:
$ pdftocairo -jpeg -f 2 -l 4 input.pdf output
The command converts pages 2 through 4 to JPEG images whose filenames start with output.
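When we have several PDF files, we can wrap the same pdftocairo call in a loop. The following is only a sketch: the function name batch_pdftocairo and the DRY_RUN variable are our own assumptions, and each output prefix is simply the input filename without its .pdf extension:

```shell
# batch_pdftocairo (hypothetical helper): convert every PDF in a directory to PNG.
# Set DRY_RUN=1 to print the commands instead of running them.
batch_pdftocairo() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # skip when no PDF matches the pattern
        prefix=${pdf%.pdf}             # report.pdf -> report
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "pdftocairo -png $pdf $prefix"
        else
            pdftocairo -png "$pdf" "$prefix"
        fi
    done
}
```

Running batch_pdftocairo with DRY_RUN=1 first shows one pdftocairo command per PDF file in the directory, so we can review the output prefixes before any images are written.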
3. Converting PDF to Image Using Ghostscript
Ghostscript (gs) is a powerful interpreter for the PostScript language and PDF files. It’s useful in various tasks related to PDF manipulation, including converting PDFs to images.
Let’s use Ghostscript to convert PDFs to images:
$ gs -dNOPAUSE -sDEVICE=jpeg -r300 -sOutputFile=output-%03d.jpeg input.pdf -c quit
GPL Ghostscript 10.0.0 (2022-09-21)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 4.
Page 1
Page 2
Page 3
Page 4
The command converts all pages of the input.pdf file into separate JPEG images, with filenames like output-001.jpeg, output-002.jpeg, and so on.
Let’s break down the command and understand it:
- gs: This is the command to invoke Ghostscript.
- -dNOPAUSE: This option tells Ghostscript not to pause between pages during processing.
- -sDEVICE=jpeg: This sets the output device to JPEG. For PNG output, we can use png16m (24-bit color) or pngalpha (with transparency) instead, and change the extension in -sOutputFile to match.
- -r300: Sets the resolution to 300 DPI (dots per inch). We can adjust this value to change the output image’s quality and size.
- -sOutputFile=output-%03d.jpeg: This specifies the output file pattern. %03d indicates that the page number will be zero-padded to three digits. So, the output files will be named output-001.jpeg, output-002.jpeg, etc.
- input.pdf: The name of the input PDF file to convert.
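- -c quit: This runs the PostScript quit operator once the input file has been processed, so Ghostscript exits instead of waiting at its interactive prompt. The -dBATCH option achieves the same result.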
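To see how the -r value affects the image size, it helps to remember that PDF dimensions are measured in points, with 72 points to an inch. The sketch below estimates the pixel length of a page side at a given DPI and expands an output pattern with the shell's printf, which uses the same C-style %03d syntax. The function names pixels_at_dpi and page_filename are our own and aren't part of Ghostscript, and Ghostscript's rounding may differ by a pixel:

```shell
# pixels_at_dpi: convert a length in PDF points (1/72 inch) to pixels at a given DPI.
# Uses integer arithmetic, so the result is truncated.
pixels_at_dpi() {
    points=$1; dpi=$2
    echo $(( points * dpi / 72 ))
}

# page_filename: expand a pattern such as output-%03d.jpeg for one page number.
page_filename() {
    pattern=$1; page=$2
    printf "$pattern\n" "$page"
}
```

For a US Letter page (612 x 792 points), pixels_at_dpi 612 300 prints 2550 and pixels_at_dpi 792 300 prints 3300, while page_filename 'output-%03d.jpeg' 7 prints output-007.jpeg.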
Similarly, to convert only specific pages of a PDF file, we can specify a page range using the -dFirstPage and -dLastPage options:
$ gs -dNOPAUSE -sDEVICE=pngalpha -r300 -dFirstPage=2 -dLastPage=4 -sOutputFile=output-%03d.png input.pdf -c quit
GPL Ghostscript 10.0.0 (2022-09-21)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 2 through 4.
Page 2
Page 3
Page 4
The command converts pages 2, 3, and 4 of the input.pdf file to PNG images with transparent backgrounds. Ghostscript numbers the output files from 001 regardless of the page range, so page 2 is saved as output-001.png.
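As a sketch, we can wrap the page-range command in a function that rejects a reversed range before calling Ghostscript. The function name gs_range_to_png and the DRY_RUN variable are our own assumptions. Here we use the png16m device for opaque 24-bit PNG output, -q to suppress the startup banner, and -dBATCH in place of -c quit:

```shell
# gs_range_to_png (hypothetical helper): render pages FIRST..LAST with Ghostscript.
# Arguments: input PDF, first page, last page, DPI (default 300).
# Set DRY_RUN=1 to print the command instead of running it.
gs_range_to_png() {
    pdf=$1; first=$2; last=$3; dpi=${4:-300}
    if [ "$first" -gt "$last" ]; then
        echo "first page ($first) is after last page ($last)" >&2
        return 1
    fi
    # Build the argument list once so the printed and executed commands match.
    set -- gs -q -dNOPAUSE -dBATCH -sDEVICE=png16m -r"$dpi" \
        -dFirstPage="$first" -dLastPage="$last" \
        -sOutputFile=output-%03d.png "$pdf"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling gs_range_to_png input.pdf 2 4 with DRY_RUN=1 prints the full gs command, while gs_range_to_png input.pdf 5 2 fails with an error message and a non-zero exit status.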
4. Conclusion
In this article, we explored several methods to convert PDF to image in Linux.
Whether we choose pdftoppm or pdftocairo from Poppler Utilities, or Ghostscript (gs), each method provides a straightforward way to convert PDF documents to image files. Depending on our needs, we can select the most suitable tool and customize the output format and resolution.
With these methods at our disposal, handling PDF files and utilizing their content in image form becomes an easy task in the Linux environment.