1. Overview
PDF and EPUB are both popular formats for digital publications. However, converting a PDF file to EPUB can improve readability and accessibility, because EPUB text reflows to fit the screen.
In this tutorial, we’ll explore two methods to convert PDF files to the EPUB format in Linux.
2. EPUB Over PDF
Let’s discuss some cases in which the EPUB format is better suited than the PDF format.
The PDF format is useful when we have a document with a fixed layout. However, in the case of eBooks or digital documents with dynamic layouts, the EPUB format performs better than the PDF format. The EPUB format’s reflowable nature enables text to fit onto screens, usually making the information easier to read.
Moreover, EPUBs adapt to the dimensions of the reading device, so readers don’t need to keep zooming in and out to read the text. The content stays the same on every screen size; only the layout changes. Therefore, the EPUB format provides a better reading experience on smartphones, tablets, and e-readers.
Editing EPUB files is also generally easier than editing PDF files. An EPUB is essentially a ZIP archive of XHTML, CSS, and image files, so we can edit it with ordinary tools rather than specialized software. This ease of editing enables us to make revisions and update published documents quickly.
Finally, because their contents are mostly compressed text and markup, EPUB files are often smaller than equivalent PDF files, although image-heavy books can be an exception. Hence, they’re easy to download, share, and store on devices with limited internal storage capacity.
3. Using Calibre
Calibre is a powerful e-book management tool that’s available on Linux. We can use it to organize our digital documents using metadata, convert between different formats, and edit documents.
3.1. Installation
First, let’s discuss how to deploy Calibre in Linux.
To install Calibre in Debian-based systems, we can use the apt command:
$ sudo apt install calibre
On Arch and its derivatives, we use the pacman command to install Calibre:
$ sudo pacman -S calibre
After the installation, let’s verify it by checking the version with the ebook-viewer command, which is installed along with Calibre:
$ ebook-viewer --version
ebook-viewer (calibre 5.37)
As the output shows, the installation is completed.
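Besides the GUI, Calibre also installs command-line tools such as ebook-convert, so it can be handy to confirm that everything we need is on the PATH. The following is only a minimal sketch; the function name check_tools is our own and isn’t part of Calibre:

```shell
#!/bin/sh
# check_tools: report whether each named command is available on the PATH.
# Returns 0 if every command is found, 1 if at least one is missing.
# (check_tools is a hypothetical helper name, not a Calibre command.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, after sourcing the snippet, we could run check_tools calibre ebook-viewer ebook-convert and look for any missing entries.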
3.2. Usage
Now, we open the Calibre tool:
$ calibre
From the main window, we go to the Add books option and add the target PDF file. As soon as we add the PDF file, it appears in the main list of the Calibre tool.
In this case, we added a PDF file named sample.pdf. Now, we right-click on the PDF file and go to the Convert option.
Before the conversion starts, Calibre shows a preview with details about the added file and the output format, where we make sure that EPUB is selected as the output format.
After verification, we confirm the conversion. Finally, when the conversion is done, the Calibre tool gives us a notification.
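The same conversion can also be run without the GUI, since Calibre ships the ebook-convert command-line tool, which infers the input and output formats from the file extensions. As a rough sketch, the helper below (pdf_to_epub is a hypothetical name of our own) derives the EPUB filename from the PDF filename and calls ebook-convert:

```shell
#!/bin/sh
# pdf_to_epub: convert one PDF to an EPUB with the same base name.
# Assumes Calibre's ebook-convert is on the PATH.
# (pdf_to_epub is a hypothetical helper name, not a Calibre command.)
pdf_to_epub() {
    input=$1
    if [ ! -f "$input" ]; then
        echo "no such file: $input" >&2
        return 1
    fi
    # Strip a trailing .pdf (if present) and append .epub
    output="${input%.pdf}.epub"
    ebook-convert "$input" "$output"
}
```

With this in place, running pdf_to_epub sample.pdf would ask ebook-convert to write sample.epub next to the input file.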
4. Using pdftohtml
An alternative approach is to convert the PDF file to HTML format using the pdftohtml tool and then convert the HTML file to EPUB format.
The pdftohtml tool converts PDF documents to the HTML format and is often used for web publishing and text extraction.
4.1. Installation
The pdftohtml tool is part of the Poppler utilities. On Debian-based systems, we can install it through the poppler-utils package using the apt-get command:
$ sudo apt-get install poppler-utils
Alternatively, we can install pdftohtml in Arch and Arch-derivatives using the pacman command:
$ sudo pacman -S poppler
Now, let’s verify the installation status of pdftohtml:
$ pdftohtml -v
pdftohtml version 22.02.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC
Thus, we successfully installed pdftohtml in our system.
4.2. Usage
First, let’s take a look at the metadata of the input PDF file using the pdfinfo command, which is a part of the Poppler utilities:
$ pdfinfo sample.pdf
Creator: Writer
Producer: LibreOffice 4.2
CreationDate: Wed Aug 16 08:42:28 2017 EDT
Custom Metadata: no
Metadata Stream: no
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5
Encrypted: no
Page size: 595 x 842 pts (A4)
Page rot: 0
File size: 469513 bytes
Optimized: no
PDF version: 1.4
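Since pdfinfo prints one key-value pair per line, individual fields are easy to pick out in a script. As a small sketch, the helper below (pdfinfo_field is our own name, not a Poppler command) reads pdfinfo output on standard input and prints the value of a single field:

```shell
#!/bin/sh
# pdfinfo_field: print the value of one field from pdfinfo output.
# Reads "Key: value" lines on stdin; $1 is the exact key, e.g. "Pages".
# (pdfinfo_field is a hypothetical helper name, not a Poppler command.)
pdfinfo_field() {
    awk -v key="$1" '
        {
            # Split only at the first colon so values such as
            # timestamps (08:42:28) stay intact
            sep = index($0, ":")
            if (sep > 0 && substr($0, 1, sep - 1) == key) {
                value = substr($0, sep + 1)
                sub(/^[ \t]+/, "", value)   # drop the padding after the colon
                print value
                exit
            }
        }'
}
```

For instance, pdfinfo sample.pdf | pdfinfo_field Pages would print the page count, which is 5 for the file shown above.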
The next step is to use pdftohtml to convert the PDF file into the HTML format:
$ pdftohtml sample.pdf output1.html
Page-1
Page-2
Page-3
Page-4
Page-5
The output shows that all five pages of the PDF file were converted to HTML.
The final step is to convert the HTML file into the EPUB format. To make this conversion, we use the ebook-convert command, which is part of the Calibre tool:
$ ebook-convert output1.html final_output.epub
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/sam/Downloads/output1.html
Building file list...
Normalizing filename cases
Rewriting HTML links
Forcing output1.html into XHTML namespace
...output truncated...
At this point, we’ve successfully converted the PDF file into the EPUB format.
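The two steps can also be chained in a small script. The sketch below is one possible arrangement rather than the only one: the function name pdf_to_epub_via_html is our own, and it passes pdftohtml’s -noframes option (not used in the commands above) on the assumption that a single HTML file is a simpler input for ebook-convert than a frameset:

```shell
#!/bin/sh
# pdf_to_epub_via_html: PDF -> HTML (pdftohtml) -> EPUB (ebook-convert).
# $1 is the input PDF, $2 is the output EPUB.
# (pdf_to_epub_via_html is a hypothetical helper name of our own.)
pdf_to_epub_via_html() {
    pdf=$1
    epub=$2
    # Keep the intermediate HTML and any extracted images in a scratch directory
    workdir=$(mktemp -d) || return 1
    html="$workdir/book.html"

    # Run the second step only if the first succeeds
    if pdftohtml -noframes "$pdf" "$html" && ebook-convert "$html" "$epub"; then
        status=0
    else
        status=1
    fi
    # Remove the scratch directory whether or not the conversion worked
    rm -rf "$workdir"
    return "$status"
}
```

For example, pdf_to_epub_via_html sample.pdf final_output.epub would run both conversions and leave only the EPUB behind.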
5. Conclusion
In this article, we discussed two methods to convert PDF files to the EPUB format in Linux.
Calibre is a comprehensive e-book management tool that can convert documents directly between formats, including PDF to EPUB.
On the other hand, we can use the pdftohtml tool to convert the PDF file to the HTML format and then use Calibre to convert the HTML file to the EPUB format.