1. Introduction

We should be able to convert popular document formats from one to another easily. This is especially true for such widespread text document formats as Microsoft Word’s doc and docx, and PDF.

In this tutorial, we’ll learn ways to convert multiple doc/docx files into PDFs from the Linux command line.

2. LibreOffice

Many methods of document conversion rely on LibreOffice’s ability to read doc/docx files. When working in graphical mode, we can import such a file and then export it as a PDF document. The same can be done from the command line. We use the soffice command to invoke LibreOffice, while for LibreOffice Writer itself, we can use lowriter.

If LibreOffice isn’t preinstalled, let’s install it. On Ubuntu:

$ sudo apt-get install libreoffice

Some commands we’ll talk about require LibreOffice version 7.4 or later. Let’s make sure that we have one:

$ soffice --version
LibreOffice 24.2.5.2 420(Build:2)

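Here, we have version 24.2. In 2024, LibreOffice switched to year.month version numbers, so 24.2 is the release that directly followed 7.6 and is newer than any 7.x version.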
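If a script should check the version before relying on newer features, we can compare version strings with the -V (version sort) option of GNU sort. The functions version_at_least and libreoffice_version below are hypothetical helpers, and the parsing assumes the output format shown above, with the version number as the second field:

```shell
# Succeeds (exit status 0) when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option orders version numbers (7.10 after 7.4).
version_at_least() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the version field of `soffice --version`, assuming output such as
# "LibreOffice 24.2.5.2 420(Build:2)", where the version is the second field.
libreoffice_version() {
  soffice --version | awk '{ print $2; exit }'
}
```

For example:

$ version_at_least "$(libreoffice_version)" 7.4 && echo "JSON filter options available"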

3. Converting With convert-to

Let’s convert a single doc/docx file with the --convert-to option of soffice:

$ soffice --convert-to pdf example.docx

Using the --convert-to option implies --headless, so LibreOffice runs without interacting with the graphical environment. In other words, no GUI appears, and we can use the command on systems without a graphical interface.

We can use wildcards in the input file name to transform multiple files. With this line, we’re going to pass all doc and docx files in the current folder:

$ soffice --convert-to pdf *.doc*

Another way is to provide a space-separated list of files:

$ soffice --convert-to pdf example1.doc example2.docx

Finally, with the --outdir option, we can specify the folder for the output files. For example, let’s use the pdf_output subfolder of the current directory:

$ soffice --convert-to pdf *.doc* --outdir ./pdf_output
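The *.doc* pattern also matches any other file whose extension starts with .doc, and when nothing matches at all, the shell passes the pattern to soffice literally. A slightly more careful batch conversion could look like the following sketch; convert_all_to_pdf is a hypothetical helper name, and ./pdf_output is just the example folder used above. We pass --headless only for readability, since --convert-to implies it:

```shell
# Hypothetical helper: converts every *.doc and *.docx file in the current
# directory with a single soffice call, writing PDFs to $1 (default ./pdf_output).
convert_all_to_pdf() {
  local outdir f
  outdir="${1:-./pdf_output}"
  # Rebuild the positional parameters from matching files only;
  # a pattern that matched nothing stays literal and fails the -e test.
  set --
  for f in *.doc *.docx; do
    [ -e "$f" ] && set -- "$@" "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "No .doc or .docx files found" >&2
    return 1
  fi
  mkdir -p "$outdir"
  soffice --headless --convert-to pdf --outdir "$outdir" "$@"
}
```

Then, we simply call:

$ convert_all_to_pdf ./pdf_output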

3.1. More Control Over the Output

When working in the LibreOffice GUI, the PDF export dialog offers a bunch of options.

We can also control the export process from the command line, but this requires LibreOffice version 7.4 or later. It allows us to pass parameters to writer_pdf_Export, the PDF export filter. Without any parameters, the filter uses the PDF export settings stored by LibreOffice Writer:

$ soffice --convert-to pdf:writer_pdf_Export example.docx

Now, let’s set some PDF properties. First, with SelectPdfVersion, we can specify the PDF version:

$ soffice --convert-to 'pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"1"}}' example.docx

Here, the version value ‘1’ (one) stands for PDF/A-1b.

To export only a range of pages, we can use the PageRange parameter:

$ soffice --convert-to 'pdf:writer_pdf_Export:{"PageRange":{"type":"string","value":"3-4"}}' example.docx

Finally, we can set several properties in a single writer_pdf_Export filter string. So, let’s export only the first page and protect the result with the password ‘secret’:

$ opts='{"PageRange":{"type":"string","value":"1-1"},"EncryptFile":{"type":"boolean","value":"true"},"DocumentOpenPassword":{"type":"string","value":"secret"}}'
$ soffice --convert-to "pdf:writer_pdf_Export:$opts" *.doc*

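Note the combination of the EncryptFile and DocumentOpenPassword parameters: EncryptFile turns on encryption, and DocumentOpenPassword sets the password needed to open the resulting PDF.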
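Writing these JSON strings by hand is error-prone, since every brace and quote has to match. As a sketch, a small shell function can assemble the filter string from NAME TYPE VALUE triples. The name pdf_export_filter is hypothetical, and it assumes that names and values contain no double quotes or backslashes, because they are inserted into the JSON without escaping:

```shell
# Hypothetical helper: prints a --convert-to argument for writer_pdf_Export
# built from NAME TYPE VALUE triples. Values are NOT JSON-escaped, so they
# must not contain double quotes or backslashes.
pdf_export_filter() {
  if [ "$#" -eq 0 ] || [ $(( $# % 3 )) -ne 0 ]; then
    echo "usage: pdf_export_filter NAME TYPE VALUE [NAME TYPE VALUE ...]" >&2
    return 1
  fi
  local json="" sep=""
  while [ "$#" -gt 0 ]; do
    # Append one "Name":{"type":...,"value":...} member
    json="${json}${sep}\"$1\":{\"type\":\"$2\",\"value\":\"$3\"}"
    sep=","
    shift 3
  done
  printf 'pdf:writer_pdf_Export:{%s}\n' "$json"
}
```

With it, the password-protected export of the first page becomes:

$ soffice --convert-to "$(pdf_export_filter PageRange string 1-1 EncryptFile boolean true DocumentOpenPassword string secret)" *.doc*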

4. Printing to PDF

Another way to use the LibreOffice suite is to print to a PDF file. First, we need to install a virtual PDF printer:

$ sudo apt-get install cups-pdf

Afterward, we have a printer called ‘PDF’ at our disposal. Next, let’s ask LibreOffice to print the document. The --pt option passes the printer name:

$ soffice --pt PDF *.docx

With this command, soffice opens the docx file without invoking the GUI and sends it to the PDF printer. By default, the generated PDF files go to the PDF folder in the user’s home directory. If we need to change the location of these files, we can edit the Out entry in the cups-pdf configuration file:

$ sudo nano /etc/cups/cups-pdf.conf

Then, we need to restart the cups service to apply the change:

$ sudo systemctl restart cups
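Instead of editing the file interactively, we could also change the Out entry from a script. The following is a sketch with a hypothetical function set_cups_pdf_outdir; it assumes that the active setting is a single uncommented line of the form Out <directory>, that GNU sed is available for in-place editing, and that the new directory contains no |, & or backslash characters, which sed would interpret in the replacement:

```shell
# Hypothetical helper: sets the Out directory in the cups-pdf config file $1 to $2.
# Only uncommented "Out ..." lines are rewritten; commented examples stay intact.
# If no such line exists, one is appended. Requires GNU sed (-i, -E).
set_cups_pdf_outdir() {
  local conf="$1" dir="$2"
  if grep -qE '^[[:space:]]*Out[[:space:]]' "$conf"; then
    sed -i -E "s|^[[:space:]]*Out[[:space:]].*|Out ${dir}|" "$conf"
  else
    printf 'Out %s\n' "$dir" >> "$conf"
  fi
}
```

Since /etc/cups/cups-pdf.conf belongs to root, the function has to run as root, for example from a script started with sudo, and the cups service still needs the restart described above.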

5. Universal Office Converter

Universal Office Converter is the full name of the unoconv Python script. It’s designed to transform any LibreOffice-supported format into another in a non-interactive way. Let’s install it on Ubuntu:

$ sudo apt install unoconv

Then, we can convert a doc/docx document simply by passing it to unoconv:

$ unoconv example.doc

As PDF is the default output format, we’ll obtain a PDF file with the same base name. To choose a different name, we can use the --output option:

$ unoconv --output pdf_file example.doc

The command accepts multiple files and wildcards:

$ unoconv *.doc

The --output option has a different meaning if we convert multiple files. It then names the directory in which to store the output files:

$ unoconv --output pdf_files *.doc

In this case, we’ll find all PDF files in the pdf_files subfolder of the current directory.
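When we rerun a batch conversion, we may want to skip documents whose PDF is already up to date. A minimal sketch, assuming unoconv writes the PDF next to the source file with the same base name, as described above, wraps unoconv in a loop and compares modification times with the shell’s -nt test; convert_stale_docs is a hypothetical name, and -f pdf just states the default format explicitly:

```shell
# Hypothetical helper: converts *.doc/*.docx files in the current directory
# with unoconv, skipping a file when its PDF exists and is newer than it.
convert_stale_docs() {
  local doc pdf
  for doc in *.doc *.docx; do
    [ -e "$doc" ] || continue          # pattern matched nothing
    pdf="${doc%.*}.pdf"                # report.docx -> report.pdf
    if [ "$pdf" -nt "$doc" ]; then     # false when the PDF doesn't exist yet
      echo "skip: $doc"
      continue
    fi
    unoconv -f pdf "$doc" || echo "failed: $doc" >&2
  done
}
```

One caveat of this design: report.doc and report.docx map to the same report.pdf, so only one of them would end up converted.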

6. The pandoc Converter

Now, let’s look at something completely different. pandoc is a markup language converter that doesn’t depend on the LibreOffice suite. We can use it to transform a docx file (but not a legacy doc file) into a PDF with the help of an external PDF engine. First, let’s install pandoc:

$ sudo apt install pandoc

By default, pandoc uses pdflatex to create the PDF, so we need to install LaTeX to make it work, together with the ‘recommended’ and ‘extra’ LaTeX packages:

$ sudo apt install texlive-latex-base texlive-latex-recommended texlive-latex-extra

These packages provide LaTeX support for advanced page layouts, graphics, and math. To convert our docx file, let’s run:

$ pandoc -o output.pdf -f docx example.docx

pandoc accepts multiple input files, but it merges them into a single PDF output file. Therefore, to get one PDF per document, we should convert the files individually in a loop:

$ for file in *.docx; do pandoc -o "$(basename "$file" .docx)".pdf -f docx "$file"; done

We use basename to strip the .docx extension from the file name.
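The loop above keeps going silently if a file fails, and with no docx files at all it passes the literal *.docx to pandoc. A slightly more defensive variant could look like this sketch; pandoc_convert_all is a hypothetical name, and it uses the ${file%.docx} parameter expansion, which strips the extension just like basename does for files in the current directory:

```shell
# Hypothetical helper: converts each *.docx in the current directory to its
# own PDF in $1 (default: current directory). Returns non-zero if any failed.
pandoc_convert_all() {
  local outdir="${1:-.}" file failed=0
  mkdir -p "$outdir" || return 1
  for file in *.docx; do
    [ -e "$file" ] || continue         # no .docx files: pattern stays literal
    if ! pandoc -f docx -o "$outdir/${file%.docx}.pdf" "$file"; then
      echo "failed: $file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, to collect the results in a subfolder:

$ pandoc_convert_all ./pdf_output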

7. Conclusion

In this tutorial, we discussed converting doc or docx documents to PDF from the Linux command line. First, we looked at a bunch of methods that rely on the LibreOffice suite. We used soffice, either directly or through unoconv, to convert doc/docx documents into PDF. Then, we asked LibreOffice to read the doc file and send it to a special PDF printer, which took care of creating the PDF file.

As an alternative approach, we used pandoc, a markup language converter, to do the job with the help of LaTeX.