1. Overview
Portable Document Format, or PDF, is a widely used file format. It’s an open standard maintained by ISO (International Organization for Standardization). It provides a way to exchange documents that’s independent of the software, hardware, and operating system being used.
Besides the content, a PDF file may also contain general information, called metadata, about the document. Sometimes, we may need to inspect this information.
In this tutorial, we’ll discuss how to check the metadata within a PDF document from the command line.
2. PDF Metadata
A PDF document consists of a set of PDF objects. The document information dictionary is one of them.
It contains the metadata of the file, stored as key-value pairs.
The dictionary may contain the following keys:
- Title
- Subject
- Keywords
- Author
- Creator
- Producer
- Creation Date
- Modification Date
It’s optional to supply values for the keys in the document information dictionary. It’s also possible to add, delete and edit the metadata in a PDF document.
3. Using pdfinfo
We can use the pdfinfo command to print the contents of the document information dictionary of a PDF document. Besides the contents of the dictionary, it also prints other useful information, such as the page count, page size, and PDF version.
Now, we’ll use pdfinfo to examine a PDF document, example.pdf:
$ pdfinfo example.pdf
Title: Introduction to Programming Languages
Keywords: "C++,Java,Python"
Creator: FrameMaker 8.0
Producer: Acrobat Distiller 7.0.5 (Windows)
CreationDate: Mon Feb 4 10:16:29 2013 EST
ModDate: Mon Feb 4 18:00:11 2013 EST
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 58
Encrypted: no
Page size: 596 x 792 pts
Page rot: 0
File size: 544855 bytes
Optimized: no
PDF version: 1.4
Here, the keys and the corresponding values in the output are self-explanatory.
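Since every line of the output has the form key: value, we can pick out a single field with standard text tools. The following is a minimal sketch that assumes this line format; pdf_field is a hypothetical helper name and isn’t part of pdfinfo:

```shell
# pdf_field is a hypothetical helper, not part of pdfinfo.
# It reads "Key: value" lines on standard input and prints the value of one key.
pdf_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")                  # position of the first colon
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                value = substr($0, pos + 1)
                sub(/^[ \t]+/, "", value)         # drop the padding after the colon
                print value
                exit
            }
        }'
}

# Usage (assumes pdfinfo is installed and example.pdf exists):
#   pdfinfo example.pdf | pdf_field Pages
#   pdfinfo example.pdf | pdf_field "Page size"
```

Splitting at the first colon only keeps values that contain colons themselves, such as the creation date, intact.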
The page size and page rotation in the output of pdfinfo belong to the first page. By default, pdfinfo prints the page information of only the first page if it’s called without any options.
If we want to see the page information of other pages, we can use the -f and -l options:
$ pdfinfo -f 2 -l 3 example.pdf
Title: Introduction to Programming Languages
Keywords: "C++,Java,Python"
Creator: FrameMaker 8.0
Producer: Acrobat Distiller 7.0.5 (Windows)
CreationDate: Mon Feb 4 10:16:29 2013 EST
ModDate: Mon Feb 4 18:00:11 2013 EST
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 58
Encrypted: no
Page 2 size: 596 x 792 pts
Page 2 rot: 0
Page 3 size: 596 x 792 pts
Page 3 rot: 0
File size: 544855 bytes
Optimized: no
PDF version: 1.4
The -f option specifies the first page to examine, while the -l option specifies the last page to examine. Since we passed 2 for the -f option and 3 for the -l option as arguments, pdfinfo printed the page information for these pages.
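To list the size of every page, we can combine the Pages field with the -f and -l options. The following is a minimal sketch: all_page_sizes is a hypothetical helper name, and it assumes that the per-page lines begin with Page followed by the page number, as in the output above:

```shell
# all_page_sizes is a hypothetical helper; it assumes pdfinfo is on the PATH.
all_page_sizes() {
    # Read the total page count from the "Pages:" line.
    pages=$(pdfinfo "$1" | awk -F': *' '$1 == "Pages" { print $2 }')
    # Ask for pages 1..N and keep only the per-page size lines.
    pdfinfo -f 1 -l "$pages" "$1" | grep -E '^Page +[0-9]+ size:'
}

# Usage:
#   all_page_sizes example.pdf
```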
The page size is given in pts, that is, PostScript points. One point is 1/72 of an inch, or approximately 0.3528 mm.
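As a worked example of the unit, we can convert a length in points to millimeters, using the facts that one point is 1/72 of an inch and one inch is 25.4 mm. This is only a sketch; pt_to_mm is a hypothetical helper name:

```shell
# pt_to_mm is a hypothetical helper: 1 pt = 1/72 inch and 1 inch = 25.4 mm.
pt_to_mm() {
    awk -v pt="$1" 'BEGIN { printf "%.2f\n", pt * 25.4 / 72 }'
}

# Usage:
#   pt_to_mm 72     # one inch, prints 25.40
#   pt_to_mm 596    # width of the pages in example.pdf
#   pt_to_mm 792    # height of the pages in example.pdf
```

Applied to the page size above, 596 x 792 pts comes out at roughly 210 x 279 mm.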
4. Using exiftool
Another tool we can use is exiftool. It’s a free and open-source tool for reading and writing meta information of a wide variety of files.
It supports many image, audio and video file types. It can also be used for PDF documents.
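The exif in its name stands for Exchangeable Image File Format.
On Fedora and RHEL-based distributions, exiftool comes with the perl-Image-ExifTool package. On Debian and Ubuntu, the package is named libimage-exiftool-perl.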
Let’s use exiftool to examine example.pdf:
$ exiftool example.pdf
ExifTool Version Number : 12.42
File Name : example.pdf
Directory : /home/alice/documents
File Size : 545 kB
File Modification Date/Time : 2022:10:07 10:15:34+03:00
File Access Date/Time : 2022:12:01 08:28:07+03:00
File Inode Change Date/Time : 2022:10:07 10:15:34+03:00
File Permissions : -rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.4
Linearized : No
Tagged PDF : Yes
Page Mode : UseOutlines
XMP Toolkit : 3.1-702
Copyright : Example Company
Web Statement : www.example.com
Producer : Acrobat Distiller 7.0.5 (Windows)
Creator Tool : FrameMaker 8.0
Modify Date : 2013:02:04 11:00:11-05:00
Create Date : 2013:02:04 08:16:29Z
Metadata Date : 2013:02:04 11:00:11-05:00
Format : application/pdf
Title : Introduction to Programming Languages
Creator : .
Subject : C++,Java,Python
Document ID : uuid:e5db80e2-4fef-4596-b7a4-bcc15ba1a0da
Instance ID : uuid:da74d5f2-b906-4591-bbe6-ce9b0870ab55
Page Count : 58
Keywords : "C++, Java, Python"
Warning : [Minor] Ignored duplicate Info dictionary
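The output of exiftool for example.pdf covers most of the document information that pdfinfo prints, such as the title, producer, and page count, but not the page size and rotation. It also has extra fields such as MIME Type and Document ID.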
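When we need only one field, we can pass the tag name as an option, and add -s3 to print the bare value without the tag name. The following is a minimal sketch; exif_tag is a hypothetical wrapper name:

```shell
# exif_tag is a hypothetical wrapper; it assumes exiftool is on the PATH.
# -s3 prints only the tag value, and -TAG (for example, -Title) selects a single tag.
exif_tag() {
    exiftool -s3 "-$1" "$2"
}

# Usage:
#   exif_tag Title example.pdf
#   exif_tag PageCount example.pdf
```

Tag names are written without spaces, so the Page Count field in the output above is requested as PageCount.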
5. Using file
The file command is useful for determining the type of a file. It’s also possible to use file for inspecting some of the properties of a PDF document.
Let’s use file without specifying any options:
$ file example.pdf
example.pdf: PDF document, version 1.4
The file is a PDF document, as expected. The output also printed the PDF version, which is 1.4.
The -i option of file gives us the MIME type of a file:
$ file -i example.pdf
example.pdf: application/pdf; charset=binary
The MIME type of example.pdf is application/pdf as expected.
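In a script, we can use the MIME type to check whether a file is a PDF document before inspecting it further. The following is a minimal sketch; is_pdf is a hypothetical helper name:

```shell
# is_pdf is a hypothetical helper; it assumes the file command is on the PATH.
# -b omits the file name from the output, and --mime-type prints only the MIME type.
is_pdf() {
    [ "$(file -b --mime-type "$1")" = "application/pdf" ]
}

# Usage:
#   is_pdf example.pdf && echo "example.pdf is a PDF document"
```

Unlike -i, the --mime-type option leaves out the charset part, so the result can be compared directly.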
6. Conclusion
In this article, we discussed how to check the metadata of a PDF document from the command line.
Firstly, we learned what PDF metadata is. Then, we discussed the tools we can use for inspecting the PDF metadata.
The first command, pdfinfo, extracts the document information dictionary within a PDF document. It also prints additional information, such as the page count and page size.
The second tool we discussed was exiftool, which is a versatile tool for reading and writing metadata in a wide variety of files, including PDF documents.
Finally, we saw that the file command gives us the PDF version and MIME type of the document.