1. Overview

XML is a human-readable markup language. However, if it’s not well-formatted, it isn’t easy to read or understand. For example, an XML file containing a single long line or XML without element indentations is difficult to visually comprehend. This is especially true when we want to display it in the Linux console.

In this tutorial, we’ll address several ways to pretty-print an XML file using Linux commands.

2. Our XML Example

First of all, let’s take a look at an XML file, emails.xml, that we’ll use in our examples:

<emails> <email> <from>Kai</from> <to>Amanda</to> <time>2018-03-05</time>
<subject>I am flying to you</subject></email> <email>
<from>Jerry</from> <to>Tom</to> <time>1992-08-08</time> <subject>Hey Tom, catch me if you can!</subject>
</email> </emails>

The emails.xml file is well-formed XML. However, since it isn’t laid out consistently, it’s tough to read and understand.

We’ll take this file as an input example, and pretty-print it in the command line.

There are many ways to format and output an XML file. In this tutorial, we’re going to address three command-line XML utilities: xmllint, XMLStarlet, and xml_pp.

Now, let’s print the emails.xml in a human-readable format.

3. Using the xmllint Command

The xmllint command is a member of the libxml2 package. Usually, we can use it to check if XML files are valid, parse XML files, or evaluate XPath expressions.
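Before we move on to formatting, here is a minimal sketch of those other uses. It assumes xmllint is installed and works on a throwaway file whose name, sample-emails.xml, is made up for this illustration:

```shell
#!/bin/sh
# Create a small one-line sample (hypothetical file name) so the sketch is self-contained.
cat > sample-emails.xml <<'EOF'
<emails><email><from>Kai</from><to>Amanda</to></email></emails>
EOF

if command -v xmllint >/dev/null 2>&1; then
    # --noout suppresses the parsed document; the exit status is 0 when the file parses cleanly.
    xmllint --noout sample-emails.xml && echo "sample-emails.xml is well-formed"
    # --xpath evaluates an XPath expression and prints the result.
    xmllint --xpath 'string(/emails/email/from)' sample-emails.xml
else
    echo "xmllint is not installed" >&2
fi
```

The --noout check only confirms that the file parses. Validating against a DTD or schema needs additional options.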

3.1. Pretty-Print XML

The xmllint utility has the --format option. With this option, we can reformat and reindent the XML. The syntax is straightforward:

xmllint --format XML_FILE

Let’s reformat our emails.xml using the xmllint command:

$ xmllint --format emails.xml

We get the output:

<?xml version="1.0"?>
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>

Now, the data in XML is much easier to read and understand.

We also see that the command adds the XML declaration, even though our input file doesn’t have one.
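In practice, we often want to keep the formatted result or format XML that arrives through a pipe. The following is a small sketch under the assumption that xmllint is installed. The file names compact.xml, pretty.xml, and pretty2.xml are hypothetical:

```shell
#!/bin/sh
# A tiny one-line document to work with (hypothetical file name).
printf '<a><b>1</b><c>2</c></a>\n' > compact.xml

if command -v xmllint >/dev/null 2>&1; then
    # Save the formatted result with a shell redirect.
    xmllint --format compact.xml > pretty.xml
    # Alternatively, let xmllint write the file itself.
    xmllint --format --output pretty2.xml compact.xml
    # A single dash as the file name makes xmllint read from standard input.
    cat compact.xml | xmllint --format -
else
    echo "xmllint is not installed" >&2
fi
```

We should avoid redirecting the output to the input file itself (for example, > compact.xml), since the shell truncates that file before xmllint gets to read it.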

3.2. Format Options

We can easily reformat XML files using the xmllint command together with the --format option. The default indent is two spaces. However, we can change it by setting the XMLLINT_INDENT environment variable.

Let’s reformat and print the emails.xml again. This time, let’s set four spaces as the indent:

$ XMLLINT_INDENT="    " xmllint --format emails.xml

The output of the command is:

<?xml version="1.0"?>
<emails>
    <email>
        <from>Kai</from>
        <to>Amanda</to>
        <time>2018-03-05</time>
        <subject>I am flying to you</subject>
    </email>
    <email>
        ...
    </email>
</emails>
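The variable only takes effect if it reaches the environment of the xmllint process. As a rough sketch (assuming xmllint is installed, and using a hypothetical compact.xml file), we can either prefix a single command with the assignment or export the variable for the rest of the shell session:

```shell
#!/bin/sh
# A tiny one-line document to work with (hypothetical file name).
printf '<a><b>1</b></a>\n' > compact.xml

if command -v xmllint >/dev/null 2>&1; then
    # One-off: the assignment prefix applies to this single command only.
    XMLLINT_INDENT="    " xmllint --format compact.xml

    # Session-wide: export the variable so every later xmllint call inherits it.
    # Here the indent is a single tab character.
    XMLLINT_INDENT="$(printf '\t')"
    export XMLLINT_INDENT
    xmllint --format compact.xml

    # Return to the default two-space indent.
    unset XMLLINT_INDENT
else
    echo "xmllint is not installed" >&2
fi
```

A plain assignment followed by a semicolon only creates a shell variable. Unless that variable has already been exported, xmllint doesn’t see it and keeps the default indent.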

4. Using the XMLStarlet Toolkit

XMLStarlet is a command-line XML toolkit. It provides a single executable called xml; on some distributions, such as Debian and Ubuntu, it’s installed as xmlstarlet instead. Using this command, we can transform, query, validate, and edit XML documents and files.

Let’s take a look at the syntax for using the xml command:

xml [<options>] <command> [<cmd-options>]

4.1. Pretty-Print XML

We can use the format command (or the short form, fo) to reformat an XML file:

$ xml format emails.xml

It outputs:

<?xml version="1.0"?>
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>

As the output above shows, our emails.xml is pretty-printed. As with the xmllint command, the default indentation is two space characters.

Similar to xmllint, we also see the command adds the XML declaration if missing from our input.

Next, let’s have a look at what format options the xml command provides.

4.2. Format Options

The xml format command has four options to control the output:

  • -n or --noindent: do not indent the output
  • -t or --indent-tab: indent output with TABs
  • -s or --indent-spaces <num>: indent output with <num> spaces
  • -o or --omit-decl: omit the XML declaration

Let’s launch the xml format command with our emails.xml file again, and this time, we want to indent the output with eight spaces and omit the XML declaration:

$ xml fo -o -s 8 emails.xml

It outputs:

<emails>
        <email>
                <from>Kai</from>
                <to>Amanda</to>
                <time>2018-03-05</time>
                <subject>I am flying to you</subject>
        </email>
        <email>
                ...
        </email>
</emails>
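The other options combine in the same way. The sketch below is only an illustration: it assumes XMLStarlet is installed under either the name xmlstarlet or xml, that the format command reads standard input when no file is given, and it uses a hypothetical compact.xml file:

```shell
#!/bin/sh
# A tiny one-line document to work with (hypothetical file name).
printf '<a><b>1</b><c>2</c></a>\n' > compact.xml

# The executable name varies between systems, so pick whichever is on the PATH.
if command -v xmlstarlet >/dev/null 2>&1; then
    XML=xmlstarlet
elif command -v xml >/dev/null 2>&1; then
    XML=xml
else
    XML=""
fi

if [ -n "$XML" ]; then
    # Indent with tabs instead of spaces.
    "$XML" fo -t compact.xml
    # Format XML arriving through a pipe: four-space indent, no XML declaration.
    cat compact.xml | "$XML" fo -o -s 4
else
    echo "XMLStarlet is not installed" >&2
fi
```

Reading from a pipe is handy when the XML comes from another command, such as a download tool, rather than from a file on disk.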

5. Using the xml_pp Command

The xml_pp command is shipped with the Perl module XML::Twig. The name xml_pp stands for “XML Pretty-Printer”.

5.1. Pretty-Print XML

As its name suggests, xml_pp is designed to print XML documents in a pretty format. The syntax to use it is straightforward:

xml_pp [options] XML_FILES

Let’s see if it can pretty-print our emails.xml:

$ xml_pp emails.xml

The command prints:

<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>

The output shows the indentation is two space characters here as well.

Also, if we look at the beginning of the output, the XML declaration is not added by default if our input doesn’t have one.
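Since xml_pp accepts several files, it’s also convenient for reformatting files where they are. As a hedged sketch, assuming XML::Twig’s xml_pp is installed and supports its documented -i option for in-place editing, we could run it on a hypothetical compact.xml file like this:

```shell
#!/bin/sh
# A tiny one-line document to work with (hypothetical file name).
printf '<a><b>1</b></a>\n' > compact.xml

if command -v xml_pp >/dev/null 2>&1; then
    # -i edits the file in place; an extension attached directly to -i
    # (no space) asks xml_pp to keep a backup of the original.
    xml_pp -i.bak compact.xml
    # Show the reformatted file.
    cat compact.xml
else
    echo "xml_pp is not installed" >&2
fi
```

Keeping a backup is a sensible precaution the first time we run an in-place edit on files we care about.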

5.2. Output Options

Unlike the xml format and xmllint commands, the xml_pp command doesn’t provide an option to change the indentation width.

The xml_pp utility supports options to control the output in other aspects, such as:

  • -e <encoding>: set the output encoding
  • -p <tag(s)>: preserve whitespace in the given elements
  • -s