1. Introduction
In this article, we’ll compare Java XML libraries and APIs.
This is the second article in our series about Java support for XML. If you want to go deeper into the XPath support in Java, have a look at the previous article.
2. Overview
Now we’re going to dig deeper into XML support in Java, and we’ll start by explaining, as simply as possible, all the acronyms related to the subject.
In Java XML support, we can find a few API definitions, and each one has its pros and cons:
• SAX: It is an event-based parsing API that provides low-level access. It is memory efficient and faster than DOM, since it doesn’t load the whole document tree into memory, but it doesn’t provide navigation support like the one XPath offers. Although it is more efficient, it is also harder to use.
• DOM: It is a model-based parser that loads the document into memory as a tree structure, so we keep the original element order and can navigate the document in both directions. It provides an API for reading and writing, supports XML manipulation and is very easy to use, although the price is a high strain on memory resources.
• StAX: It offers the ease of DOM and the efficiency of SAX, but it lacks some functionality provided by DOM, such as XML manipulation, and it only allows us to navigate the document forward.
• JAXB: It allows us to navigate the document in both directions, it is more efficient than DOM, it allows conversion from XML to Java types and it supports XML manipulation, but it can only parse a valid XML document.
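You may still find references to JAXP as a standalone project, but its last release is from March 2013 and that project is practically dead. The JAXP APIs themselves (for example javax.xml.parsers and javax.xml.transform) are part of the JDK, and they are how the JDK exposes its built-in DOM and SAX parsers.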
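To make the event-based style described above more concrete, here is a minimal sketch using the SAX parser bundled with the JDK (javax.xml.parsers.SAXParserFactory). It collects the text of every title element as the parser streams through the document, without ever building a tree. The class name SaxTitleCollector is a hypothetical name chosen for this illustration, and the sketch assumes the input is passed in as a String.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers the text of every <title> element via SAX callbacks
class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The default (non-namespace-aware) factory reports element names in qName
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A single text node may arrive in several chunks, so accumulate them
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            inTitle = false;
        }
    }

    // Streams through the document once; no tree is kept in memory
    static List<String> collectTitles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxTitleCollector handler = new SaxTitleCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }
}
```

Note how all state (the current text buffer, whether we are inside a title) has to be tracked by hand in the handler; this is the "harder to use" trade-off mentioned for SAX.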
3. The XML
In this section, we’re going to look at the most popular implementations, so that we can test real working samples and check the differences between them.
In the following examples, we’ll be working with a simple XML file with a structure like this:
<tutorials>
    <tutorial tutId="01" type="java">
        <title>Guava</title>
        <description>Introduction to Guava</description>
        <date>04/04/2016</date>
        <author>GuavaAuthor</author>
    </tutorial>
    ...
</tutorials>
4. DOM4J
We’re going to start by taking a look at what we can do with DOM4J, and for this example we need to add the latest version of this dependency.
This is one of the most popular libraries for working with XML files, since it allows us to perform bi-directional reading, create new documents and update existing ones.
DOM4J can work with DOM, SAX, XPath and XSLT. SAX is supported via JAXP.
Let’s take a look, for example, at how we can select an element by filtering on a given id:
SAXReader reader = new SAXReader();
Document document = reader.read(file);
// match any element whose tutId attribute equals the given id
List<Node> elements = document.selectNodes("//*[@tutId='" + id + "']");
return elements.isEmpty() ? null : elements.get(0);
The SAXReader class is responsible for creating a DOM4J tree from SAX parsing events. Once we have an org.dom4j.Document, we just need to call the necessary method and pass it the XPath expression as a String.
We can also load an existing document, make changes to its content and then save the result.
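For comparison, the same lookup can be done without any third-party library by combining the JDK’s own DOM parser with javax.xml.xpath. This is a sketch of the plain DOM approach, not of DOM4J; the class name DomXPathLookup and the choice to parse from a String are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class: DOM tree + JDK XPath, no external dependencies
class DomXPathLookup {
    // Parses the XML into an in-memory DOM tree and returns the first element
    // whose tutId attribute equals id, or null if nothing matches.
    static Element findById(String xml, String id) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Same expression as the DOM4J example; NODE yields the first match in document order
        Node node = (Node) xpath.evaluate("//*[@tutId='" + id + "']", document, XPathConstants.NODE);
        return (Element) node;
    }
}
```

The code is close to the DOM4J version, but the JDK API is more verbose: factories are needed for both the parser and the XPath engine, and the result type has to be requested explicitly.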
// 'nodes' holds the <tutorial> nodes selected from 'document' (for example with selectNodes)
for (Node node : nodes) {
    Element element = (Element) node;
    Iterator<Element> iterator = element.elementIterator("title");
    while (iterator.hasNext()) {
        Element title = iterator.next();
        // append a suffix to the existing text of each <title>
        title.setText(title.getText() + " updated");
    }
}
// serialize the modified tree to a new file
XMLWriter writer = new XMLWriter(
    new FileWriter(new File("src/test/resources/example_updated.xml")));
writer.write(document);
writer.close();
In the example above, we’re changing every title’s content and writing the result to a new file.
Notice how simple it is to iterate over every title node by calling elementIterator and passing the name of the element.
Once our content is modified, we use XMLWriter, which takes a DOM4J tree and formats it to a stream as XML.
Creating a new document from scratch is as simple as we see below:
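As a point of comparison, here is a hedged sketch of the same "load, modify, write" cycle using only the JDK’s DOM API. An identity javax.xml.transform.Transformer plays roughly the role XMLWriter plays in DOM4J. The class name DomTitleUpdater is hypothetical, and the sketch returns the XML as a String instead of writing a file, purely to keep it self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example class: update every <title> with the plain JDK DOM API
class DomTitleUpdater {
    // Appends " updated" to every <title> and serializes the modified tree back to XML
    static String appendToTitles(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList titles = document.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            Node title = titles.item(i);
            title.setTextContent(title.getTextContent() + " updated");
        }
        // An identity transform copies the DOM tree to the output stream as XML
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }
}
```

To write to a file instead, a StreamResult can be constructed from a java.io.File rather than a StringWriter.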
Document document = DocumentHelper.createDocument();
Element root = document.addElement("XMLTutorials");
Element tutorialElement = root.addElement("tutorial").addAttribute("tutId", "01");
tutorialElement.addAttribute("type", "xml");
tutorialElement.addElement("title").addText("XML with Dom4J");
...
OutputFormat format = OutputFormat.createPrettyPrint();
XMLWriter writer = new XMLWriter(
new FileWriter(new File("src/test/resources/example_new.xml")), format);
writer.write(document);
writer.close();
DocumentHelper gives us a collection of helper methods for DOM4J, such as createDocument, which creates an empty document to start working with.
We can create as many attributes or elements as we need with the methods provided by DOM4J, and once the document is complete, we just write it to a file as we did in the update case before.
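For contrast with DocumentHelper, this sketch builds a similar document with the JDK’s built-in DOM API. Unlike DOM4J’s fluent addElement and addAttribute calls, each node must be created by the Document and then appended explicitly. The class name DomDocumentCreator and the title text are illustrative assumptions; the INDENT output property is used here as a rough equivalent of OutputFormat.createPrettyPrint().

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class: create a new document from scratch with the JDK DOM API
class DomDocumentCreator {
    static String createTutorials() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Nodes are created by the Document, then attached to their parent
        Element root = document.createElement("XMLTutorials");
        document.appendChild(root);
        Element tutorial = document.createElement("tutorial");
        tutorial.setAttribute("tutId", "01");
        tutorial.setAttribute("type", "xml");
        root.appendChild(tutorial);
        Element title = document.createElement("title");
        title.setTextContent("XML with DOM");
        tutorial.appendChild(title);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Ask for line breaks/indentation and skip the <?xml ...?> declaration
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }
}
```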
5. JDOM
In order to work with JDOM, we have to add this dependency to our pom.
JDOM’s working style is pretty similar to DOM4J’s, so we are going to take a look at just a couple of examples:
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(this.getFile());
Element tutorials = doc.getRootElement();
List<Element> tutorialList = tutorials.getChildren("tutorial");
In the example above, we’re retrieving all the tutorial elements under the root element in a very simple way, just as we can with DOM4J. We can also select elements with XPath:
SAXBuilder builder = new SAXBuilder();
Document document = (Document) builder.build(file);
String filter = "//*[@tutId='" + id + "']";
XPathFactory xFactory = XPathFactory.instance();
XPathExpression<Element> expr = xFactory.compile(filter, Filters.element());
List<Element> node = expr.evaluate(document);
Here again, a SAXBuilder creates a Document instance from a given file. We then retrieve an element by its tutId attribute by compiling an XPath expression with the XPathFactory provided by JDOM2 and evaluating it against the document.
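Both the DOM4J and the JDOM examples build the XPath expression by concatenating the id into a String, which breaks as soon as the id contains a quote character. One way around this, sketched here with the JDK’s javax.xml.xpath API rather than with JDOM, is to use an XPath variable ($id) and bind its value through an XPathVariableResolver. The class name ParameterizedXPath is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical example class: bind the id as an XPath variable instead of concatenating it
class ParameterizedXPath {
    // Counts elements whose tutId attribute equals id
    static int countById(String xml, String id) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver must be set before compile(); it only knows the variable $id
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
        XPathExpression expr = xpath.compile("//*[@tutId=$id]");
        NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```

With this approach, an id such as it's is matched correctly, whereas the concatenated expression //*[@tutId='it's'] would not even compile. The compiled XPathExpression can also be reused across documents.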
6. StAX
Now, we’re going to see how we can retrieve all elements from our root element using the StAX API. StAX has been included in the JDK since Java 6, so we don’t need to add any dependencies.
Firstly, we need to create a Tutorial class:
public class Tutorial {
    private String tutId;
    private String type;
    private String title;
    private String description;
    private String date;
    private String author;
    // standard getters and setters
}
and then we are ready to follow with:
List<Tutorial> tutorials = new ArrayList<>();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader eventReader = factory.createXMLEventReader(new FileReader(this.getFile()));
Tutorial current;
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    switch (event.getEventType()) {
        case XMLStreamConstants.START_ELEMENT:
            StartElement startElement = event.asStartElement();
            String qName = startElement.getName().getLocalPart();
            ...
            break;
        case XMLStreamConstants.CHARACTERS:
            Characters characters = event.asCharacters();
            ...
            break;
        case XMLStreamConstants.END_ELEMENT:
            EndElement endElement = event.asEndElement();
            // check if we found the closing element
            // close resources that need to be explicitly closed
            break;
    }
}
In the example above, to help us retrieve the information, we needed to create a class to store the retrieved data.
To read the document, we pull events from an XMLEventReader and handle each event type in a switch statement, always moving forward through the document. Remember that StAX, like SAX, doesn’t provide bi-directional navigation. As you can see, a lot of work needs to be done just to retrieve a simple list of elements.
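Since the event-handling code above is elided, here is one possible complete version, offered as a sketch rather than the article’s original code. It uses a simplified, hypothetical Tutorial stand-in with only tutId and title (package-private fields instead of getters and setters) and reads from a String instead of a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class: builds a list of tutorials by pulling StAX events
class StaxTutorialReader {
    // Simplified stand-in for the article's Tutorial class
    static class Tutorial {
        String tutId;
        String title;
    }

    static List<Tutorial> read(String xml) throws Exception {
        List<Tutorial> tutorials = new ArrayList<>();
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        Tutorial current = null;
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    text.setLength(0); // new element: start collecting its text afresh
                    if ("tutorial".equals(start.getName().getLocalPart())) {
                        current = new Tutorial();
                        Attribute id = start.getAttributeByName(new QName("tutId"));
                        current.tutId = id == null ? null : id.getValue();
                    }
                } else if (event.isCharacters()) {
                    // text may be split across several CHARACTERS events
                    text.append(event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    String name = event.asEndElement().getName().getLocalPart();
                    if ("title".equals(name) && current != null) {
                        current.title = text.toString();
                    } else if ("tutorial".equals(name) && current != null) {
                        tutorials.add(current);
                        current = null;
                    }
                }
            }
        } finally {
            reader.close(); // does not close the underlying Reader
        }
        return tutorials;
    }
}
```

The reader only ever moves forward: once the closing tag of a tutorial has been consumed, its data has to be stored, because there is no way to go back to it.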
7. JAXB
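Up to Java 10, JAXB is included with the JDK, as is Xerces, so we don’t need any extra dependency for this one. Since Java 11, however, JAXB is no longer part of the JDK and has to be added as a separate dependency.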
It’s very simple to load, create and manipulate information from an XML file using JAXB.
We just need to create the correct Java classes to bind the XML, and that’s it.
JAXBContext jaxbContext = JAXBContext.newInstance(Tutorials.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
Tutorials tutorials = (Tutorials) jaxbUnmarshaller.unmarshal(this.getFile());
In the example above, we load our XML file into our object, and from there we can handle everything as a normal Java structure.
Creating a new document is just as simple: we do the reverse, turning Java objects into XML.
First, we’re going to modify our Tutorial class by adding JAXB annotations to its setters:
public class Tutorial {
    ...
    public String getTutId() {
        return tutId;
    }
    @XmlAttribute
    public void setTutId(String tutId) {
        this.tutId = tutId;
    }
    ...
    @XmlElement
    public void setTitle(String title) {
        this.title = title;
    }
    ...
}
@XmlRootElement
public class Tutorials {
    private List<Tutorial> tutorial;
    // standard getters and setters with @XmlElement annotation
}
With @XmlRootElement we define which object represents the root node of our document, and then we use @XmlAttribute or @XmlElement to define whether a property maps to an attribute of a node or to an element of the document.
Then we can follow with:
Tutorials tutorials = new Tutorials();
tutorials.setTutorial(new ArrayList<>());
Tutorial tut = new Tutorial();
tut.setTutId("01");
...
tutorials.getTutorial().add(tut);
JAXBContext jaxbContext = JAXBContext.newInstance(Tutorials.class);
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
jaxbMarshaller.marshal(tutorials, file);
As you can see, binding an XML file to Java objects is arguably the easiest way to work with this kind of file.
8. XPath Expression Support
To create complex XPath expressions, we can use Jaxen. This is an open source XPath library adaptable to many different object models, including DOM, XOM, DOM4J, and JDOM.
We can create XPath expressions and compile them against many supported documents.
String expression = "/tutorials/tutorial";
XPath path = new DOMXPath(expression);
List result = path.selectNodes(xmlDocument);
To make it work we’ll need to add this dependency to our project.
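Jaxen is one option; for comparison, the JDK also ships an XPath 1.0 engine in javax.xml.xpath that understands functions and predicates, so expressions beyond simple paths can be evaluated directly to Java types. The sketch below is a hedged illustration using our sample structure; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical example class: XPath 1.0 functions evaluated with the JDK's built-in engine
class XPathFunctionsDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // count() produces an XPath number, which the JDK returns as a java.lang.Double
    static double countJavaTutorials(Document document) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(
                "count(/tutorials/tutorial[@type='java'])", document, XPathConstants.NUMBER);
    }

    // The two-argument evaluate() converts the result to a String:
    // the text of the first node matching the contains() predicate
    static String firstTitleMentioningGuava(Document document) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(
                "/tutorials/tutorial[contains(description, 'Guava')]/title", document);
    }
}
```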
9. Conclusion
As you can see, there are many options for working with XML. Depending on the requirements of your application, you could work with any of them, or you may have to choose between efficiency and simplicity.
You can find the full working samples for this article in our git repository here.