1. Overview
In this article, we’ll go over the basics of XPath using the support built into the standard Java JDK.
We’ll take a simple XML document, process it, and see how to navigate it to extract the information we need.
XPath is a standard syntax recommended by the W3C: a set of expressions for navigating XML documents. You can find a full XPath reference here.
2. A Simple XPath Parser
import java.io.File;
import java.io.FileInputStream;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DefaultParser {

    private File file;

    public DefaultParser(File file) {
        this.file = file;
    }

    // Accessor used by the snippets in this article via this.getFile()
    public File getFile() {
        return file;
    }
}
Now let’s take a closer look at the elements you will find in the DefaultParser:
FileInputStream fileIS = new FileInputStream(this.getFile());
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Tutorials/Tutorial";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
Let’s break that down:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
We use this factory to obtain a parser that produces a DOM object tree from our XML document:
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Having an instance of this class, we can parse XML documents from many different input sources, such as an InputStream, a File, a URI string, or a SAX InputSource:
Document xmlDocument = builder.parse(fileIS);
A Document (org.w3c.dom.Document) represents the entire XML document. It is the root of the document tree and provides our first access to the data:
XPath xPath = XPathFactory.newInstance().newXPath();
From the XPath object, we compile expressions and evaluate them against our document to extract what we need:
xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
We can compile an XPath expression passed as a string and specify the kind of result we expect, for example NODESET, NODE, or STRING.
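To make the effect of the return type concrete, here is a minimal, self-contained sketch. It assumes a small inline document rather than a file, and the class name ReturnTypeSketch and its helper methods are hypothetical names chosen for this illustration; only the javax.xml and org.w3c.dom calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ReturnTypeSketch {

    // Hypothetical inline document so the sketch runs without a file.
    static final String XML =
        "<Tutorials>"
        + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
        + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
        + "</Tutorials>";

    // Parses the document, compiles the expression, and evaluates it
    // with the requested return type.
    static Object evaluate(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return XPathFactory.newInstance().newXPath()
                .compile(expression)
                .evaluate(doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // NODESET: every matching node, returned as a NodeList.
    static int countTutorials() {
        NodeList nodes = (NodeList) evaluate("/Tutorials/Tutorial", XPathConstants.NODESET);
        return nodes.getLength();
    }

    // NODE: only the first matching node in document order.
    static String firstTutorialId() {
        Node node = (Node) evaluate("/Tutorials/Tutorial", XPathConstants.NODE);
        return node.getAttributes().getNamedItem("tutId").getNodeValue();
    }

    // STRING: the text value of the first matching node.
    static String firstTitle() {
        return (String) evaluate("/Tutorials/Tutorial/title", XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(countTutorials());
        System.out.println(firstTutorialId());
        System.out.println(firstTitle());
    }
}
```

The same expression can therefore give quite different results depending on the constant passed to evaluate, so the cast has to match the requested type.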
3. Let’s Start
Now that we’ve looked at the base components we will use, let’s start with some code using a simple XML document for testing purposes:
<?xml version="1.0"?>
<Tutorials>
  <Tutorial tutId="01" type="java">
    <title>Guava</title>
    <description>Introduction to Guava</description>
    <date>04/04/2016</date>
    <author>GuavaAuthor</author>
  </Tutorial>
  <Tutorial tutId="02" type="java">
    <title>XML</title>
    <description>Introduction to XPath</description>
    <date>04/05/2016</date>
    <author>XMLAuthor</author>
  </Tutorial>
</Tutorials>
3.1. Retrieve a Basic List of Elements
The first method is a simple use of an XPath expression to retrieve a list of nodes from the XML:
FileInputStream fileIS = new FileInputStream(this.getFile());
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Tutorials/Tutorial";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
We can retrieve the tutorial list contained in the root node by using the expression above, or by using the expression “//Tutorial”, but the latter retrieves all Tutorial elements in the document, no matter at what level of the tree they are located.
The NodeList returned when we pass NODESET to evaluate() as the return type is an ordered collection of nodes that can be accessed by passing an index as a parameter.
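As a rough sketch of both points, the example below walks a NodeList by index and contrasts the absolute path with the “//Tutorial” form. The inline document and its Archive wrapper element are invented for this illustration (they are not part of the sample XML above), and NodeListSketch and titles are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class NodeListSketch {

    // Hypothetical document: the second Tutorial is nested one level
    // deeper, inside an invented <Archive> element.
    static final String XML =
        "<Tutorials>"
        + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
        + "<Archive>"
        + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
        + "</Archive>"
        + "</Tutorials>";

    // Evaluates the expression as a NODESET and collects the <title>
    // text of each matching Tutorial, visiting the nodes by index.
    static List<String> titles(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            NodeList nodeList = (NodeList) XPathFactory.newInstance().newXPath()
                .compile(expression)
                .evaluate(doc, XPathConstants.NODESET);

            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                Element tutorial = (Element) nodeList.item(i);
                titles.add(tutorial.getElementsByTagName("title").item(0).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the Tutorial that is a direct child of the root
        System.out.println(titles("/Tutorials/Tutorial"));
        // Every Tutorial, at any depth
        System.out.println(titles("//Tutorial"));
    }
}
```

With this document, the absolute path matches only the Tutorial directly under the root, while “//Tutorial” also picks up the nested one.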
3.2. Retrieving a Specific Node by Its ID
We can look for an element based on any given id just by filtering:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(this.getFile());
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Tutorials/Tutorial[@tutId=" + "'" + id + "'" + "]";
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
With this kind of expression, we can filter for any element we need just by using the correct syntax. Such expressions are called predicates, and they are an easy way to locate specific data in a document, for example:
/Tutorials/Tutorial[1]
/Tutorials/Tutorial[last()]
/Tutorials/Tutorial[position()<4]
You can find a complete reference of predicates here.
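The following sketch evaluates a few predicates against a hypothetical five-element document and returns the matching tutId values. It relies on the XPath 1.0 functions position() and last(), which the JDK implementation supports. It also shows one possible alternative to string concatenation for the id filter: binding the value through an XPathVariableResolver. That alternative is an assumption of this example rather than the approach used in the article, and PredicateSketch, select, and selectById are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PredicateSketch {

    // Hypothetical document with five Tutorial elements.
    static final String XML =
        "<Tutorials>"
        + "<Tutorial tutId=\"01\"/>"
        + "<Tutorial tutId=\"02\"/>"
        + "<Tutorial tutId=\"03\"/>"
        + "<Tutorial tutId=\"04\"/>"
        + "<Tutorial tutId=\"05\"/>"
        + "</Tutorials>";

    // Evaluates an expression and returns the tutId of each matching node.
    static List<String> select(String expression) {
        return select(XPathFactory.newInstance().newXPath(), expression);
    }

    // Looks up a Tutorial by id using an XPath variable ($id) instead of
    // concatenating the value into the expression string.
    static List<String> selectById(String id) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // The resolver must be set before the expression is compiled.
        xPath.setXPathVariableResolver(
            name -> "id".equals(name.getLocalPart()) ? id : null);
        return select(xPath, "/Tutorials/Tutorial[@tutId=$id]");
    }

    private static List<String> select(XPath xPath, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) xPath.compile(expression)
                .evaluate(doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("tutId"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/Tutorials/Tutorial[1]"));            // first element
        System.out.println(select("/Tutorials/Tutorial[last()]"));       // last element
        System.out.println(select("/Tutorials/Tutorial[position()<4]")); // first three elements
        System.out.println(selectById("02"));                            // filter by attribute
    }
}
```

Binding the id as a variable keeps quotes in the input from changing the structure of the expression, which can matter when the value comes from user input.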
3.3. Retrieving Nodes by a Specific Tag Name
Now we’ll go further by introducing axes. Let’s see how this works by using one in an XPath expression:
Document xmlDocument = builder.parse(this.getFile());
this.clean(xmlDocument);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//Tutorial[descendant::title[text()=" + "'" + name + "'" + "]]";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
With the expression used above, we are looking for every