1. Overview
Server-side applications sometimes need to parse HTML characters. This is where escaping and unescaping come in handy. In this tutorial, we’ll demonstrate a couple of ways to unescape HTML characters in Java. We’ll look into some of the available libraries that can handle this task.
2. Ways to Unescape HTML Characters
Processing HTML symbols within the JVM can be tricky because, in escaped form, each symbol is represented in a Java string by a sequence of several characters. For example, the < and > characters are represented in escaped HTML as the strings &lt; and &gt;. For the application to work with the intended characters, we need ways to escape or unescape them.
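To make the mismatch concrete, the following sketch uses only the JDK to compare an escaped string with the characters it stands for. The class name EntityLengthDemo and its extraChars() helper are hypothetical, introduced just for this illustration:

```java
public class EntityLengthDemo {

    // Returns how many more chars the escaped form occupies than the raw form.
    public static int extraChars(String escaped, String raw) {
        return escaped.length() - raw.length();
    }

    public static void main(String[] args) {
        String escaped = "&lt;p&gt;"; // HTML-escaped text as it arrives
        String raw = "<p>";           // the characters the application usually wants

        // The two strings are different values as far as Java is concerned.
        System.out.println(escaped.equals(raw));

        // Each of the two entities here takes four chars instead of one.
        System.out.println(extraChars(escaped, raw));
    }
}
```

Until the text is unescaped, any comparison, search, or length check sees the entity text rather than the symbol itself.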
We can always write code ourselves to handle unescaping HTML characters, but the process would take time. It is also prone to errors. Instead, we can work with available Java libraries that can handle the task. The sample code in the subsections below tests the conversion of escaped HTML symbols to “unescaped” characters using various Java libraries.
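As a rough illustration of the do-it-yourself route mentioned above, here is a minimal sketch that relies only on the JDK. The class NaiveHtmlUnescaper and its methods are hypothetical names, and the entity table is deliberately incomplete: it assumes only five named entities plus decimal and hexadecimal numeric references matter, which is not true of real-world HTML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveHtmlUnescaper {

    // Assumption: only a handful of named entities; HTML defines far more.
    private static final Map<String, String> NAMED = new LinkedHashMap<>();
    static {
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("amp", "&");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
    }

    // Matches references shaped like &name; or &#60; or &#x3C;
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX]?[0-9a-fA-F]+|[a-zA-Z]+);");

    // Single pass over the input, so "&amp;lt;" becomes "&lt;" and not "<".
    public static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());          // text before the entity
            out.append(resolve(m.group(1), m.group()));  // the decoded entity
            last = m.end();
        }
        out.append(input, last, input.length());         // trailing text
        return out.toString();
    }

    private static String resolve(String body, String original) {
        if (body.charAt(0) == '#') {
            boolean hex = body.length() > 1
                && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
            try {
                int codePoint = Integer.parseInt(body.substring(hex ? 2 : 1), hex ? 16 : 10);
                return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : original;
            } catch (NumberFormatException e) {
                return original; // malformed or out-of-range number: leave as-is
            }
        }
        String replacement = NAMED.get(body);
        return replacement != null ? replacement : original; // unknown names stay untouched
    }
}
```

Even this small version already has to deal with malformed numbers, unknown names, and the risk of unescaping twice, and it still leaves entities such as &nbsp; unhandled. That is the kind of detail the libraries below take care of.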
2.1. Unescaping via Apache Commons’ StringEscapeUtils
Apache Commons is a popular collection of reusable Java components. Its StringEscapeUtils class, found in the org.apache.commons.text package of Apache Commons Text, has many convenient methods, among them StringEscapeUtils.unescapeHtml4():
// Escaped quotes: &quot; should become a literal double quote
String expectedQuote = "\"Hello\" Baeldung";
String escapedQuote = "&quot;Hello&quot; Baeldung";
Assert.assertEquals(expectedQuote, StringEscapeUtils.unescapeHtml4(escapedQuote));
// Escaped markup: &lt; and &gt; should become angle brackets
String escapedStringsWithHtmlSymbol = "&lt;p&gt;&lt;strong&gt;Test sentence in bold type.&lt;/strong&gt;&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p><strong>Test sentence in bold type.</strong></p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, StringEscapeUtils.unescapeHtml4(escapedStringsWithHtmlSymbol));
2.2. Unescaping via Spring Framework’s HtmlUtils
Spring Framework provides comprehensive infrastructure support for developing Java applications. Its HtmlUtils.htmlUnescape() method, found in the org.springframework.web.util package, converts escaped HTML characters back to their original form:
// Escaped quotes: &quot; should become a literal double quote
String expectedQuote = "\"Code smells\" -Martin Fowler";
String escapedQuote = "&quot;Code smells&quot; -Martin Fowler";
Assert.assertEquals(expectedQuote, HtmlUtils.htmlUnescape(escapedQuote));
// Escaped markup: &lt; and &gt; should become angle brackets
String escapedStringsWithHtmlSymbol = "&lt;p&gt;Loren Ipsum is a popular paragraph.&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p>Loren Ipsum is a popular paragraph.</p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, HtmlUtils.htmlUnescape(escapedStringsWithHtmlSymbol));
2.3. Unescaping via Unbescape’s HtmlEscape
The Unbescape library escapes and unescapes data in many formats, among them HTML, JSON, and CSS. The example below unescapes HTML characters and tags via the HtmlEscape.unescapeHtml() method:
// Escaped quotes: &quot; should become a literal double quote
String expectedQuote = "\"Carpe diem\" -Horace";
String escapedQuote = "&quot;Carpe diem&quot; -Horace";
Assert.assertEquals(expectedQuote, HtmlEscape.unescapeHtml(escapedQuote));
// Escaped markup: &lt; and &gt; should become angle brackets
String escapedStringsWithHtmlSymbol = "&lt;p&gt;&lt;em&gt;Pizza is a famous Italian food. Duh.&lt;/em&gt;&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p><em>Pizza is a famous Italian food. Duh.</em></p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, HtmlEscape.unescapeHtml(escapedStringsWithHtmlSymbol));
2.4. Unescaping via Jsoup’s Entities.unescape()
The Jsoup library covers all sorts of HTML manipulation. Its Java HTML parser offers a wide range of support for HTML and XML requirements. Entities.unescape() is a method whose main purpose is to unescape strings that contain escaped HTML characters:
// Escaped quotes: &quot; should become a literal double quote
String expectedQuote = "\"Jsoup\" is another strong library";
String escapedQuote = "&quot;Jsoup&quot; is another strong library";
Assert.assertEquals(expectedQuote, Entities.unescape(escapedQuote));
// Escaped markup: &lt; and &gt; should become angle brackets
String escapedStringsWithHtmlSymbol = "&lt;p&gt;It simplifies working with real-world &lt;strong&gt;HTML&lt;/strong&gt; and &lt;strong&gt;XML&lt;/strong&gt;&lt;/p&gt;";
String expectedStringsWithHtmlSymbol = "<p>It simplifies working with real-world <strong>HTML</strong> and <strong>XML</strong></p>";
Assert.assertEquals(expectedStringsWithHtmlSymbol, Entities.unescape(escapedStringsWithHtmlSymbol));
3. Conclusion
In this tutorial, we’ve demonstrated ways to unescape HTML characters using various libraries available to the Java community. Apache Commons and Spring Framework are popular libraries that offer various tools for processing HTML entities. Unbescape and Jsoup go beyond HTML characters and can process other data formats as well. All of them help with processing inputs on the server side whenever an application requires it.
All code samples used in the article are available over on GitHub.