1. Overview
When working with String values in Java, there are times when we need to clean up our data by removing specific characters. One common scenario is removing bracket characters. With the right approach, removing these characters can be straightforward.
In this tutorial, we’ll explore how to achieve this.
2. Introduction to the Problem
First, let’s make the requirement clear: what are bracket characters?
If we focus on ASCII characters, there are three pairs of bracket characters:
- Parentheses/round brackets – ‘(’ and ‘)’
- Square brackets – ‘[’ and ‘]’
- Curly brackets – ‘{’ and ‘}’
Apart from these three pairs, we often use ‘<’ and ‘>’ as angle brackets in practice, such as in XML tags.
Strictly speaking, ‘<’ and ‘>’ aren’t bracket characters: they’re defined as the “less than” and “greater than” signs. However, since they’re so often used as angle brackets, we’ll treat them as the fourth pair of bracket characters.
Therefore, we aim to remove the four pairs of characters from a given String.
Let’s say we have a String value:
static final String INPUT = "This (is) <a> [nice] {string}!";
As we can see, the INPUT String contains all eight bracket characters. After removing all bracket characters, we expect to get this result:
"This is a nice string!"
Of course, our input may contain Unicode characters. This tutorial also addresses the Unicode String scenario.
Next, let’s take INPUT as an example and see how to remove characters.
3. Using the StringUtils.replaceChars() Method
Apache Commons Lang 3 is a widely used library. The StringUtils class from this library provides a rich set of helper methods that allow us to manipulate strings conveniently.
For example, we can solve our problem using the replaceChars() method. This method allows us to replace multiple characters in one go. Further, we can employ it to delete characters:
String result = StringUtils.replaceChars(INPUT, "(){}[]<>", null);
assertEquals("This is a nice string!", result);
As the code above shows, we pass the String “(){}[]<>” as the searchChars argument and a null value as the replaceChars argument. This is because when replaceChars is null, replaceChars() deletes all characters contained in searchChars from the input String. Therefore, replaceChars() does the job.
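If adding a library dependency isn't an option, the same deletion can be sketched with the JDK alone. The class and method names below (BracketRemover, deleteChars) are hypothetical and only illustrate the idea; this isn't how Apache Commons Lang implements replaceChars():

```java
// Hypothetical helper, not part of Apache Commons Lang: a JDK-only sketch
// of "delete every character that appears in searchChars".
class BracketRemover {

    static String deleteChars(String input, String searchChars) {
        if (input == null || input.isEmpty() || searchChars == null || searchChars.isEmpty()) {
            return input; // nothing to delete
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Keep the character only if it isn't one of the characters to delete
            if (searchChars.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "This is a nice string!"
        System.out.println(deleteChars("This (is) <a> [nice] {string}!", "(){}[]<>"));
    }
}
```

This sketch works char by char, which is enough for the ASCII brackets discussed here.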
4. Using the Regex-Based replaceAll() Method
Regular expressions (regex) are powerful tools for matching patterns within strings, allowing us to efficiently search, replace, and manipulate text based on defined criteria.
Next, let’s see how to remove bracket characters using the regex-based replaceAll() method from the Java standard library:
String regex = "[(){}<>\\[\\]]";
String result = INPUT.replaceAll(regex, "");
assertEquals("This is a nice string!", result);
The regex pattern looks pretty straightforward. It has only one character class, which includes the bracket characters.
Sharp eyes might have noticed that we only escaped the ‘[’ and ‘]’ characters in the character class while leaving ‘(){}<>’ as they are. This is because most regex metacharacters, including parentheses and curly braces, lose their special meanings inside a character class and are matched literally, so they don’t need to be escaped.
However, ‘[’ and ‘]’ delimit the character class itself, and in Java, an unescaped ‘[’ inside a class opens a nested class. Therefore, we must escape them so that they’re treated as literal characters within the class.
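When the same cleanup runs many times, it may be worth compiling the pattern once, since String.replaceAll() compiles its regex on every call. The following is a minimal sketch of that idea; the class, constant, and method names are our own and purely illustrative:

```java
import java.util.regex.Pattern;

// Illustrative names: a reusable, precompiled version of the character class.
class AsciiBracketPattern {

    // Only '[' and ']' are escaped; the other brackets are literal inside the class
    static final Pattern ASCII_BRACKETS = Pattern.compile("[(){}<>\\[\\]]");

    static String removeAsciiBrackets(String input) {
        return ASCII_BRACKETS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "This is a nice string!"
        System.out.println(removeAsciiBrackets("This (is) <a> [nice] {string}!"));
    }
}
```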
5. Removing Unicode Bracket Characters
We’ve seen how to delete bracket characters from a String input that includes only ASCII characters. Next, let’s see how to remove Unicode bracket characters.
Let’s say we have another String input containing Unicode and ASCII bracket characters:
static final String INPUT_WITH_UNICODE = "⟨T⟩❰h❱「i」⦇s⦈ (is) <a> [nice] {string}!";
As the example shows, apart from the ASCII bracket characters “(){}[]<>”, it contains the following Unicode characters:
- ⟨ and ⟩ – mathematical angle brackets U+27E8 and U+27E9
- ❰ and ❱ – heavy angle brackets U+2770 and U+2771
- 「 and 」 – corner brackets U+300C and U+300D
- ⦇ and ⦈ – image brackets U+2987 and U+2988
There are still many more Unicode bracket characters that our example doesn’t cover. Fortunately, regex supports matching by Unicode general category.
We can use \p{Ps} (open punctuation) and \p{Pe} (close punctuation) to match opening and closing bracket characters.
Next, let’s see if these categories can tell replaceAll() to delete all bracket characters:
String regex = "\\p{Ps}|\\p{Pe}";
String result = INPUT.replaceAll(regex, "");
assertEquals("This is <a> nice string!", result);
String resultWithUnicode = INPUT_WITH_UNICODE.replaceAll(regex, "");
assertEquals("This is <a> nice string!", resultWithUnicode);
The test above shows that most bracket characters have been removed. However, the ASCII characters ‘<’ and ‘>’ remain. This is because ‘<’ and ‘>’ are defined as “less than” and “greater than” signs rather than angle brackets. That is to say, they don’t belong to the Ps or Pe categories and aren’t matched by the regex.
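To see which category a character falls into, we can ask the JDK directly with Character.getType(). The sketch below uses a hypothetical helper, isUnicodeBracket(), that mirrors the \p{Ps} and \p{Pe} check:

```java
// Hypothetical helper class for inspecting Unicode general categories.
class BracketCategoryCheck {

    // True if the code point is in category Ps (open) or Pe (close)
    static boolean isUnicodeBracket(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.START_PUNCTUATION || type == Character.END_PUNCTUATION;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeBracket('('));    // true: '(' is in Ps
        System.out.println(isUnicodeBracket(0x300C)); // true: the left corner bracket is in Ps
        System.out.println(isUnicodeBracket('<'));    // false: '<' is a math symbol (Sm)
    }
}
```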
If we want to remove ‘<’ and ‘>’, we can add the character class “[<>]” to the pattern:
String regex = "\\p{Ps}|\\p{Pe}|[<>]";
String result = INPUT.replaceAll(regex, "");
assertEquals("This is a nice string!", result);
String resultWithUnicode = INPUT_WITH_UNICODE.replaceAll(regex, "");
assertEquals("This is a nice string!", resultWithUnicode);
As we can see, this time, we got the expected result.
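As a variation, the two categories and the angle brackets can also be combined into a single character class, since Java allows \p{...} inside square brackets. This is only a sketch with illustrative names, and the input is written with Unicode escapes so that the example doesn't depend on the source file's encoding:

```java
import java.util.regex.Pattern;

// Illustrative names: one character class covering Ps, Pe, '<' and '>'.
class UnicodeBracketRemover {

    static final Pattern ALL_BRACKETS = Pattern.compile("[\\p{Ps}\\p{Pe}<>]");

    static String removeBrackets(String input) {
        return ALL_BRACKETS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Same content as INPUT_WITH_UNICODE, written with Unicode escapes
        String input = "\u27E8T\u27E9\u2770h\u2771\u300Ci\u300D\u2987s\u2988 (is) <a> [nice] {string}!";
        // prints "This is a nice string!"
        System.out.println(removeBrackets(input));
    }
}
```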
6. Conclusion
In this article, we’ve explored different ways to remove bracket characters from an input String and discussed how to remove Unicode brackets through an example.
As always, the complete source code for the examples is available over on GitHub.