1. Overview

In this quick tutorial, we’ll focus on the substring functionality of Strings in Java.

We’ll mostly use the methods from the String class and a few from Apache Commons’ StringUtils class.

In all of the following examples, we’re going to use this simple String:

String text = "Julia Evans was born on 25-09-1984. "
  + "She is currently living in the USA (United States of America).";

2. Basics of substring

Let’s start with a very simple example here – extracting a substring with the start index:

assertEquals("USA (United States of America).", 
  text.substring(67));

Note how we extracted Julia’s country of residence in our example here.

There’s also an option to specify an end index, which is exclusive; without it, substring goes all the way to the end of the String.

Let’s do that to get rid of the extra dot at the end of the example above:

assertEquals("USA (United States of America)", 
  text.substring(67, text.length() - 1));
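
To make the index rules concrete, here is a small sketch. The class name SubstringBounds and the outOfBounds helper are hypothetical, introduced only for illustration. It shows that the begin index is inclusive, the end index is exclusive, and that an index outside the String is rejected with a StringIndexOutOfBoundsException:

```java
final class SubstringBounds {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: reports whether substring(begin, end) would throw.
    static boolean outOfBounds(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // begin is inclusive, end is exclusive: characters 0..4 are returned
        System.out.println(TEXT.substring(0, 5));                      // Julia
        // begin == end yields an empty String
        System.out.println(TEXT.substring(5, 5).isEmpty());            // true
        // begin == length() is still valid and yields an empty String
        System.out.println(TEXT.substring(TEXT.length()).isEmpty());   // true
        // an end index past the end of the String is rejected
        System.out.println(outOfBounds(TEXT, 67, TEXT.length() + 1));  // true
    }
}
```

Checking the indexes before calling substring is usually cleaner than catching the exception; the try/catch here only serves to make the failure visible.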

In the examples above, we’ve used the exact position to extract the substring.

2.1. Getting a Substring Starting at a Specific Character

In case the position needs to be dynamically calculated based on a character or String, we can make use of the indexOf method:

assertEquals("United States of America", 
  text.substring(text.indexOf('(') + 1, text.indexOf(')')));

A similar method that can help us locate our substring is lastIndexOf. Let’s use lastIndexOf to extract the year “1984”. It’s the portion of text between the last dash and the first dot:

assertEquals("1984", 
  text.substring(text.lastIndexOf('-') + 1, text.indexOf('.')));

Both indexOf and lastIndexOf can take a character or a String as a parameter. Let’s extract the text “USA” and the rest of the text in the parentheses:

assertEquals("USA (United States of America)",
  text.substring(text.indexOf("USA"), text.indexOf(')') + 1));
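
One thing to keep in mind is that indexOf and lastIndexOf return -1 when nothing is found, and passing that straight into substring either throws or silently gives the wrong range. Below is a minimal sketch of a guarded lookup. The class IndexOfGuard and the between helper are hypothetical names, and returning an Optional is just one possible design:

```java
import java.util.Optional;

final class IndexOfGuard {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: the text between the first 'open' and the next 'close' after it.
    static Optional<String> between(String s, char open, char close) {
        int start = s.indexOf(open);
        if (start < 0) {
            return Optional.empty();            // opening character not found
        }
        // look for the closing character only after the opening one
        int end = s.indexOf(close, start + 1);
        if (end < 0) {
            return Optional.empty();            // closing character not found
        }
        return Optional.of(s.substring(start + 1, end));
    }

    public static void main(String[] args) {
        System.out.println(between(TEXT, '(', ')').orElse("<none>")); // United States of America
        System.out.println(between(TEXT, '[', ']').isPresent());      // false
    }
}
```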

3. Using subSequence

The String class provides another method called subSequence, which acts similarly to the substring method.

The only difference is that it returns a CharSequence instead of a String, and it always requires both a start and an end index:

assertEquals("USA (United States of America)", 
  text.subSequence(67, text.length() - 1));
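
The CharSequence return type matters mostly when we write code against the interface rather than against String. The following sketch assumes a hypothetical helper, countUpperCase, that accepts any CharSequence, so it works for a String as well as for a StringBuilder:

```java
final class SubSequenceDemo {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: counts the upper-case letters in any CharSequence.
    static int countUpperCase(CharSequence chars) {
        int count = 0;
        for (int i = 0; i < chars.length(); i++) {
            if (Character.isUpperCase(chars.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CharSequence country = TEXT.subSequence(67, TEXT.length() - 1);
        System.out.println(countUpperCase(country));                    // 6

        // StringBuilder implements CharSequence too
        StringBuilder builder = new StringBuilder(TEXT);
        System.out.println(countUpperCase(builder.subSequence(0, 11))); // 2

        // toString() gives us a String again when we need one
        String asString = country.toString();
        System.out.println(asString.equals(TEXT.substring(67, TEXT.length() - 1))); // true
    }
}
```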

4. Using Regular Expressions

Regular expressions will come to our rescue if we have to extract a substring that matches a specific pattern.

In the example String, Julia’s date of birth is in the format “dd-mm-yyyy”. We can match this pattern using the Java regular expression API.

First of all, we need to create a pattern for “dd-mm-yyyy”:

Pattern pattern = Pattern.compile("\\d{2}-\\d{2}-\\d{4}");

Then, we’ll apply the pattern to find a match from the given text:

Matcher matcher = pattern.matcher(text);

Upon a successful match, we can extract the matched String:

if (matcher.find()) {
    assertEquals("25-09-1984", matcher.group());
}

For more details on Java regular expressions, check out this tutorial.
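
If we need the individual parts of the date rather than the whole match, capturing groups are one option. The sketch below wraps each part of the same pattern in parentheses. The class DateParts and the extract helper are hypothetical names, and returning an empty array when there is no match is an arbitrary choice made for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateParts {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // The dd-mm-yyyy pattern, with each part wrapped in a capturing group.
    private static final Pattern DATE = Pattern.compile("(\\d{2})-(\\d{2})-(\\d{4})");

    // Hypothetical helper: returns {day, month, year}, or an empty array if no date is found.
    static String[] extract(String s) {
        Matcher matcher = DATE.matcher(s);
        if (!matcher.find()) {
            return new String[0];
        }
        // group(0) is the whole match; groups 1..3 are the parenthesized parts
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", extract(TEXT))); // 25/09/1984
        System.out.println(extract("no date here").length);  // 0
    }
}
```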

5. Using split

We can use the split method from the String class to extract a substring. Say we want to extract the first sentence from the example String. This is quite easy to do using split:

String[] sentences = text.split("\\.");

Since the split method accepts a regex, we had to escape the period character. The result is an array of two sentences; note that the second one keeps its leading space.

We can use the first sentence (or iterate through the whole array):

assertEquals("Julia Evans was born on 25-09-1984", sentences[0]);
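
A few details of split are easy to miss: trailing empty Strings are dropped, the pieces keep their surrounding whitespace, and an optional limit argument caps the number of pieces. Here is a short sketch of all three. The class SplitDemo and the sentences helper are hypothetical names used for illustration:

```java
final class SplitDemo {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: splits on periods and trims each piece.
    static String[] sentences(String s) {
        String[] parts = s.split("\\.");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] raw = TEXT.split("\\.");
        // the empty String after the final period is not part of the result
        System.out.println(raw.length);              // 2
        // the second piece still starts with the space that followed the first period
        System.out.println(raw[1].startsWith(" "));  // true
        System.out.println(sentences(TEXT)[1]);      // She is currently living in the USA (United States of America)

        // a limit of 2 splits at the first period only and keeps the rest intact
        String[] firstAndRest = TEXT.split("\\.", 2);
        System.out.println(firstAndRest[1].endsWith(".")); // true
    }
}
```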

Please note that there are better ways for sentence detection and tokenization using Apache OpenNLP. Check out this tutorial to learn more about the OpenNLP API.

6. Using Scanner

We generally use Scanner to parse primitive types and Strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.

Let’s find out how to use this to get the first sentence from the example text:

try (Scanner scanner = new Scanner(text)) {
    scanner.useDelimiter("\\.");
    assertEquals("Julia Evans was born on 25-09-1984", scanner.next());
}

In the above example, we set the example String as the source for the scanner to use.

Then we set the period character as the delimiter. It needs to be escaped; otherwise, it’s treated as a special regular expression character in this context.

Finally, we assert the first token from this delimited output.

If required, we can iterate through the complete collection of tokens using a while loop:

while (scanner.hasNext()) {
   // do something with the tokens returned by scanner.next()
}
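
Putting the pieces together, a complete version of that loop could look like the sketch below. The class ScannerTokens and the tokens helper are hypothetical names, and trimming each token is an extra step added here, not something Scanner does on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ScannerTokens {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: collects all tokens separated by the given delimiter regex.
    static List<String> tokens(String s, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the scanner once the loop is done
        try (Scanner scanner = new Scanner(s)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                // trim() removes the space that follows the first period
                result.add(scanner.next().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String token : tokens(TEXT, "\\.")) {
            System.out.println(token);
        }
    }
}
```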

7. Maven Dependencies

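We can go a bit further and use a useful utility, the StringUtils class, which is part of the Apache Commons Lang library. The current major line of the library is published on Maven Central as org.apache.commons:commons-lang3.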

You can find the latest version of this library here.

8. Using StringUtils

The Apache Commons libraries add some useful methods for manipulating core Java types. Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods.

In this example, we’re going to see how to extract a substring nested between two Strings:

assertEquals("United States of America", 
  StringUtils.substringBetween(text, "(", ")"));

There is a simplified version of this method in case the substring is nested in between two instances of the same String:

substringBetween(String str, String tag)

The substringAfter method from the same class gets the substring after the first occurrence of a separator.

The separator isn’t returned:

assertEquals("the USA (United States of America).", 
  StringUtils.substringAfter(text, "living in "));

Similarly, the substringBefore method gets the substring before the first occurrence of a separator.

The separator isn’t returned:

assertEquals("Julia Evans", 
  StringUtils.substringBefore(text, " was born"));
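
For comparison, here is a rough JDK-only sketch of the same three lookups, built on indexOf and substring. It isn’t how the library implements these methods. The class PlainSubstrings and its helpers are hypothetical, and returning null when a separator is missing is a choice made for this sketch, not a statement about how StringUtils behaves:

```java
final class PlainSubstrings {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: text before the first occurrence of the separator.
    static String before(String s, String separator) {
        int i = s.indexOf(separator);
        return i < 0 ? null : s.substring(0, i);
    }

    // Hypothetical helper: text after the first occurrence of the separator.
    static String after(String s, String separator) {
        int i = s.indexOf(separator);
        return i < 0 ? null : s.substring(i + separator.length());
    }

    // Hypothetical helper: text between the first 'open' and the next 'close' after it.
    static String between(String s, String open, String close) {
        int start = s.indexOf(open);
        if (start < 0) {
            return null;
        }
        int from = start + open.length();
        int end = s.indexOf(close, from);
        return end < 0 ? null : s.substring(from, end);
    }

    public static void main(String[] args) {
        System.out.println(before(TEXT, " was born"));  // Julia Evans
        System.out.println(after(TEXT, "living in "));  // the USA (United States of America).
        System.out.println(between(TEXT, "(", ")"));    // United States of America
    }
}
```

The library versions save us from writing and testing this index arithmetic ourselves, which is the main reason to reach for them.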

You can check out this tutorial to find out more about String processing using the Apache Commons Lang API.

9. Conclusion

In this quick article, we found out various ways to extract a substring from a String in Java. You can explore our other tutorials on String manipulation in Java.

As always, code snippets can be found over on GitHub.