1. Overview

In this tutorial, we’ll discuss the Java Regex API, and how we can use regular expressions in the Java programming language.

In the world of regular expressions, there are many different flavors to choose from, such as grep, Perl, Python, PHP, awk, and many more.

This means that a regular expression that works in one programming language may not work in another. The regular expression syntax in Java is most similar to that found in Perl.

2. Setup

To use regular expressions in Java, we don’t need any special setup. The JDK contains a special package, java.util.regex, totally dedicated to regex operations. We only need to import it into our code.

Moreover, the java.lang.String class also has inbuilt regex support that we commonly use in our code.
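As a quick sketch of that built-in support, the snippet below uses three String methods that accept a regex: matches, replaceAll, and split. The class name, helper names, and sample inputs are hypothetical and chosen only for illustration:

```java
import java.util.Arrays;
import java.util.List;

class StringRegexDemo {

    // String.matches succeeds only if the WHOLE string matches the regex
    static boolean isAllDigits(String input) {
        return input.matches("\\d+");
    }

    // String.replaceAll replaces every substring that matches the regex
    static String collapseWhitespace(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // String.split treats the regex as the delimiter between tokens
    static List<String> splitOnCommas(String input) {
        return Arrays.asList(input.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));             // true
        System.out.println(collapseWhitespace("a   b \t c")); // a b c
        System.out.println(splitOnCommas("x , y,z"));         // [x, y, z]
    }
}
```

These methods compile the regex on every call, so for a pattern that's reused many times, compiling a Pattern once is usually the better choice.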

3. Java Regex Package

The java.util.regex package consists of three classes: Pattern, Matcher, and PatternSyntaxException:

  • A Pattern object is a compiled regex. The Pattern class provides no public constructors. To create a pattern, we must invoke one of its public static compile methods, which return a Pattern object. These methods accept a regular expression as the first argument.
  • A Matcher object interprets the pattern and performs match operations against an input String. It also defines no public constructors. We obtain a Matcher object by invoking the matcher method on a Pattern object.
  • A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
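To illustrate the third class, here's a minimal sketch that checks whether a regex compiles by catching PatternSyntaxException. The class and method names are hypothetical; getDescription and getIndex are accessors defined on the exception:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PatternSyntaxDemo {

    // Returns true if the regex compiles, false if it contains a syntax error
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // The exception reports what went wrong and roughly where
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false, unclosed character class
    }
}
```

Because the exception is unchecked, the compiler won't force us to handle it, so a check like this is mostly useful when the regex comes from user input or configuration.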

We’ll explore these classes in detail; however, we must first understand how to construct a regex in Java.

If we’re already familiar with regex from a different environment, we may find certain differences, but they’re minimal.

4. Simple Example

Let’s start with the simplest use case for a regex. When we apply a regex to a String, it may match zero or more times.

The most basic form of pattern matching supported by the java.util.regex API is the match of a String literal. For example, if the regular expression is foo and the input String is foo, the match will succeed because the Strings are identical:

@Test
public void givenText_whenSimpleRegexMatches_thenCorrect() {
    Pattern pattern = Pattern.compile("foo");
    Matcher matcher = pattern.matcher("foo");
 
    assertTrue(matcher.find());
}

We’ll first create a Pattern object by calling its static compile method and passing it a pattern we want to use.

Then we’ll create a Matcher object by calling the Pattern object’s matcher method and passing it the text we want to check for matches.

Finally, we’ll call the method find in the Matcher object.

The find method keeps advancing through the input text and returns true for every match, so we can use it to find the match count as well:

@Test
public void givenText_whenSimpleRegexMatchesTwice_thenCorrect() {
    Pattern pattern = Pattern.compile("foo");
    Matcher matcher = pattern.matcher("foofoo");
    int matches = 0;
    while (matcher.find()) {
        matches++;
    }
 
    assertEquals(matches, 2);
}

Since we’ll be running more tests, we can abstract the logic for finding the number of matches in a method called runTest:

public static int runTest(String regex, String text) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(text);
    int matches = 0;
    while (matcher.find()) {
        matches++;
    }
    return matches;
}

When we get 0 matches, the test should fail; otherwise, it should pass.

5. Meta Characters

Meta characters affect the way a pattern is matched; in a way, they add logic to the search pattern. The Java API supports several meta characters, the most straightforward being the dot “.”, which matches any character:

@Test
public void givenText_whenMatchesWithDotMetach_thenCorrect() {
    int matches = runTest(".", "foo");
    
    assertTrue(matches > 0);
}

Let’s consider the previous example, where the regex foo matched the text foo, as well as foofoo, two times. If we use the dot meta character in the regex, we won’t get two matches in the second case:

@Test
public void givenRepeatedText_whenMatchesOnceWithDotMetach_thenCorrect() {
    int matches = runTest("foo.", "foofoo");
 
    assertEquals(matches, 1);
}

Notice the dot after the foo in the regex. The pattern now requires foo followed by one more character of any kind. The first match therefore consumes foof, the first foo plus the f that follows it. Only oo remains, and that’s too short to match the pattern again, so there’s only a single match.

The API supports several other meta characters, <([{\^-=$!|]})?*+.>, which we’ll explore further in this article.
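When we want one of these meta characters to be treated as an ordinary character, we can escape it with a backslash, which has to be doubled inside a Java String literal. The following is a small sketch with a hypothetical countMatches helper that mirrors the runTest method above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeMetaDemo {

    // Counts how many times the regex is found in the text
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int matches = 0;
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        // Unescaped dot: matches each of the five characters
        System.out.println(countMatches(".", "a.b.c"));   // 5
        // Escaped dot: matches only the two literal dots
        System.out.println(countMatches("\\.", "a.b.c")); // 2
    }
}
```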

6. Character Classes

Browsing through the official Pattern class specification, we’ll discover summaries of supported regex constructs. Under character classes, we have about 6 constructs.

6.1. OR Class

We construct this as [abc]. This matches any of the elements in the set:

@Test
public void givenORSet_whenMatchesAny_thenCorrect() {
    int matches = runTest("[abc]", "b");
 
    assertEquals(matches, 1);
}

If they all appear in the text, it’ll match each element separately with no regard to the order:

@Test
public void givenORSet_whenMatchesAnyAndAll_thenCorrect() {
    int matches = runTest("[abc]", "cab");
 
    assertEquals(matches, 3);
}

They can also be alternated as part of a String. In the following example, when we create different words by alternating the first letter with each element of the set, they’re all matched:

@Test
public void givenORSet_whenMatchesAllCombinations_thenCorrect() {
    int matches = runTest("[bcr]at", "bat cat rat");
 
    assertEquals(matches, 3);
}

6.2. NOR Class

The above set is negated by adding a caret as the first element:

@Test
public void givenNORSet_whenMatchesNon_thenCorrect() {
    int matches = runTest("[^abc]", "g");
 
    assertTrue(matches > 0);
}

Here’s another case:

@Test
public void givenNORSet_whenMatchesAllExceptElements_thenCorrect() {
    int matches = runTest("[^bcr]at", "sat mat eat");
 
    assertTrue(matches > 0);
}

6.3. Range Class

We can define a class that specifies the range that the matched text should fall within by using a hyphen (-). Likewise, we can also negate a range.

Matching uppercase letters:

@Test
public void givenUpperCaseRange_whenMatchesUpperCase_thenCorrect() {
    int matches = runTest(
      "[A-Z]", "Two Uppercase alphabets 34 overall");
 
    assertEquals(matches, 2);
}

Matching lowercase letters:

@Test
public void givenLowerCaseRange_whenMatchesLowerCase_thenCorrect() {
    int matches = runTest(
      "[a-z]", "Two Uppercase alphabets 34 overall");
 
    assertEquals(matches, 26);
}

Matching both upper case and lower case letters:

@Test
public void givenBothLowerAndUpperCaseRange_whenMatchesAllLetters_thenCorrect() {
    int matches = runTest(
      "[a-zA-Z]", "Two Uppercase alphabets 34 overall");
 
    assertEquals(matches, 28);
}

Matching a given range of numbers:

@Test
public void givenNumberRange_whenMatchesAccurately_thenCorrect() {
    int matches = runTest(
      "[1-5]", "Two Uppercase alphabets 34 overall");
 
    assertEquals(matches, 2);
}

Matching another range of numbers:

@Test
public void givenNumberRange_whenMatchesAccurately_thenCorrect2() {
    int matches = runTest(
      "3[0-5]", "Two Uppercase alphabets 34 overall");
  
    assertEquals(matches, 1);
}

6.4. Union Class

A union character class is the result of combining two or more character classes:

@Test
public void givenTwoSets_whenMatchesUnion_thenCorrect() {
    int matches = runTest("[1-3[7-9]]", "123456789");
 
    assertEquals(matches, 6);
}

The above test will only match six out of the nine integers because the union set skips 4, 5, and 6.

6.5. Intersection Class

Similar to the union class, this class results from picking common elements between two or more sets. To apply intersection, we use the &&:

@Test
public void givenTwoSets_whenMatchesIntersection_thenCorrect() {
    int matches = runTest("[1-6&&[3-9]]", "123456789");
 
    assertEquals(matches, 4);
}

We’ll get four matches because the intersection of the two sets has only four elements.

6.6. Subtraction Class

We can use subtraction to exclude the members of one or more nested character classes from a set. For example, we can match the odd digits in our input by subtracting 2, 4, 6, and 8 from the range 0-9:

@Test
public void givenSetWithSubtraction_whenMatchesAccurately_thenCorrect() {
    int matches = runTest("[0-9&&[^2468]]", "123456789");
 
    assertEquals(matches, 5);
}

Only 1, 3, 5, 7, 9 will be matched.

7. Predefined Character Classes

The Java regex API also accepts predefined character classes. Some of the above character classes can be expressed in shorter form, although this makes the code less intuitive. One special aspect of using them in Java is the escape character.

As we’ll see, most predefined character classes start with a backslash, which also has a special meaning in Java String literals. For the Pattern class to receive the backslash, we must escape it in the source code, i.e. \d becomes \\d.

Matching digits, equivalent to [0-9]:

@Test
public void givenDigits_whenMatches_thenCorrect() {
    int matches = runTest("\\d", "123");
 
    assertEquals(matches, 3);
}

Matching non-digits, equivalent to [^0-9]:

@Test
public void givenNonDigits_whenMatches_thenCorrect() {
    int matches = runTest("\\D", "a6c");
 
    assertEquals(matches, 2);
}

Matching white space:

@Test
public void givenWhiteSpace_whenMatches_thenCorrect() {
    int matches = runTest("\\s", "a c");
 
    assertEquals(matches, 1);
}

Matching non-white space:

@Test
public void givenNonWhiteSpace_whenMatches_thenCorrect() {
    int matches = runTest("\\S", "a c");
 
    assertEquals(matches, 2);
}

Matching a word character, equivalent to [a-zA-Z_0-9]:

@Test
public void givenWordCharacter_whenMatches_thenCorrect() {
    int matches = runTest("\\w", "hi!");
 
    assertEquals(matches, 2);
}

Matching a non-word character:

@Test
public void givenNonWordCharacter_whenMatches_thenCorrect() {
    int matches = runTest("\\W", "hi!");
 
    assertEquals(matches, 1);
}

8. Quantifiers

The Java regex API also allows us to use quantifiers. These enable us to further tweak the match’s behavior by specifying the number of occurrences to match against.

To match a text zero or one time, we use the ? quantifier:

@Test
public void givenZeroOrOneQuantifier_whenMatches_thenCorrect() {
    int matches = runTest("a?", "hi");
 
    assertEquals(matches, 3);
}

Alternatively, we can use the brace syntax, which is also supported by the Java regex API:

@Test
public void givenZeroOrOneQuantifier_whenMatches_thenCorrect2() {
    int matches = runTest("a{0,1}", "hi");
 
    assertEquals(matches, 3);
}

This example introduces the concept of zero-length matches. If a quantifier’s lower bound is zero, the pattern can match the empty String, so the matcher reports a match at every position where the quantified text is absent, including the position at the very end of the input. This means that even if the input is empty, it’ll return one zero-length match.

This explains why we get three matches in the above example, despite having a String of length two. Since the quantified character never occurs in hi, all three matches are zero-length: one before h, one before i, and one at the end of the input.
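To make the zero-length matches visible, we can record where each match starts and ends. This is a minimal sketch using the literal pattern a?; the class and the matchSpans helper are hypothetical names introduced only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthMatchDemo {

    // Collects each match as "start-end"; start == end means a zero-length match
    static List<String> matchSpans(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> spans = new ArrayList<>();
        while (matcher.find()) {
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(matchSpans("a?", "hi")); // [0-0, 1-1, 2-2]
        System.out.println(matchSpans("a?", ""));   // [0-0]
    }
}
```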

To match a text zero or more times, we use the * quantifier, which behaves similarly to ?:

@Test
public void givenZeroOrManyQuantifier_whenMatches_thenCorrect() {
    int matches = runTest("a*", "hi");
 
    assertEquals(matches, 3);
}

Supported alternative:

@Test
public void givenZeroOrManyQuantifier_whenMatches_thenCorrect2() {
    int matches = runTest("a{0,}", "hi");
 
    assertEquals(matches, 3);
}

The quantifier with a difference is +, which has a matching threshold of one. If the required String doesn’t occur at all, there will be no match, not even a zero-length String:

@Test
public void givenOneOrManyQuantifier_whenMatches_thenCorrect() {
    int matches = runTest("a+", "hi");
 
    assertFalse(matches > 0);
}

Supported alternative:

@Test
public void givenOneOrManyQuantifier_whenMatches_thenCorrect2() {
    int matches = runTest("a{1,}", "hi");
 
    assertFalse(matches > 0);
}

As in Perl and other languages, we can use the brace syntax to match a given text a number of times:

@Test
public void givenBraceQuantifier_whenMatches_thenCorrect() {
    int matches = runTest("a{3}", "aaaaaa");
 
    assertEquals(matches, 2);
}

In the above example, we get two matches, since a match occurs only if a appears three times in a row. However, in the next test, we won’t get a match because the text only appears two times in a row:

@Test
public void givenBraceQuantifier_whenFailsToMatch_thenCorrect() {
    int matches = runTest("a{3}", "aa");
 
    assertFalse(matches > 0);
}

When we use a range in the brace, the match will be greedy, matching from the higher end of the range:

@Test
public void givenBraceQuantifierWithRange_whenMatches_thenCorrect() {
    int matches = runTest("a{2,3}", "aaaa");
 
    assertEquals(matches, 1);
}

Here we specified at least two occurrences, but not exceeding three, so we get a single match where the matcher sees a single aaa and a lone a, which can’t be matched.

However, the API allows us to specify a lazy or reluctant approach such that the matcher can start from the lower end of the range, matching two occurrences as aa and aa:

@Test
public void givenBraceQuantifierWithRange_whenMatchesLazily_thenCorrect() {
    int matches = runTest("a{2,3}?", "aaaa");
 
    assertEquals(matches, 2);
}
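To see the difference between the greedy and reluctant quantifiers more directly, we can collect the matched text instead of only counting matches. The sketch below assumes a hypothetical matchedTexts helper built on Matcher's group method:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {

    // Returns the text of every match, in the order the matcher finds them
    static List<String> matchedTexts(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy: takes three characters first, leaving a lone "a" unmatched
        System.out.println(matchedTexts("a{2,3}", "aaaa"));  // [aaa]
        // Reluctant: stops at two characters, so the rest matches again
        System.out.println(matchedTexts("a{2,3}?", "aaaa")); // [aa, aa]
    }
}
```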

9. Capturing Groups

The API also allows us to treat multiple characters as a single unit through capturing groups. It will attach numbers to the capturing groups, and allow back referencing using these numbers.

In this section, we’ll see a few examples of how to use capturing groups in the Java regex API.

Let’s use a capturing group that matches only when an input text contains two digits next to each other:

@Test
public void givenCapturingGroup_whenMatches_thenCorrect() {
    int matches = runTest("(\\d\\d)", "12");
 
    assertEquals(matches, 1);
}

Capturing groups are numbered from left to right, starting at 1, so the group above is group 1. We can use a back reference, written \\1 in a Java String, to tell the matcher that we want to match another occurrence of the text that group 1 captured. Without a back reference, we’d get two separate matches for the following input:

@Test
public void givenCapturingGroup_whenMatches_thenCorrect2() {
    int matches = runTest("(\\d\\d)", "1212");
 
    assertEquals(matches, 2);
}

We can have one match, but propagating the same regex match to span the entire length of the input using back referencing:

@Test
public void givenCapturingGroup_whenMatchesWithBackReference_thenCorrect() {
    int matches = runTest("(\\d\\d)\\1", "1212");
 
    assertEquals(matches, 1);
}

Without back referencing, we would have to repeat the group to get a single match, although this version accepts any four digits rather than a repeated pair:

@Test
public void givenCapturingGroup_whenMatches_thenCorrect3() {
    int matches = runTest("(\\d\\d)(\\d\\d)", "1212");
 
    assertEquals(matches, 1);
}

Similarly, for any other number of repetitions, back referencing can make the matcher see the input as a single match:

@Test
public void givenCapturingGroup_whenMatchesWithBackReference_thenCorrect2() {
    int matches = runTest("(\\d\\d)\\1\\1\\1", "12121212");
 
    assertEquals(matches, 1);
}

But if we change even the last digit, the match will fail:

@Test
public void givenCapturingGroupAndWrongInput_whenMatchFailsWithBackReference_thenCorrect() {
    int matches = runTest("(\\d\\d)\\1", "1213");
 
    assertFalse(matches > 0);
}

It’s important not to forget the escape backslashes, which are crucial in Java syntax.
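Besides back references, the group numbers also let us read the captured text after a match through the Matcher's group(int) method, where group 0 is the entire match. Here's a minimal sketch; the key=value pattern, class name, and helper name are hypothetical examples:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtractionDemo {

    // Returns {whole match, group 1, group 2} for the first key=value pair,
    // or null if the text contains no such pair
    static String[] firstKeyValue(String text) {
        Pattern pattern = Pattern.compile("(\\w+)=(\\d+)");
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(0), matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = firstKeyValue("width=640 height=480");
        System.out.println(parts[0]); // width=640
        System.out.println(parts[1]); // width
        System.out.println(parts[2]); // 640
    }
}
```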

10. Boundary Matchers

The Java regex API also supports boundary matching. If we care about where exactly in the input text the match should occur, then this is what we’re looking for. With the previous examples, all we cared about was whether or not a match was found.

To match only when the required regex is true at the beginning of the text, we use the caret ^.

This test will pass, since the text dog can be found at the beginning:

@Test
public void givenText_whenMatchesAtBeginning_thenCorrect() {
    int matches = runTest("^dog", "dogs are friendly");
 
    assertTrue(matches > 0);
}

In the following test, the match fails because dog doesn’t appear at the beginning of the text:

@Test
public void givenTextAndWrongInput_whenMatchFailsAtBeginning_thenCorrect() {
    int matches = runTest("^dog", "are dogs are friendly?");
 
    assertFalse(matches > 0);
}

To match only when the required regex is true at the end of the text, we use the dollar character $. We’ll find a match in the following case:

@Test
public void givenText_whenMatchesAtEnd_thenCorrect() {
    int matches = runTest("dog$", "Man's best friend is a dog");
 
    assertTrue(matches > 0);
}

And we won’t find a match here:

@Test
public void givenTextAndWrongInput_whenMatchFailsAtEnd_thenCorrect() {
    int matches = runTest("dog$", "is a dog man's best friend?");
 
    assertFalse(matches > 0);
}

If we want a match only when the required text is found at a word boundary, we use \\b at the beginning and end of the regex.

A space next to a word character marks a word boundary:

@Test
public void givenText_whenMatchesAtWordBoundary_thenCorrect() {
    int matches = runTest("\\bdog\\b", "a dog is friendly");
 
    assertTrue(matches > 0);
}

The empty string at the beginning of a line is also a word boundary:

@Test
public void givenText_whenMatchesAtWordBoundary_thenCorrect2() {
    int matches = runTest("\\bdog\\b", "dog is man's best friend");
 
    assertTrue(matches > 0);
}

These tests pass because the beginning of a String, as well as the space between one text and another, marks a word boundary. However, the following test shows the opposite:

@Test
public void givenWrongText_whenMatchFailsAtWordBoundary_thenCorrect() {
    int matches = runTest("\\bdog\\b", "snoop dogg is a rapper");
 
    assertFalse(matches > 0);
}

Two word characters appearing in a row don’t mark a word boundary, but we can get a match by changing the end of the regex to look for a non-word boundary:

@Test
public void givenText_whenMatchesAtWordAndNonBoundary_thenCorrect() {
    int matches = runTest("\\bdog\\B", "snoop dogg is a rapper");
    assertTrue(matches > 0);
}

11. Pattern Class Methods

Previously, we only created Pattern objects in a basic way. However, this class has another variant of the compile method that accepts a set of flags alongside the regex argument, which affects the way we match the pattern.

These flags are simply abstracted integer values. Let’s overload the runTest method in the test class, so that it can take a flag as the third argument:

public static int runTest(String regex, String text, int flags) {
    Pattern pattern = Pattern.compile(regex, flags);
    Matcher matcher = pattern.matcher(text);
    int matches = 0;
    while (matcher.find()){
        matches++;
    }
    return matches;
}

In this section, we’ll look at the different supported flags and how to use them.

Pattern.CANON_EQ

This flag enables canonical equivalence. When specified, two characters will be considered to match if, and only if, their full canonical decompositions match.

Consider the accented Unicode character é. Its composite code point is \u00E9. However, Unicode also has separate code points for its component characters: e, \u0065, and the combining acute accent, \u0301. In this case, the composite character \u00E9 is canonically equivalent to the two-character sequence \u0065\u0301.

By default, matching doesn’t take canonical equivalence into account:

@Test
public void givenRegexWithoutCanonEq_whenMatchFailsOnEquivalentUnicode_thenCorrect() {
    int matches = runTest("\u00E9", "\u0065\u0301");
 
    assertFalse(matches > 0);
}

But if we add the flag, then the test will pass:

@Test
public void givenRegexWithCanonEq_whenMatchesOnEquivalentUnicode_thenCorrect() {
    int matches = runTest("\u00E9", "\u0065\u0301", Pattern.CANON_EQ);
 
    assertTrue(matches > 0);
}

Pattern.CASE_INSENSITIVE

This flag enables matching regardless of case. By default, matching takes case into account:

@Test
public void givenRegexWithDefaultMatcher_whenMatchFailsOnDifferentCases_thenCorrect() {
    int matches = runTest("dog", "This is a Dog");
 
    assertFalse(matches > 0);
}

So using this flag, we can change the default behavior:

@Test
public void givenRegexWithCaseInsensitiveMatcher_whenMatchesOnDifferentCases_thenCorrect() {
    int matches = runTest(
      "dog", "This is a Dog", Pattern.CASE_INSENSITIVE);
 
    assertTrue(matches > 0);
}

We can also use the equivalent, embedded flag expression to achieve the same result:

@Test
public void givenRegexWithEmbeddedCaseInsensitiveMatcher_whenMatchesOnDifferentCases_thenCorrect() {
    int matches = runTest("(?i)dog", "This is a Dog");
 
    assertTrue(matches > 0);
}

Pattern.COMMENTS

The Java API allows us to include comments using # in the regex. This can help in documenting complex regex that may not be immediately obvious to another programmer.

The comments flag makes the matcher ignore any white space or comments in the regex, and only consider the pattern. In the default matching mode, the following test would fail:

@Test
public void givenRegexWithComments_whenMatchFailsWithoutFlag_thenCorrect() {
    int matches = runTest(
      "dog$  #check for word dog at end of text", "This is a dog");
 
    assertFalse(matches > 0);
}

This is because the matcher will look for the entire regex in the input text, including the spaces and the # character. But when we use the flag, it’ll ignore the extra spaces, and all text starting with # will be seen as a comment to be ignored for each line:

@Test
public void givenRegexWithComments_whenMatchesWithFlag_thenCorrect() {
    int matches = runTest(
      "dog$  #check end of text","This is a dog", Pattern.COMMENTS);
 
    assertTrue(matches > 0);
}

There’s also an alternative embedded flag expression for this:

@Test
public void givenRegexWithComments_whenMatchesWithEmbeddedFlag_thenCorrect() {
    int matches = runTest(
      "(?x)dog$  #check end of text", "This is a dog");
 
    assertTrue(matches > 0);
}

Pattern.DOTALL

By default, the dot “.” expression matches any character except a line terminator, so a pattern such as .* stops matching when it reaches the end of a line.

Using this flag, the dot will match line terminators as well. We’ll understand this better with the following examples. These examples will be a little different. Since we want to assert against the matched String, we’ll use the matcher’s group method, which returns the text captured during the previous match.

First, let’s see the default behavior:

@Test
public void givenRegexWithLineTerminator_whenMatchFails_thenCorrect() {
    Pattern pattern = Pattern.compile("(.*)");
    Matcher matcher = pattern.matcher(
      "this is a text" + System.getProperty("line.separator") 
        + " continued on another line");
    matcher.find();
 
    assertEquals("this is a text", matcher.group(1));
}

As we can see, only the first part of the input before the line terminator is matched.

Now in dotall mode, the entire text, including the line terminator, will be matched:

@Test
public void givenRegexWithLineTerminator_whenMatchesWithDotall_thenCorrect() {
    Pattern pattern = Pattern.compile("(.*)", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(
      "this is a text" + System.getProperty("line.separator") 
        + " continued on another line");
    matcher.find();
    assertEquals(
      "this is a text" + System.getProperty("line.separator") 
        + " continued on another line", matcher.group(1));
}

We can also use an embedded flag expression to enable dotall mode:

@Test
public void givenRegexWithLineTerminator_whenMatchesWithEmbeddedDotall_thenCorrect() {
    
    Pattern pattern = Pattern.compile("(?s)(.*)");
    Matcher matcher = pattern.matcher(
      "this is a text" + System.getProperty("line.separator") 
        + " continued on another line");
    matcher.find();
 
    assertEquals(
      "this is a text" + System.getProperty("line.separator") 
        + " continued on another line", matcher.group(1));
}

Pattern.LITERAL

When in this mode, the matcher gives no special meaning to any meta characters, escape characters, or regex syntax. Without this flag, the matcher will match the following regex against any input String:

@Test
public void givenRegex_whenMatchesWithoutLiteralFlag_thenCorrect() {
    int matches = runTest("(.*)", "text");
 
    assertTrue(matches > 0);
}

This is the default behavior we’ve seen in all the examples. However, with this flag, we won’t find a match, since the matcher will be looking for (.*) instead of interpreting it:

@Test
public void givenRegex_whenMatchFailsWithLiteralFlag_thenCorrect() {
    int matches = runTest("(.*)", "text", Pattern.LITERAL);
 
    assertFalse(matches > 0);
}

Now if we add the required string, the test will pass:

@Test
public void givenRegex_whenMatchesWithLiteralFlag_thenCorrect() {
    int matches = runTest("(.*)", "text(.*)", Pattern.LITERAL);
 
    assertTrue(matches > 0);
}

There’s no embedded flag character for enabling literal parsing.
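A related option, when only part of a regex should be taken literally, is the static Pattern.quote method, which returns a pattern String that matches its argument character for character. The following sketch assumes a hypothetical containsLiteral helper:

```java
import java.util.regex.Pattern;

class QuoteDemo {

    // Returns true if the text contains the literal, with no regex interpretation
    static boolean containsLiteral(String literal, String text) {
        Pattern pattern = Pattern.compile(Pattern.quote(literal));
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("(.*)", "text"));     // false
        System.out.println(containsLiteral("(.*)", "text(.*)")); // true
        System.out.println(containsLiteral("1+1", "1+1=2"));     // true
    }
}
```

Unlike the LITERAL flag, which applies to the whole pattern, the quoted String can be concatenated with ordinary regex syntax.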

Pattern.MULTILINE

By default, the ^ and $ meta characters match absolutely at the beginning and end, respectively, of the entire input String. The matcher disregards any line terminators:

@Test
public void givenRegex_whenMatchFailsWithoutMultilineFlag_thenCorrect() {
    int matches = runTest(
      "dog$", "This is a dog" + System.getProperty("line.separator") 
      + "this is a fox");
 
    assertFalse(matches > 0);
}

This match will fail because the matcher searches for dog at the end of the entire String, but the dog is present at the end of the first line of the string.

However, with the flag, the same test will pass, since the matcher now takes into account line terminators. So the String dog is found just before the line terminates, meaning success:

@Test
public void givenRegex_whenMatchesWithMultilineFlag_thenCorrect() {
    int matches = runTest(
      "dog$", "This is a dog" + System.getProperty("line.separator") 
      + "this is a fox", Pattern.MULTILINE);
 
    assertTrue(matches > 0);
}

Here’s the embedded flag version:

@Test
public void givenRegex_whenMatchesWithEmbeddedMultilineFlag_thenCorrect() {
    int matches = runTest(
      "(?m)dog$", "This is a dog" + System.getProperty("line.separator") 
      + "this is a fox");
 
    assertTrue(matches > 0);
}
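Since the flags are plain int constants, we can combine several of them with the bitwise OR operator. Here's a minimal sketch that applies CASE_INSENSITIVE and MULTILINE together; the class name, the countMatches helper, and the sample text are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedFlagsDemo {

    // Counts matches of the regex in the text using the given flag bit mask
    static int countMatches(String regex, String text, int flags) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        int matches = 0;
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Dog\nDOG\ndog";

        // MULTILINE alone: only the lowercase last line matches
        System.out.println(countMatches("^dog$", text, Pattern.MULTILINE)); // 1

        // Both flags: every line matches, regardless of case
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(countMatches("^dog$", text, both)); // 3
    }
}
```

The embedded equivalents can be combined in the same way, for example by starting the regex with (?im).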

12. Matcher Class Methods

In this section, we’ll learn about the useful methods of the Matcher class. We’ll group them according to functionality for clarity.

12.1. Index Methods

Index methods provide useful index values that show us precisely where to find the match in the input String. In the following test, we’ll confirm the start and end indices of the match for dog in the input String:

@Test
public void givenMatch_whenGetsIndices_thenCorrect() {
    Pattern pattern = Pattern.compile("dog");
    Matcher matcher = pattern.matcher("This dog is mine");
    matcher.find();
 
    assertEquals(5, matcher.start());
    assertEquals(8, matcher.end());
}
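The start and end methods also have overloads that take a group number, which report where an individual capturing group matched. The sketch below uses a hypothetical pattern with two groups and a hypothetical indices helper to show both forms:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {

    // Returns {start, end, start(1), end(1), start(2), end(2)} for the first
    // match of two dash-separated numbers, or null if there is no match
    static int[] indices(String text) {
        Matcher matcher = Pattern.compile("(\\d+)-(\\d+)").matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new int[] {
            matcher.start(), matcher.end(),   // whole match
            matcher.start(1), matcher.end(1), // first group
            matcher.start(2), matcher.end(2)  // second group
        };
    }

    public static void main(String[] args) {
        int[] idx = indices("range 10-25");
        System.out.println(idx[0] + ".." + idx[1]); // 6..11
        System.out.println(idx[2] + ".." + idx[3]); // 6..8
        System.out.println(idx[4] + ".." + idx[5]); // 9..11
    }
}
```

As with the test above, each end index is exclusive, so text.substring(start, end) yields the matched text.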

12.2. Study Methods

Study methods go through the input String and return a boolean indicating whether or not the pattern was found. Commonly used are the matches and lookingAt methods.

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to be matched, while lookingAt doesn’t.

Both methods start at the beginning of the input String:

@Test
public void whenStudyMethodsWork_thenCorrect() {
    Pattern pattern = Pattern.compile("dog");
    Matcher matcher = pattern.matcher("dogs are friendly");
 
    assertTrue(matcher.lookingAt());
    assertFalse(matcher.matches());
}

The matches method will return true in a case like this:

@Test
public void whenMatchesStudyMethodWorks_thenCorrect() {
    Pattern pattern = Pattern.compile("dog");
    Matcher matcher = pattern.matcher("dog");
 
    assertTrue(matcher.matches());
}
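To compare the study methods with the find method we used earlier, the following sketch runs all three against the same input. The class and helper names are hypothetical, and each check uses a fresh Matcher so that the results don't influence each other:

```java
import java.util.regex.Pattern;

class StudyMethodsDemo {

    // Returns {matches, lookingAt, find} for the given regex and text
    static boolean[] study(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        boolean whole = pattern.matcher(text).matches();     // entire input must match
        boolean prefix = pattern.matcher(text).lookingAt();  // match must start at index 0
        boolean anywhere = pattern.matcher(text).find();     // match may occur anywhere
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        boolean[] r = study("dog", "my dog");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false false true
    }
}
```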

12.3. Replacement Methods

Replacement methods are useful to replace text in an input string. The common ones are replaceFirst and replaceAll.

The replaceFirst and replaceAll methods replace the text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences:

@Test
public void whenReplaceFirstWorks_thenCorrect() {
    Pattern pattern = Pattern.compile("dog");
    Matcher matcher = pattern.matcher(
      "dogs are domestic animals, dogs are friendly");
    String newStr = matcher.replaceFirst("cat");
 
    assertEquals(
      "cats are domestic animals, dogs are friendly", newStr);
}

Replace all occurrences:

@Test
public void whenReplaceAllWorks_thenCorrect() {
    Pattern pattern = Pattern.compile("dog");
    Matcher matcher = pattern.matcher(
      "dogs are domestic animals, dogs are friendly");
    String newStr = matcher.replaceAll("cat");
 
    assertEquals("cats are domestic animals, cats are friendly", newStr);
}

The replaceAll method allows us to substitute all matches with the same replacement. If we want to replace matches on a case-by-case basis, we’d need a token replacement technique.
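One possible way to sketch per-match replacement is with the Matcher methods appendReplacement and appendTail, which let us compute a different replacement for every match. The ${name} placeholder syntax, the class name, and the fill helper below are assumptions made for this illustration, not part of the API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenReplacementDemo {

    // Replaces each ${name} placeholder with its value from the map;
    // placeholders without a value are left unchanged
    static String fill(String template, Map<String, String> values) {
        Pattern pattern = Pattern.compile("\\$\\{(\\w+)\\}");
        Matcher matcher = pattern.matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "Ann");
        values.put("count", "3");
        System.out.println(fill("Hello ${name}, you have ${count} messages", values));
        // Hello Ann, you have 3 messages
    }
}
```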

13. Conclusion

In this article, we learned how to use regular expressions in Java. We also explored the most important features of the java.util.regex package.

The full source code for the project, including all the code samples used here, can be found in the GitHub project.