1. Introduction

Regular expressions are a powerful tool for matching various kinds of patterns when used appropriately.

In this article, we’ll use java.util.regex package to determine whether a given String contains a valid date or not.

For an introduction to regular expressions, refer to our Guide To Java Regular Expressions API.

2. Date Format Overview

We’re going to define a valid date in relation to the international Gregorian calendar. Our format will follow the general pattern: YYYY-MM-DD.

Let’s also include the concept of a leap year that is a year containing a day of February 29th. According to the Gregorian calendar, we’ll call a year leap if the year number can be divided evenly by 4 except for those which are divisible by 100 but including those which are divisible by 400.

In all other cases***,*** we’ll call a year regular.

Examples of valid dates:

  • 2017-12-31
  • 2020-02-29
  • 2400-02-29

Examples of invalid dates:

  • 2017/12/31: incorrect token delimiter
  • 2018-1-1: missing leading zeroes
  • 2018-04-31: wrong days count for April
  • 2100-02-29: this year isn’t leap as the value divides by 100, so February is limited to 28 days

3. Implementing a Solution

Since we’re going to match a date using regular expressions, let’s first sketch out an interface DateMatcher, which provides a single matches method:

public interface DateMatcher {
    boolean matches(String date);
}

We’re going to present the implementation step-by-step below, building towards to complete solution at the end.

3.1. Matching the Broad Format

We’ll start by creating a very simple prototype handling the format constraints of our matcher:

class FormattedDateMatcher implements DateMatcher {

    private static Pattern DATE_PATTERN = Pattern.compile(
      "^\\d{4}-\\d{2}-\\d{2}$");

    @Override
    public boolean matches(String date) {
        return DATE_PATTERN.matcher(date).matches();
    }
}

Here we’re specifying that a valid date must consist of three groups of integers separated by a dash. The first group is made up of four integers, with the remaining two groups having two integers each.

Matching dates: 2017-12-31, 2018-01-31, 0000-00-00, 1029-99-72

Non-matching dates: 2018-01, 2018-01-XX, 2020/02/29

3.2. Matching the Specific Date Format

Our second example accepts ranges of date tokens as well as our formatting constraint. For simplicity, we have restricted our interest to the years 1900 – 2999.

Now that we successfully matched our general date format, we need to constrain that further – to make sure the dates are actually correct:

^((19|2[0-9])[0-9]{2})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$

Here we’ve introduced three groups of integer ranges that need to match:

  • (19|2[0-9])[0-9]{2} covers a restricted range of years by matching a number which starts with 19 or 2X followed by a couple of any digits.
  • 0[1-9]|1[012] matches a month number in a range of 01-12
  • 0[1-9]|[12][0-9]|3[01] matches a day number in a range of 01-31

Matching dates: 1900-01-01, 2205-02-31, 2999-12-31

Non-matching dates: 1899-12-31, 2018-05-35, 2018-13-05, 3000-01-01, 2018-01-XX

3.3. Matching the February 29th

In order to match leap years correctly we must first identify when we have encountered a leap year, and then make sure that we accept February 29th as a valid date for those years.

As the number of leap years in our restricted range is large enough we should use the appropriate divisibility rules to filter them:

  • If the number formed by the last two digits in a number is divisible by 4, the original number is divisible by 4
  • If the last two digits of the number are 00, the number is divisible by 100

Here is a solution:

^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$

The pattern consists of the following parts:

  • 2000|2400|2800 matches a set of leap years with a divider of 400 in a restricted range of 1900-2999
  • 19|2[0-9](0[48]|[2468][048]|[13579][26])) matches all white-list combinations of years which have a divider of 4 and don’t have a divider of 100
  • -02-29 matches February 2nd

Matching dates: 2020-02-29, 2024-02-29, 2400-02-29

Non-matching dates: 2019-02-29, 2100-02-29, 3200-02-29, 2020/02/29

3.4. Matching General Days of February

As well as matching February 29th in leap years, we also need to match all other days of February (1 – 28) in all years:

^(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))$

Matching dates: 2018-02-01, 2019-02-13, 2020-02-25

Non-matching dates: 2000-02-30, 2400-02-62, 2018/02/28

3.5. Matching 31-Day Months

The months January, March, May, July, August, October, and December should match for between 1 and 31 days:

^(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))$

Matching dates: 2018-01-31, 2021-07-31, 2022-08-31

Non-matching dates: 2018-01-32, 2019-03-64, 2018/01/31

3.6. Matching 30-Day Months

The months April, June, September, and November should match for between 1 and 30 days:

^(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30))$

Matching dates: 2018-04-30, 2019-06-30, 2020-09-30

Non-matching dates: 2018-04-31, 2019-06-31, 2018/04/30

3.7. Gregorian Date Matcher

Now we can combine all of the patterns above into a single matcher to have a complete GregorianDateMatcher satisfying all of the constraints:

class GregorianDateMatcher implements DateMatcher {

    private static Pattern DATE_PATTERN = Pattern.compile(
      "^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$" 
      + "|^(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))$"
      + "|^(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))$" 
      + "|^(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30))$");

    @Override
    public boolean matches(String date) {
        return DATE_PATTERN.matcher(date).matches();
    }
}

We’ve used an alternation character “|” to match at least one of the four branches. Thus, the valid date of February either matches the first branch of February 29th of a leap year either the second branch of any day from 1 to 28. The dates of remaining months match third and fourth branches.

Since we haven’t optimized this pattern in favor of a better readability, feel free to experiment with a length of it.

At this moment we have satisfied all the constraints, we introduced in the beginning.

3.8. Note on Performance

Parsing complex regular expressions may significantly affect the performance of the execution flow. The primary purpose of this article was not to learn an efficient way of testing a string for its membership in a set of all possible dates.

Consider using LocalDate.parse() provided by Java8 if a reliable and fast approach to validating a date is needed.

4. Conclusion

In this article, we’ve learned how to use regular expressions for matching the strictly formatted date of the Gregorian calendar by providing rules of the format, the range and the length of months as well.

All the code presented in this article is available over on Github. This is a Maven-based project, so it should be easy to import and run as it is.