1. Overview

When working with file systems in Java, validating folder paths is crucial to ensure that our applications function correctly and securely. One efficient way to perform path validation is by using regular expressions (regex).

In this tutorial, we’ll explore how to validate Linux folder paths using regex in Java, ensuring that the paths we use conform to expected patterns and conventions.

2. Introduction to the Problem

When handling Linux directory paths in an application, we often need to enforce our own requirements rather than accept every path that a specific Linux filesystem, such as ext4, considers valid.

As an example, let’s say the Linux directory String in our application must pass the following checks:

  • The directory path must not be empty.
  • The path must be absolute. In other words, it must begin with a slash character (/); relative paths like ./foo and ../foo are not allowed.
  • Except for slashes, the absolute path can only contain dashes (-), underscores (_), digits, and lowercase and uppercase letters.
  • The directory path must not end with a slash character. For example, we consider “/foo/bar/” an invalid path. But there is one and only one exception: the root directory “/” is allowed.

It’s worth noting that our validation doesn’t aim to check whether a given directory path exists in the current filesystem. If a file- or directory-existence check is required, regex might not be the right tool for this task.

Next, let’s see how to build a regex pattern to fulfill the validation rules.

3. Creating the Regex Pattern

At first glance, creating a regex pattern that fulfills all requirements can seem complicated. So, let’s build the pattern step by step and see that it’s not a challenging task.

First, since a valid path always begins with a slash character, and only dashes (-), underscores (_), digits, and lowercase and uppercase letters are allowed, we can start with this regex pattern: “^/[0-9a-zA-Z_-]+$”. The character class [0-9a-zA-Z_] matches exactly the word characters, and \w is the predefined shorthand for it. Therefore, we can replace “0-9a-zA-Z_” with “\w” to make the pattern simpler and easier to read. Written as a Java string literal, with the backslash escaped, it becomes: “^/[\\w-]+$”.
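To make this first step concrete, here is a minimal sketch that compiles the top-level pattern with java.util.regex.Pattern and applies it to a few inputs. The class and method names are hypothetical, chosen only for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original example code
class TopLevelDirectoryDemo {

    // A slash followed by one or more word characters or dashes
    static final Pattern TOP_LEVEL = Pattern.compile("^/[\\w-]+$");

    static boolean isTopLevelDirectory(String path) {
        // matches() requires the whole input to match the pattern
        return TOP_LEVEL.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTopLevelDirectory("/foo"));     // true
        System.out.println(isTopLevelDirectory("/123"));     // true
        System.out.println(isTopLevelDirectory("/foo/bar")); // false: a second slash is not allowed yet
    }
}
```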

The current pattern only matches top-level directories, such as “/foo” and “/123”. However, a directory may contain multi-level subdirectories, for instance, “/foo/sub1/sub2/sub3”.

If we examine this path carefully, we find that a valid path with subdirectories is made up of multiple directory Strings. For example, “/foo/sub1/sub2/sub3” includes four segments matching our top-level directory pattern: “/foo”, “/sub1”, “/sub2”, and “/sub3”.

Therefore, to match multiple consecutive directories, we can put the “/[\\w-]+” part of our top-level directory pattern in a capturing group and add the ‘+’ quantifier to the group: “^(/[\\w-]+)+$”.
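The effect of the repeated group can be checked with a short sketch. The class and method names here are assumptions made for this illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original example code
class MultiLevelDirectoryDemo {

    // One or more "/segment" groups, each segment made of word characters or dashes
    static final Pattern MULTI_LEVEL = Pattern.compile("^(/[\\w-]+)+$");

    static boolean isMultiLevelDirectory(String path) {
        return MULTI_LEVEL.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMultiLevelDirectory("/foo"));                // true
        System.out.println(isMultiLevelDirectory("/foo/sub1/sub2/sub3")); // true
        System.out.println(isMultiLevelDirectory("/foo/"));               // false: trailing slash
        System.out.println(isMultiLevelDirectory("/"));                   // false: root is not covered yet
    }
}
```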

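This pattern will match nearly all directory paths. We’re almost there. However, there’s one special case this pattern doesn’t cover: the root directory “/”. The pattern that matches “/” is “^/$”. We can merge the two patterns using the “OR” operator (|) to match both cases. So, we have: “^/|(/[\\w-]+)+$”. Note that the alternation splits this pattern into “^/” and “(/[\\w-]+)+$”, so it behaves as intended only when the entire input must match, as with the matches() checks in the test that follows.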
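For callers that search with Matcher.find() instead of requiring a full match, a grouped variant keeps both anchors applied to both alternatives. The following is a minimal sketch of that difference. The grouped pattern “^(/|(/[\\w-]+)+)$” and all names in the code are assumptions of this illustration, not part of the original example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the merged pattern with a grouped variant
class GroupedAlternationDemo {

    // The merged pattern from the text: parsed as "^/" OR "(/[\\w-]+)+$"
    static final Pattern UNGROUPED = Pattern.compile("^/|(/[\\w-]+)+$");

    // Assumed variant: the alternation is wrapped in a group between the anchors
    static final Pattern GROUPED = Pattern.compile("^(/|(/[\\w-]+)+)$");

    static boolean foundByUngrouped(String path) {
        // find() only looks for a matching substring
        return UNGROUPED.matcher(path).find();
    }

    static boolean foundByGrouped(String path) {
        return GROUPED.matcher(path).find();
    }

    public static void main(String[] args) {
        System.out.println(foundByUngrouped("/foo/")); // true: "^/" alone matches the leading slash
        System.out.println(foundByGrouped("/foo/"));   // false
        System.out.println(foundByGrouped("/"));       // true
        System.out.println(foundByGrouped("/foo/bar")); // true
    }
}
```

With matches(), both patterns give the same result for the inputs tested in this article; the grouping only matters for partial-match APIs such as find().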

Next, let’s test if this pattern works as expected.

The AssertJ library allows us to write fluent assertion statements in tests. Further, it offers many handy methods for easily verifying test results. For example, we can employ its matches() and doesNotMatch() methods to verify whether a String matches a regex pattern:

String regex = "^/|(/[\\w-]+)+$";

// Valid: the root directory and absolute paths without a trailing slash
assertThat("/").matches(regex);
assertThat("/foo").matches(regex);
assertThat("/foo/0").matches(regex);
assertThat("/foo/0/bar").matches(regex);
assertThat("/f_o_o/-/bar").matches(regex);

// Invalid: empty or blank, relative, trailing slash, or disallowed characters
assertThat("").doesNotMatch(regex);
assertThat("  ").doesNotMatch(regex);
assertThat("foo").doesNotMatch(regex);
assertThat("/foo/").doesNotMatch(regex);
assertThat("/foo/bar/").doesNotMatch(regex);
assertThat("/fo o/bar").doesNotMatch(regex);
assertThat("/foo/b@ar").doesNotMatch(regex);

As the test above shows, our regex pattern passes both the positive and the negative cases. Therefore, a validator using this regex pattern fulfills the requirements listed earlier.
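Outside of tests, the same pattern could be wrapped in a small helper. The sketch below is one possible shape. The class name, the method name, and the choice to treat null as invalid are assumptions rather than part of the original example:

```java
import java.util.regex.Pattern;

// Hypothetical helper class built around the pattern from this section
class LinuxDirectoryPathValidator {

    // Compiled once and reused for every call
    private static final Pattern VALID_PATH = Pattern.compile("^/|(/[\\w-]+)+$");

    static boolean isValid(String path) {
        // Assumption of this sketch: a null input counts as invalid
        return path != null && VALID_PATH.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("/"));          // true
        System.out.println(isValid("/foo/0/bar")); // true
        System.out.println(isValid("/foo/bar/"));  // false
        System.out.println(isValid("foo"));        // false
    }
}
```

Precompiling the Pattern avoids recompiling the regex on each call, which String.matches() would otherwise do.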

4. Conclusion

Validating Linux folder paths using regex in Java is a powerful technique for ensuring they conform to expected patterns.

Using the techniques addressed in this article, we can confidently handle Linux paths in our Java projects, leading to more resilient and maintainable code.

As always, the complete source code for the examples is available over on GitHub.