1. Overview

In this tutorial, we’ll do a quick overview of the ANTLR parser generator and show some real-world applications.

2. ANTLR

ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text.

It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them.
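To make the lexer part concrete, here is a minimal hand-written sketch in plain Java of what a lexer does: it turns raw characters into a flat list of tokens. The TinyLexer class and its ID, NUM, and SYM token labels are hypothetical and not part of ANTLR; a generated ANTLR lexer does the same kind of job, driven by the token rules of a grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written lexer used only to illustrate the concept.
class TinyLexer {

    // Splits the input into identifier, number, and single-character
    // symbol tokens, skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                  && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("NUM(" + input.substring(start, i) + ")");
            } else {
                // Anything else is treated as a one-character symbol.
                tokens.add("SYM(" + c + ")");
                i++;
            }
        }
        return tokens;
    }
}
```

Calling TinyLexer.tokenize("void DoSomething(){}") yields ID(void) and ID(DoSomething), followed by one SYM token for each of the four punctuation characters. A parser then consumes such tokens and groups them into a tree according to the grammar rules.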

It’s often used to build tools and frameworks. For example, Hibernate uses ANTLR for parsing and processing HQL queries, and Elasticsearch uses it for its Painless scripting language.

And Java is just one binding. ANTLR also offers bindings for C#, Python, JavaScript, Go, C++ and Swift.

3. Configuration

First, let’s add antlr4-runtime to our pom.xml:

<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.7.1</version>
</dependency>

And also the antlr4-maven-plugin:

<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-maven-plugin</artifactId>
    <version>4.7.1</version>
    <executions>
        <execution>
            <goals>
                <goal>antlr4</goal>
            </goals>
        </execution>
    </executions>
</plugin>

It’s the plugin’s job to generate code from the grammars we specify.

4. How Does It Work?

To create a parser with the ANTLR Maven plugin, we follow three simple steps:

  • prepare a grammar file
  • generate sources
  • create the listener

So, let’s see these steps in action.

5.2. Generate Sources

ANTLR works by generating Java code corresponding to the grammar files that we give it, and the Maven plugin makes it easy:

mvn package

By default, this will generate several files under the target/generated-sources/antlr4 directory:

  • Java8.interp
  • Java8Listener.java
  • Java8BaseListener.java
  • Java8Lexer.java
  • Java8Lexer.interp
  • Java8Parser.java
  • Java8.tokens
  • Java8Lexer.tokens

Notice that the names of those files are based on the name of the grammar file.

We’ll need the Java8Lexer and the Java8Parser files later when we test. For now, though, we need the Java8BaseListener for creating our UppercaseMethodListener.

5.3. Creating UppercaseMethodListener

Based on the Java8 grammar that we used, Java8BaseListener has enter and exit methods that we can override, one pair for each rule in the grammar file.

For example, the grammar defines the method name and parameter list like so:

methodDeclarator
    :    Identifier '(' formalParameterList? ')' dims?
    ;

And so Java8BaseListener has a method enterMethodDeclarator which will be invoked each time this pattern is encountered.
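The callback mechanism can be illustrated without ANTLR. The following is a minimal sketch in plain Java, assuming a simplified tree; SketchNode, SketchListener, and WalkerSketch are hypothetical names that only mimic the idea behind ANTLR’s ParseTreeWalker and the generated listener, which are considerably richer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse tree node: a rule name, some text, and children.
class SketchNode {
    final String rule;
    final String text;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String rule, String text) {
        this.rule = rule;
        this.text = text;
    }

    // Adds a child and returns this node so calls can be chained.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical listener with a single enter/exit pair for all rules.
interface SketchListener {
    void enter(SketchNode node);
    void exit(SketchNode node);
}

class WalkerSketch {

    // Depth-first walk: fire enter before visiting the children and exit afterwards.
    static void walk(SketchListener listener, SketchNode node) {
        listener.enter(node);
        for (SketchNode child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    // Collects the text of every node whose rule is "methodDeclarator".
    static List<String> collectMethodNames(SketchNode root) {
        List<String> names = new ArrayList<>();
        walk(new SketchListener() {
            @Override
            public void enter(SketchNode node) {
                if (node.rule.equals("methodDeclarator")) {
                    names.add(node.text);
                }
            }

            @Override
            public void exit(SketchNode node) {
                // nothing to do when leaving a node in this sketch
            }
        }, root);
        return names;
    }
}
```

The listener never navigates the tree itself. The walker decides the traversal order, and the listener only reacts to the nodes it cares about, which is the same division of labor the generated base listener relies on.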

So, let’s override enterMethodDeclarator, pull out the Identifier, and perform our check:

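import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.tree.TerminalNode;

public class UppercaseMethodListener extends Java8BaseListener {

    // Messages for every method whose name starts with an uppercase letter
    private final List<String> errors = new ArrayList<>();

    public List<String> getErrors() {
        return errors;
    }

    @Override
    public void enterMethodDeclarator(Java8Parser.MethodDeclaratorContext ctx) {
        // The Identifier token of the methodDeclarator rule is the method name
        TerminalNode node = ctx.Identifier();
        String methodName = node.getText();

        if (Character.isUpperCase(methodName.charAt(0))) {
            String error = String.format("Method %s is uppercased!", methodName);
            errors.add(error);
        }
    }
}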
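Since the check itself doesn’t depend on ANTLR, it can also be exercised in isolation. The sketch below extracts it into a hypothetical NamingCheck helper, which is not part of the original listener, and adds a guard for empty names as an extra assumption.

```java
import java.util.Optional;

// Hypothetical helper that isolates the naming rule used by the listener.
class NamingCheck {

    // Returns an error message if the method name starts with an uppercase
    // letter, or an empty Optional if the name is acceptable.
    static Optional<String> checkMethodName(String methodName) {
        if (methodName == null || methodName.isEmpty()) {
            return Optional.empty();
        }
        if (Character.isUpperCase(methodName.charAt(0))) {
            return Optional.of(String.format("Method %s is uppercased!", methodName));
        }
        return Optional.empty();
    }
}
```

With this helper, checkMethodName("DoSomething") produces the message Method DoSomething is uppercased!, while checkMethodName("doSomething") produces no message.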

5.4. Testing

Now, let’s do some testing. First, we construct the lexer:

String javaClassContent = "public class SampleClass { void DoSomething(){} }";
Java8Lexer java8Lexer = new Java8Lexer(CharStreams.fromString(javaClassContent));

Then, we instantiate the parser:

CommonTokenStream tokens = new CommonTokenStream(java8Lexer);
Java8Parser parser = new Java8Parser(tokens);
ParseTree tree = parser.compilationUnit();

And then, the walker and the listener:

ParseTreeWalker walker = new ParseTreeWalker();
UppercaseMethodListener listener = new UppercaseMethodListener();

Lastly, we tell ANTLR to walk through our sample class:

walker.walk(listener, tree);

assertThat(listener.getErrors().size(), is(1));
assertThat(listener.getErrors().get(0),
  is("Method DoSomething is uppercased!"));

6. Building Our Grammar

log : entry+;
entry : timestamp ' ' level ' ' message CRLF;

And then we’ll add the details for timestamp:

timestamp : DATE ' ' TIME;
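
To see what these two rules imply for a single line, here is a rough plain-Java approximation that uses String.split instead of ANTLR. Since timestamp expands to DATE ' ' TIME, an entry consists of four space-separated parts, with the message as the remainder. LogLineSketch and the sample line in the note below are hypothetical, and the actual DATE, TIME, and level formats are whatever the grammar’s token rules define.

```java
// Hypothetical helper that approximates the entry rule with plain string splitting.
class LogLineSketch {

    // Mirrors entry : timestamp ' ' level ' ' message CRLF
    // with timestamp : DATE ' ' TIME.
    // Returns {date, time, level, message}; the message keeps its inner spaces.
    static String[] split(String line) {
        // trim() drops the trailing CRLF before splitting
        String[] parts = line.trim().split(" ", 4);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not a log entry: " + line);
        }
        return parts;
    }
}
```

For a made-up line such as 2018-05-05 14:20:18 INFO some error occurred, split returns the date, the time, INFO, and the message some error occurred. Unlike a real grammar, this sketch doesn’t validate the individual parts.
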
public class LogListener extends LogBaseListener {

    private List<LogEntry> entries = new ArrayList<>();
    private LogEntry current;

7. Conclusion

In this article, we focused on how to create a custom parser for our own language using ANTLR.

We also saw how to use existing grammar files and apply them to very simple tasks like code linting.

As always, all the code used here can be found over on GitHub.