1. Introduction

In this tutorial, we’ll show how to use the Spoon library to parse, analyze and transform Java source code.

2. Spoon Overview

When dealing with large codebases, we often need to analyze them for a specific purpose. Examples include:

  • Generating aggregate reports
  • Finding usages of a given class, including indirect usage through complex inheritance chains
  • Spotting potential vulnerabilities
  • Automated refactoring

This list can go on and on, but there’s a common pattern related to all those tasks. Firstly, they require us to scan the existing code and build an internal representation for it. Secondly, we’d use a visitor pattern or a query mechanism to find elements we’re interested in. Finally, we’d generate the desired output.

The Spoon library takes care of the first two steps, so we can focus on producing the required results.

Sure, a simple text-based shell or Python pipeline can get the job done for some use cases. However, this approach lacks a deeper understanding of the scanned code and, as such, limits the kind of analysis we can do.
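To make this limitation concrete, here’s a minimal sketch of a text-based counter. NaiveMethodCounter and its regular expression are hypothetical and not part of Spoon; they only illustrate how plain pattern matching can mistake commented-out code for a real declaration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, used only to illustrate the limits of text matching
final class NaiveMethodCounter {

    // Matches any text that looks like the start of a public method declaration
    private static final Pattern PUBLIC_METHOD =
      Pattern.compile("public\\s+[\\w<>\\[\\]]+\\s+\\w+\\s*\\(");

    // Counts every match, with no notion of comments or string literals
    static int countPublicMethods(String source) {
        Matcher matcher = PUBLIC_METHOD.matcher(source);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

Given a class with one real public method and a second one that is commented out, countPublicMethods() reports two matches, because the pattern can’t tell a comment from a declaration.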

On the other hand, Spoon creates a full in-memory model of the codebase, allowing us to traverse it in many ways. Under the hood, Spoon uses Eclipse’s JDT compiler to parse source code, resulting in a “high-fidelity” model that includes not only classes, methods, and so on but also all statements and comments.

Spoon can also handle syntactically invalid code and tolerates missing dependencies, which is helpful when we must dig through hundreds of Git repositories’ worth of legacy code.

3. Using Spoon

3.1. Maven Dependency

To use the Spoon library in our projects, we need to add it as a dependency:

<dependency>
    <groupId>fr.inria.gforge.spoon</groupId>
    <artifactId>spoon-core</artifactId>
    <version>10.3.0</version>
</dependency>

The latest version is available on Maven Central.

Note that, starting with version 10, Spoon requires Java 11 or later to run. Regardless, it can parse and create models from Java source code up to version 16 (as of this writing).

3.2. Parsing Code

Let’s start with a simple example. We’ll use Spoon to parse a single Java class and create a report with the count of public, private, and protected methods.

The SpoonAPI interface acts as the main entry point to use the library. The standard way to get a concrete implementation of this interface is to create a new Launcher instance:

SpoonAPI spoon = new Launcher();

Next, we’ll specify the location of the source code we want to analyze using addInputResource():

spoon.addInputResource("some/directory/SomeClass.java");

This method accepts the path to a single source file or to a directory. In the latter case, all Java files will be recursively parsed. We can call this method multiple times, for instance, when we want to parse code from multiple repositories at once.

Now, we’ll use buildModel() to create the CtModel instance that holds information about all processed code:

CtModel model = spoon.buildModel();

One way to think about the CtModel class is that it plays a similar role to the Document class in XML processing: it is the root of a tree from which every other element can be reached. In our case, an element can be a class, a method, a package, a variable declaration, or even a statement.

CtModel has methods that allow us to find elements of a given type and to traverse the model using a visitor pattern-style callback. In our case, we’ll use both approaches to get the method counts:

MethodSummary report = new MethodSummary();                        
model.filterChildren((el) -> el instanceof CtClass<?>)
  .forEach((CtClass<?> clazz) -> processMethods(report, clazz));

Firstly, we use filterChildren() to return a CtQuery instance that matches only CtClass elements in the model. Next, we process each matching entry using forEach(). The argument is a lambda function that calls processMethods() to evaluate the classes’ methods using a similar pattern:

private void processMethods(MethodSummary report, CtClass<?> ctClass) {
    ctClass.filterChildren((c) -> c instanceof CtMethod<?>)
      .forEach((CtMethod<?> m) -> {
          if (m.isPublic()) {
              report.addPublicMethod();
          } else if (m.isPrivate()) {
              report.addPrivateMethod();
          } else if (m.isProtected()) {
              report.addProtectedMethod();
          } else {
              report.addPackagePrivateMethod();
          }
      });
}

Here, the root element is the class under analysis, and we’ll iterate over each CtMethod, updating the report counters according to its visibility.
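The MethodSummary class used above isn’t listed in this tutorial. A minimal sketch of such a holder, assuming it’s nothing more than a set of counters, could look like the following. The add*() methods and the getters for the public, private, and package-private counts follow the names used in this article’s snippets; getProtectedMethodCount() is an assumption added for symmetry:

```java
// Sketch of a counter holder; the real class in the accompanying code may differ
final class MethodSummary {

    private int publicMethodCount;
    private int privateMethodCount;
    private int protectedMethodCount;
    private int packagePrivateMethodCount;

    public void addPublicMethod() {
        publicMethodCount++;
    }

    public void addPrivateMethod() {
        privateMethodCount++;
    }

    public void addProtectedMethod() {
        protectedMethodCount++;
    }

    public void addPackagePrivateMethod() {
        packagePrivateMethodCount++;
    }

    public int getPublicMethodCount() {
        return publicMethodCount;
    }

    public int getPrivateMethodCount() {
        return privateMethodCount;
    }

    // Assumed getter: not referenced in the article's test
    public int getProtectedMethodCount() {
        return protectedMethodCount;
    }

    public int getPackagePrivateMethodCount() {
        return packagePrivateMethodCount;
    }
}
```

Keeping the report as a plain object with no Spoon types means the traversal code is the only part that depends on the library.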

To test this code, we’ll pass it a simple class (available online) and verify that we get the correct counts for each method visibility:

@Test
public void whenGenerateReport_thenSuccess() {  
    ClassReporter reporter = new ClassReporter();
    MethodSummary report = reporter.generateMethodSummaryReport("src/test/resources/spoon/SpoonClassToTest.java");
    assertThat(report).isNotNull();
    assertThat(report.getPackagePrivateMethodCount()).isEqualTo(1);
    assertThat(report.getPublicMethodCount()).isEqualTo(1);
    assertThat(report.getPrivateMethodCount()).isEqualTo(1);
}

This code also works if the parsed class has syntax errors. For instance, given this syntactically invalid class:

public class BrokenClass {   
    // Syntax error
    pluvic void brokenMethod() {}

    // Syntax error
    protected void protectedMethod() thraws Exception {}

    // Valid method    
    public void publicMethod() {}
}

We still get the correct answer for public, protected, and private methods. As for the broken methods, the internal representation tries to get as much information as possible. If we put a breakpoint in processMethods(), we’ll be able to see that forEach() will eventually receive a CtMethod with information about the invalid method.

3.3. Transforming Code

The CtModel instance that we get from buildModel() supports transformations directly. All we have to do is use the mutator methods available in any CtElement-derived object. For instance, we can rename a method, represented by a CtMethod, simply by using setSimpleName():

CtMethod<?> method = ...
method.setSimpleName("newname");

Let’s write a simple example that adds a standard Javadoc comment with a copyright notice in every class:

CtModel model = // ... model creation logic omitted

model.filterChildren((el) -> el instanceof CtClass<?>)
  .forEach((CtClass<?> cl) -> {
      CtComment comment = cl.getFactory()
        .createComment("Copyright(c) 2023 etc", CommentType.JAVADOC);
      cl.addComment(comment);
  });

The model modification happens inside the lambda passed to forEach(). We call getFactory() on the current element and use the returned factory to create a new CtComment, which starts out as a “detached” element that doesn’t belong to any part of the model yet. We then attach this comment to the class using addComment().

The pattern is the same for changing other aspects of the code. We could add any language construct by first creating the corresponding CtElement and then using one of the available mutators to insert it in the proper place.

Once we’re done with the transformations, we use setSourceOutputDirectory() and prettyprint() to write the model back to the filesystem:

spoon.setSourceOutputDirectory("./target");
spoon.prettyprint();

The generated code will now contain a comment block just before the class declaration:

// ... package and import declarations omitted
/**
 * Copyright(c) 2023 etc
 */
public class SpoonClassToTest {
    // ... class code omitted
}

3.4. Using Processors

In the earlier examples, code inspection and modification happened in an ad-hoc way: we get a model instance and start fiddling with it. Spoon supports a more structured way to traverse the code using a Processor.

The main advantage of this approach is that it is easily composable and allows the main processing sequence to become isolated from the analysis/transformation code. Let’s show this approach in practice by rewriting the copyright example as a Processor:

public class AddCopyrightProcessor extends AbstractProcessor<CtClass<?>> {
    @Override
    public void process(CtClass<?> clazz) {
        CtComment comment = getFactory().createComment("Copyright(c) 2023 etc", CommentType.JAVADOC);
        clazz.addComment(comment);                    
    }    
}

The Processor interface has several methods, but Spoon provides a convenience base class that we can extend: AbstractProcessor. This class implements everything Spoon needs except for one method that we must still provide: process(). Spoon will call this method during the model processing phase for every matching element in the model.

Now, we must register our processor with Spoon using the addProcessor() method available in SpoonAPI:

spoon.addProcessor(new AddCopyrightProcessor());

Finally, we can run Spoon as before. This time, however, the top-level code doesn’t have to explicitly call the processing code:

spoon.addInputResource("src/test/resources/spoon/SpoonClassToTest.java");
spoon.setSourceOutputDirectory("./target/spoon-processed");
spoon.buildModel();
spoon.process();
spoon.prettyprint();

In fact, this code is almost identical to what Spoon itself runs when invoked from the command line.

3.5. Tuning Spoon’s Environment

Spoon has a number of processing options that we can tune to fit our needs. Out-of-the-box, those options assume reasonable defaults, so usually, we can leave them untouched. This is a brief list of those options:

  • Enable/Disable strict syntax checking
  • Java compliance level
  • Source file encoding
  • Log settings
  • Source code output location
  • Java output writer implementation

To change any of those options, we first use getEnvironment() to access Spoon’s Environment and then use it to modify the option we want to customize. For instance, this is how we would use tabs instead of spaces in generated files:

spoon.getEnvironment().useTabulations(true);

Another interesting use case is replacing the default Java code generator. Spoon comes with an alternative one, called SniperJavaPrettyPrinter, that tries to preserve as much of the original code as possible when producing its output.

The main advantage of this generator is that it produces code that, when compared with the original, will differ only where the processors made changes. To replace the default generator, we use setPrettyPrinterCreator(), which takes a Supplier for the PrettyPrinter that Spoon will use:

spoon.getEnvironment().setPrettyPrinterCreator(() -> new SniperJavaPrettyPrinter(spoon.getEnvironment()));

4. Conclusion

In this article, we’ve shown how to use the Spoon library to analyze and modify Java source code.

As usual, the complete code is available over on GitHub.