1. Overview

In this tutorial, we’ll learn how to set character encoding in Maven.

We’ll showcase how to set encoding for some common Maven plugins.

Also, we’ll see how to set the encoding at a project level, as well as through the command line.

2. What Is Encoding and Why Should We Care?

There are lots of different languages in the world that use different characters.

One standard for cataloguing characters, called Unicode, defines well over 100,000 characters, symbols, and even emoticons (emoji).

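To store or transmit text, we need a way to convert between human-readable characters and the bits and bytes a computer works with. A mapping that does this is called an encoding. Many encodings also try to save space. UTF-8, for example, stores ASCII characters in a single byte and uses up to four bytes only for other characters.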
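To make the variable-length idea concrete, here's a minimal Java sketch (the class and method names are our own) that counts how many bytes a few characters occupy in UTF-8. It uses Unicode escapes so the source file itself stays pure ASCII:

```java
import java.nio.charset.StandardCharsets;

public class EncodedLength {

    // Returns how many bytes the given text occupies when encoded as UTF-8
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Letter A, e with acute accent, euro sign, grinning-face emoji
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String sample : samples) {
            // Prints 1, 2, 3 and 4 bytes respectively
            System.out.println(sample + " -> " + utf8Length(sample) + " byte(s)");
        }
    }
}
```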

There are many encodings in use today, such as UTF-8, ISO-8859-1, and Windows-1252. To read a file correctly, we must know which encoding it was written in.
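As an illustration, this small Java sketch (the class name is hypothetical) writes a character with one encoding and reads the bytes back with another. The same two bytes produce a different result depending on which encoding we assume:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodingMismatch {

    // Encodes text with one charset, then decodes the resulting bytes with another
    public static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // the letter e with an acute accent
        // Same encoding on both sides: the character survives
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(original)); // true
        // The UTF-8 bytes 0xC3 0xA9 read as ISO-8859-1 become two separate characters
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                .length()); // 2
    }
}
```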

2.1. What Happens if We Don’t Declare Encoding in Maven?

Maven considers encoding important enough that if we don't declare an encoding, it logs a warning.

In fact, this warning occupies the number one spot of the FAQ page on the Apache Maven site.

To see this warning, let’s add a couple of plugins to our build.

Firstly, let’s add maven-resources-plugin, which will copy resources into an output directory:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <version>3.2.0</version>
</plugin>

We’ll also want to compile our code files, so let’s add maven-compiler-plugin:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
</plugin>

As we're working inside a multi-module project, a parent POM may have already set the encoding for us. For demo purposes, let's clear the encoding property by overriding it (don't worry, we'll come back to this later):

<properties>
    <project.build.sourceEncoding></project.build.sourceEncoding>
</properties>

Let's run the build using the standard Maven command:

mvn clean install

Unsetting our encoding like this can break the build! In our logging, we see the following warning:

[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ maven-properties ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!

The warning tells us that, because no encoding is specified, the plugin falls back to the platform default encoding, which makes the build platform dependent.

Normally on Windows, the default is Windows-1252 (aka CP-1252, or Cp1252).

This default could change based on the local environment. We’ll see below how we can remove this platform dependency from our build.
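As far as we can tell, the "platform encoding" in the warning is the default charset of the JVM running Maven. This Java sketch (the class name is our own) prints it for the current machine. Note that since JDK 18 (JEP 400) the JVM defaults to UTF-8 on every operating system, so the output depends on both the OS and the JDK version:

```java
import java.nio.charset.Charset;

public class PlatformEncoding {

    // Name of the JVM's default charset, used when no encoding is given explicitly
    public static String platformEncoding() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + platformEncoding());
        // The system property that the default has traditionally been derived from
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```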

2.2. What Happens if We Declare an Incorrect Encoding in Maven?

Maven is a build tool, so it needs to be able to read source files.

In order to read source files, Maven must be set to use the same encoding that the source files are encoded in.

Maven also produces files that are typically distributed to another computer. Therefore, it is important to write output files using an expected encoding. Output files that are not in the expected encoding could fail to be read on a different system.

To show this, let’s add a simple Java class that uses non-ASCII characters:

public class NonAsciiString {

    public static String getNonAsciiString() {

        String nonAsciiŞŧř = "ÜÝÞßàæç";
        return nonAsciiŞŧř;
    }
}

In our POM, let’s set our build to use ASCII encoding:

<properties>
    <project.build.sourceEncoding>US-ASCII</project.build.sourceEncoding>
</properties>

When we run mvn clean install, we get many build errors of the form:

[ERROR] /Baeldung/tutorials/maven-modules/maven-properties/src/main/java/com/baeldung/maven/properties/NonAsciiString.java:[15,31] unmappable character (0xC3) for encoding US-ASCII

We’re seeing this because our files contain non-ASCII characters, so they can’t be read through ASCII encoding.
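We can reproduce the gist of this error in plain Java. In this sketch (the class name is hypothetical), the first character of our literal, saved as UTF-8, starts with the byte 0xC3 from the error message, and a strict US-ASCII decoder refuses it. The JDK decoder classifies such bytes as malformed rather than unmappable, so we catch their common parent, CharacterCodingException:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictAsciiDecode {

    // Returns true if the bytes decode as US-ASCII without any replacement characters
    public static boolean isValidAscii(byte[] bytes) {
        try {
            StandardCharsets.US_ASCII.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The first character of the NonAsciiString literal (U with diaeresis), as UTF-8
        byte[] utf8 = "\u00dc".getBytes(StandardCharsets.UTF_8);
        System.out.printf("First byte: 0x%02X%n", utf8[0]);       // 0xC3
        System.out.println("Valid ASCII? " + isValidAscii(utf8)); // false
    }
}
```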

Where possible, it’s a good idea to keep things simple and avoid using non-ASCII characters.

In the next section, we'll see how to configure Maven to use UTF-8, which can represent any Unicode character and so avoids errors like this one.
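To see why UTF-8 is a safe choice, this Java sketch (the class name is our own) asks each charset's encoder whether it can represent the characters from our NonAsciiString class. US-ASCII fails on the literal, ISO-8859-1 handles the literal but not the characters in the variable name, and UTF-8 handles both:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncoderCoverage {

    // True if every character in text has a representation in the given charset
    public static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The string literal and the non-ASCII part of the variable name, as escapes
        String literal = "\u00dc\u00dd\u00de\u00df\u00e0\u00e6\u00e7";
        String identifierChars = "\u015e\u0167\u0159";

        Charset[] charsets = {StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};
        for (Charset charset : charsets) {
            System.out.println(charset + ": literal=" + canRepresent(literal, charset)
                    + ", identifier=" + canRepresent(identifierChars, charset));
        }
    }
}
```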

3. How Do We Set Encoding in Maven Configuration?

Firstly, let’s look at how we set the encoding at a plugin level.

We’ll then see that we can set project-wide properties. This means that we don’t need to declare an encoding in every plugin.

3.1. How Do We Set the encoding Parameter in a Maven Plugin?

Most plugins come with an encoding parameter, which makes this very simple.

We’ll need to set the encoding in the maven-resources-plugin and maven-compiler-plugin. We can simply add the encoding parameter to each of our Maven plugins:

<configuration>
    <encoding>UTF-8</encoding>
</configuration>

Let's run the build using mvn clean install and take a look at the logging:

[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ maven-properties ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.

We can see that the plugin is now using UTF-8, and the platform-encoding warning is gone.
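This isn't how maven-resources-plugin is implemented, but a minimal Java sketch (the class FilteredCopy and its methods are hypothetical) shows what "copy filtered resources using UTF-8" means in principle: the file is read, placeholders are replaced, and the result is written, all with one explicit Charset instead of the platform default:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class FilteredCopy {

    // Replaces each ${key} placeholder with its value
    public static String filter(String content, Map<String, String> values) {
        String result = content;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    // Reads, filters, and writes a file with one explicit encoding,
    // so the result does not depend on the platform default
    public static void copyFiltered(Path source, Path target, Map<String, String> values,
                                    Charset encoding) throws IOException {
        String content = Files.readString(source, encoding);
        Files.writeString(target, filter(content, values), encoding);
    }

    // Copies a small temporary file and returns the filtered content
    public static String demo() {
        try {
            Path source = Files.createTempFile("resource", ".txt");
            Path target = Files.createTempFile("resource-filtered", ".txt");
            Files.writeString(source, "greeting=${greeting}", StandardCharsets.UTF_8);
            copyFiltered(source, target, Map.of("greeting", "Gr\u00fc\u00dfe"), StandardCharsets.UTF_8);
            String result = Files.readString(target, StandardCharsets.UTF_8);
            Files.delete(source);
            Files.delete(target);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().equals("greeting=Gr\u00fc\u00dfe")); // true
    }
}
```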

3.2. How Do We Set a Project-Wide encoding Parameter in a Maven Build?

Remembering to set an encoding for each plugin that we declare is very cumbersome.

Thankfully, most Maven plugins use the same global Maven property as a default for their encoding parameter.

Let's remove the encoding parameters from our plugins and instead set the project.build.sourceEncoding property that we cleared earlier:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

Running our build produces the same UTF-8 line of logging that we saw above.

In a multi-module project, we would typically look to set this property in the parent POM.

Any encoding parameter set directly on a plugin takes precedence over this property.

It's important to remember that plugins are not obliged to use this property. For example, versions of maven-war-plugin earlier than 2.2 ignore it.
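For plugins that do honor the property, the precedence can be summarized with a simplified Java model. This is not Maven's code (in practice, a plugin's encoding parameter typically just defaults to ${project.build.sourceEncoding}), and the class and method names are hypothetical. We treat an empty value as unset, matching the warning we saw when we cleared the property:

```java
import java.nio.charset.Charset;

public class EncodingResolution {

    // Simplified model: the plugin's own encoding parameter wins,
    // then project.build.sourceEncoding, then the platform default
    public static String resolve(String pluginEncoding, String projectEncoding) {
        if (isSet(pluginEncoding)) {
            return pluginEncoding;
        }
        if (isSet(projectEncoding)) {
            return projectEncoding;
        }
        // Corresponds to the "Using platform encoding" warning case
        return Charset.defaultCharset().name();
    }

    private static boolean isSet(String value) {
        return value != null && !value.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1", "UTF-8")); // ISO-8859-1
        System.out.println(resolve(null, "UTF-8"));          // UTF-8
        System.out.println(resolve(null, ""));               // the platform default
    }
}
```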

3.3. How Do We Set a Project-Wide encoding Parameter for a Reporting Plugin?

Perhaps surprisingly, we must set two properties in order to guarantee that we’ve set project-wide encoding for all cases.

To illustrate this, we’ll use properties-maven-plugin:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>properties-maven-plugin</artifactId>
    <version>1.1.0</version>
</plugin>

Let's also set a second project-wide property, project.reporting.outputEncoding, to be empty:

<project.reporting.outputEncoding></project.reporting.outputEncoding>

If we run mvn clean install now, our build fails with the logging:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-pmd-plugin:3.13.0:pmd (pmd) on project maven-properties: Execution pmd of goal 
  org.apache.maven.plugins:maven-pmd-plugin:3.13.0:pmd failed: org.apache.maven.reporting.MavenReportException: : UnsupportedEncodingException -> [Help 1]
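We haven't traced the plugin's source, but the UnsupportedEncodingException is consistent with the empty project.reporting.outputEncoding value reaching a charset lookup as an empty name. This Java sketch (the class name is our own) shows that the JDK rejects an empty charset name in exactly this way:

```java
import java.io.UnsupportedEncodingException;

public class EmptyEncodingName {

    // Decodes bytes using a charset looked up by name
    public static String decode(byte[] bytes, String encodingName)
            throws UnsupportedEncodingException {
        return new String(bytes, encodingName);
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 105}; // "Hi" in ASCII
        try {
            // An empty property value would arrive here as an empty charset name
            decode(bytes, "");
        } catch (UnsupportedEncodingException e) {
            System.out.println("Rejected: " + e.getClass().getSimpleName());
        }
    }
}
```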

Even though we've set project.build.sourceEncoding, the failing plugin, maven-pmd-plugin, relies on a different property. To understand why, we must understand the difference between Maven Build Configuration and Maven Report Configuration.

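Plugins can run as part of either the build process or the reporting process, and each process uses its own property key: project.build.sourceEncoding for reading source files, and project.reporting.outputEncoding for writing generated reports.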

This means that just setting project.build.sourceEncoding is not enough. We also need to add the following property for the Reporting process:

<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>

It's advisable to set both properties at the project level.

3.4. How Do We Set Maven Encoding on the Command Line?

We can also set these properties through command-line arguments, without adding any configuration to the POM files. This is useful if, for example, we don't have write access to the pom.xml files.

Let’s run the following to specify the encoding that the build should use:

mvn clean install -Dproject.build.sourceEncoding=UTF-8 -Dproject.reporting.outputEncoding=UTF-8

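Properties passed on the command line with -D take precedence over properties of the same name defined in the POM (though not over an encoding parameter set explicitly in a plugin's configuration).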

Therefore, this allows us to run the build successfully even if we remove any encoding properties set in the pom.xml files.
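The property precedence can be pictured as a simple map merge. This Java sketch is a simplified model, not Maven's implementation, and the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyPrecedence {

    // Simplified model: -D properties override same-named POM properties
    public static Map<String, String> effective(Map<String, String> pomProperties,
                                                Map<String, String> commandLineProperties) {
        Map<String, String> result = new HashMap<>(pomProperties);
        result.putAll(commandLineProperties); // command-line values win on conflicts
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> pom = Map.of("project.build.sourceEncoding", "US-ASCII");
        Map<String, String> cli = Map.of(
                "project.build.sourceEncoding", "UTF-8",
                "project.reporting.outputEncoding", "UTF-8");

        Map<String, String> props = effective(pom, cli);
        System.out.println(props.get("project.build.sourceEncoding"));     // UTF-8
        System.out.println(props.get("project.reporting.outputEncoding")); // UTF-8
    }
}
```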

4. Using Multiple Types of Encoding Within the Same Maven Project

It's a good idea to use a single encoding across a project.

However, we might be forced to deal with multiple types of encoding in the same build. For example, our resource files may have different encoding systems, which may be beyond our control.

Is there a way we can do this? Well, it depends on the situation.

We saw that we could set encoding parameters on a plugin-by-plugin basis. Hence, if our source code must be read as CP-1252 but we want test results written in UTF-8, we can configure each plugin with its own encoding.
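Moving text from one encoding to another always means decoding with the source encoding and re-encoding with the target one. This Java sketch (the class name is hypothetical) converts a CP-1252 byte to UTF-8; it assumes the JDK ships the windows-1252 charset, which standard JDK builds do:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {

    // Decodes bytes with the encoding they were written in, then re-encodes them
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // In Windows-1252, the single byte 0x80 is the euro sign
        byte[] legacy = {(byte) 0x80};
        byte[] utf8 = transcode(legacy, cp1252, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 3: the euro sign needs three bytes in UTF-8
    }
}
```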

We’re even able to use multiple types of encoding within the same plugin by using different executions.

In particular, the maven-resources-plugin, which we saw earlier, has extra functionality built into it.

We saw the encoding parameter earlier. The plugin also provides a propertiesEncoding parameter to allow property files to be encoded in a different way from other resources:

<configuration>
    <encoding>UTF-8</encoding>
    <propertiesEncoding>ISO-8859-1</propertiesEncoding>
</configuration>

When the build is run using mvn clean install, this gives:

[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ maven-properties ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'ISO-8859-1' encoding to copy filtered properties files.
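A likely reason property files get special treatment is that Java's Properties.load(InputStream) always reads its input as ISO-8859-1. We're not claiming the plugin uses this method; the sketch below (the class name is our own) only shows the JDK behavior. A value saved as ISO-8859-1 loads intact, while the same value saved as UTF-8 is garbled:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Loads properties via load(InputStream), which decodes the bytes as ISO-8859-1
    public static String loadFromBytes(byte[] bytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String line = "greeting=Gr\u00fc\u00dfe";
        // Saved as ISO-8859-1: the value survives
        System.out.println(loadFromBytes(line.getBytes(StandardCharsets.ISO_8859_1), "greeting")
                .equals("Gr\u00fc\u00dfe")); // true
        // Saved as UTF-8: each two-byte sequence is read as two separate characters
        System.out.println(loadFromBytes(line.getBytes(StandardCharsets.UTF_8), "greeting")
                .equals("Gr\u00fc\u00dfe")); // false
    }
}
```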

It’s always worth referring to the technical documentation on maven.apache.org when investigating how a plugin can use an encoding.

5. Conclusion

In this article, we saw that declaring encoding helps ensure that the code builds in the same way in any environment.

We saw that we could set an encoding parameter at the plugin level.

Then, we learned that there are two properties that we can set at a project level. They are project.build.sourceEncoding and project.reporting.outputEncoding.

We also saw that it is possible to pass encoding in via the command line. This allows us to set the encoding type without editing the Maven POM files.

Finally, we looked at how we could approach using multiple types of encoding within the same project.

As always, the example project is available over on GitHub.