1. Introduction

UTF-8 is the most common character encoding used in web applications. It can encode every character in the Unicode standard, which covers the scripts of virtually all languages in use today, including Chinese, Korean, and Japanese.
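UTF-8 is a variable-length encoding, which is why every layer that reads or writes text must agree on it. A minimal Java sketch (the class and method names are illustrative) shows that ASCII letters take one byte, accented Latin letters two, CJK characters three, and emoji four:

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteLengths {

    // Number of bytes the text occupies when encoded as UTF-8
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII:
        // U+0041 (A), U+00E9 (e with acute), U+4E2D (a Chinese character), U+1F600 (an emoji)
        String[] samples = {"A", "\u00e9", "\u4e2d", "\uD83D\uDE00"};
        for (String sample : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", sample.codePointAt(0), utf8Length(sample));
        }
    }
}
```

Running it prints 1, 2, 3 and 4 bytes for the four samples.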

In this article, we walk through the configuration needed to use UTF-8 consistently in Tomcat and in the MySQL database behind it.

2. Connector Configuration

A Connector listens for connections on a specific port. We need to make sure that every Connector uses UTF-8 to decode the request URI, including query string parameters.

Let’s add the attribute URIEncoding="UTF-8" to every Connector in TOMCAT_ROOT/conf/server.xml:

<Connector 
  URIEncoding="UTF-8" 
  port="8080" 
  redirectPort="8443" 
  connectionTimeout="20000" 
  protocol="HTTP/1.1"/>

<Connector 
  URIEncoding="UTF-8" 
  port="8009" 
  redirectPort="8443" 
  protocol="AJP/1.3"/>
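To see what URIEncoding protects against, here is a small Java sketch (names are illustrative). It decodes the same percent-encoded bytes once as UTF-8 and once as ISO-8859-1, the historical servlet default. It uses java.net.URLDecoder, which follows form-encoding rules, so it only approximates what a Connector does:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UriDecodingDemo {

    // Percent-decodes a query-string value with the given charset.
    // URLDecoder follows form-encoding rules ('+' becomes a space), so this
    // only approximates how a Connector decodes request URIs.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes E4 B8 AD, percent-encoded: the Chinese character U+4E2D
        String encoded = "%E4%B8%AD";

        String correct = decode(encoded, StandardCharsets.UTF_8);
        // Each byte is read as a separate Latin-1 character: U+00E4, U+00B8, U+00AD
        String garbled = decode(encoded, StandardCharsets.ISO_8859_1);

        System.out.printf("UTF-8:      %d char(s), first U+%04X%n", correct.length(), (int) correct.charAt(0));
        System.out.printf("ISO-8859-1: %d char(s), first U+%04X%n", garbled.length(), (int) garbled.charAt(0));
    }
}
```

Decoding with the wrong charset turns one character into three unrelated ones. This kind of corruption is what appears in request parameters when a Connector's URI encoding doesn't match the client's.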

3. Character Set Filter

After configuring the Connectors, it’s time to make the web application handle all requests and responses in UTF-8.

Let’s define a class named CharacterSetFilter:

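package com.baeldung;

import java.io.IOException;
// Tomcat 9 and earlier use javax.servlet; Tomcat 10+ uses jakarta.servlet
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterSetFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) {} // no setup needed

    @Override
    public void doFilter(
      ServletRequest request,
      ServletResponse response,
      FilterChain next) throws IOException, ServletException {
        // Must run before anything reads request parameters or the body
        request.setCharacterEncoding("UTF-8");
        // Default content type; servlets and JSPs may still set their own
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        next.doFilter(request, response);
    }

    @Override
    public void destroy() {} // nothing to release
}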
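Setting the response encoding matters because a charset such as ISO-8859-1 can't represent most non-Latin characters. The following Java sketch (names are illustrative) approximates what happens when text is written out in each charset. String.getBytes replaces unmappable characters with the charset's default replacement, which is '?' for ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ResponseEncodingDemo {

    // Approximates how a response writer using the given charset encodes text.
    // Characters the charset cannot represent become its replacement byte(s).
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, U+4E2D and U+6587

        // [63, 63]: both characters were replaced with '?'
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1)));
        // Six bytes: three per character in UTF-8
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_8)));
    }
}
```

Once the characters have been replaced with '?', the original text can't be recovered on the client side, whatever the browser's settings.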

We need to add the filter to our application’s web.xml so that it’s applied to all requests and responses:

<filter>
    <filter-name>CharacterSetFilter</filter-name>
    <filter-class>com.baeldung.CharacterSetFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>CharacterSetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

4. Server Page Encoding

Next, we need to configure the encoding of our JavaServer Pages (JSP).

To ensure UTF-8 in JSPs, we can add this page directive at the top of each JSP page:

<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

5. HTML Page Encoding

The JSP page directive tells the JSP container how to read the page source and which content type to send. The HTML meta tag, in contrast, tells the browser how to decode the page.

We should add this tag in the head section of all HTML pages:

<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />

6. MySQL Server Configuration

Now that Tomcat is configured, it’s time to configure the database.

We assume a MySQL server. Its configuration file is named my.ini on Windows and my.cnf on Linux.

In that file, we set the following parameters, adding any that are missing:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

We need to restart the MySQL server for the changes to take effect.
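The configuration uses utf8mb4 rather than utf8 because MySQL's utf8 is an alias for utf8mb3, which stores at most three bytes per character. Characters outside the Basic Multilingual Plane, such as most emoji, need four bytes. A minimal Java sketch (the helper name is hypothetical) that detects such text:

```java
class Utf8mb4Check {

    // True if the text contains a code point above U+FFFF (most emoji, some rare CJK).
    // Such characters need 4 bytes in UTF-8, which MySQL's legacy 3-byte
    // utf8 (utf8mb3) character set cannot store; utf8mb4 can.
    static boolean needsUtf8mb4(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(needsUtf8mb4("\u4e2d\u6587"));     // false: BMP characters only
        System.out.println(needsUtf8mb4("hi \uD83D\uDE00")); // true: U+1F600 is outside the BMP
    }
}
```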

7. MySQL Database Configuration

The server-level character set applies only to databases created afterwards. Existing databases must be migrated manually, which takes a few SQL statements.

For each database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 
    COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE table_name CONVERT TO 
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

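ALTER TABLE ... CONVERT TO already converts every character column in the table. To convert a single VARCHAR or TEXT column instead, we redefine it and keep its original type and length (VARCHAR(69) below is only an example):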

ALTER TABLE table_name CHANGE column_name column_name 
    VARCHAR(69) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
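With many tables, writing these statements by hand is error-prone. Here is a minimal Java sketch that generates the database- and table-level statements. The database and table names are hypothetical; in practice the table list could come from information_schema.TABLES. Identifiers are quoted with backticks, and any embedded backtick is doubled:

```java
import java.util.ArrayList;
import java.util.List;

class CharsetMigrationScript {

    // Quotes a MySQL identifier with backticks, doubling any embedded backtick
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Builds the ALTER DATABASE statement followed by one ALTER TABLE per table
    static List<String> migrationStatements(String database, List<String> tables) {
        List<String> statements = new ArrayList<>();
        statements.add("ALTER DATABASE " + quote(database)
            + " CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;");
        for (String table : tables) {
            statements.add("ALTER TABLE " + quote(database) + "." + quote(table)
                + " CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;");
        }
        return statements;
    }

    public static void main(String[] args) {
        // Hypothetical database and table names
        migrationStatements("shop", List.of("customers", "orders"))
            .forEach(System.out::println);
    }
}
```

The generated script can be reviewed before it's run, which is safer than altering tables directly from code.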

If our queries pass UTF-8 data, every database connection must also use UTF-8.

For a JDBC connection, we can set this in the connection URL. Note that MySQL Connector/J separates URL parameters with &, not ;:

jdbc:mysql://localhost:3306/?useUnicode=true&characterEncoding=UTF-8
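Building the URL programmatically avoids separator mistakes. A minimal Java sketch follows; the helper and the database name mydb are hypothetical. Each parameter is URL-encoded and parameters are joined with &:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class JdbcUrlBuilder {

    // Builds a MySQL Connector/J URL; query parameters are joined with '&'
    static String mysqlUrl(String host, int port, String database, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", "?", "");
        query.setEmptyValue(""); // no trailing '?' when there are no parameters
        for (Map.Entry<String, String> param : params.entrySet()) {
            query.add(URLEncoder.encode(param.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return "jdbc:mysql://" + host + ":" + port + "/" + database + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useUnicode", "true");
        params.put("characterEncoding", "UTF-8");
        // Prints jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8
        System.out.println(mysqlUrl("localhost", 3306, "mydb", params));
    }
}
```

The resulting string can then be passed to DriverManager.getConnection together with the user name and password.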

8. Conclusion

In this article, we configured Tomcat, our web application, and MySQL to use UTF-8 consistently.