1. Introduction

Despite being one of the best-known vulnerabilities, SQL Injection continues to rank on the top spot of the infamous OWASP Top 10’s list – now part of the more general Injection class.

In this tutorial, we’ll explore common coding mistakes in Java that lead to a vulnerable application and how to avoid them using the APIs available in the JVM’s standard runtime library. We’ll also cover what protections we can get out of ORMs like JPA, Hibernate and others and which blind spots we’ll still have to worry about.

2. How Applications Become Vulnerable to SQL Injection?

Injection attacks work because, for many applications, the only way to execute a given computation is to dynamically generate code that is in turn run by another system or component. If in the process of generating this code we use untrusted data without proper sanitization, we leave an open door for hackers to exploit.

This statement may sound a bit abstract, so let’s take look at how this happens in practice with a textbook example:

public List<AccountDTO>
  unsafeFindAccountsByCustomerId(String customerId)
  throws SQLException {
    // UNSAFE !!! DON'T DO THIS !!!
    String sql = "select "
      + "customer_id,acc_number,branch_id,balance "
      + "from Accounts where customer_id = '"
      + customerId 
      + "'";
    Connection c = dataSource.getConnection();
    ResultSet rs = c.createStatement().executeQuery(sql);
    // ...
}

The problem with this code is obvious: we’ve put *the customerId‘s value into the query with no validation at all*. Nothing bad will happen if we’re sure that this value will only come from trusted sources, but can we?

Let’s imagine that this function is used in a REST API implementation for an account resource. Exploiting this code is trivial: all we have to do is to send a value that, when concatenated with the fixed part of the query, change its intended behavior:

curl -X GET \
  'http://localhost:8080/accounts?customerId=abc%27%20or%20%271%27=%271' \

Assuming the customerId parameter value goes unchecked until it reaches our function, here’s what we’d receive:

abc' or '1' = '1

When we join this value with the fixed part, we get the final SQL statement that will be executed:

select customer_id, acc_number,branch_id, balance
  from Accounts where customerId = 'abc' or '1' = '1'

Probably not what we’ve wanted…

A smart developer (aren’t we all?) would now be thinking: “That’s silly! I’d never use string concatenation to build a query like this”.

Not so fast… This canonical example is silly indeed but there are situations where we might still need to do it:

  • Complex queries with dynamic search criteria: adding UNION clauses depending on user-supplied criteria
  • Dynamic grouping or ordering: REST APIs used as a backend to a GUI data table

2.1. I’m Using JPA. I’m Safe, Right?

This is a common misconception. JPA and other ORMs relieves us from creating hand-coded SQL statements, but they won’t prevent us from writing vulnerable code.

Let’s see how the JPA version of the previous example looks:

public List<AccountDTO> unsafeJpaFindAccountsByCustomerId(String customerId) {    
    String jql = "from Account where customerId = '" + customerId + "'";        
    TypedQuery<Account> q = em.createQuery(jql, Account.class);        
    return q.getResultList()
      .stream()
      .map(this::toAccountDTO)
      .collect(Collectors.toList());        
}

The same issue we’ve pointed before is also present here: we’re using unvalidated input to create a JPA query, so we’re exposed to the same kind of exploit here.

3. Prevention Techniques

Now that we know what a SQL injection is, let’s see how we can protect our code from this kind of attack. Here we’re focusing on a couple of very effective techniques available in Java and other JVM languages, but similar concepts are available to other environments, such as PHP, .Net, Ruby and so forth.

For those looking for a complete list of available techniques, including database-specific ones, the OWASP Project maintains a SQL Injection Prevention Cheat Sheet, which is a good place to learn more about the subject.

3.1. Parameterized Queries

This technique consists of using prepared statements with the question mark placeholder (“?”) in our queries whenever we need to insert a user-supplied value. This is very effective and, unless there’s a bug in the JDBC driver’s implementation, immune to exploits.

Let’s rewrite our example function to use this technique:

public List<AccountDTO> safeFindAccountsByCustomerId(String customerId)
  throws Exception {
    
    String sql = "select "
      + "customer_id, acc_number, branch_id, balance from Accounts"
      + "where customer_id = ?";
    
    Connection c = dataSource.getConnection();
    PreparedStatement p = c.prepareStatement(sql);
    p.setString(1, customerId);
    ResultSet rs = p.executeQuery(sql)); 
    // omitted - process rows and return an account list
}

Here we’ve used the prepareStatement() method available in the Connection instance to get a PreparedStatement. This interface extends the regular Statement interface with several methods that allow us to safely insert user-supplied values in a query before executing it.

For JPA, we have a similar feature:

String jql = "from Account where customerId = :customerId";
TypedQuery<Account> q = em.createQuery(jql, Account.class)
  .setParameter("customerId", customerId);
// Execute query and return mapped results (omitted)

When running this code under Spring Boot, we can set the property logging.level.sql to DEBUG and see what query is actually built in order to execute this operation:

// Note: Output formatted to fit screen
[DEBUG][SQL] select
  account0_.id as id1_0_,
  account0_.acc_number as acc_numb2_0_,
  account0_.balance as balance3_0_,
  account0_.branch_id as branch_i4_0_,
  account0_.customer_id as customer5_0_ 
from accounts account0_ 
where account0_.customer_id=?

As expected, the ORM layer creates a prepared statement using a placeholder for the customerId parameter. This is the same we’ve done in the plain JDBC case – but with a few statements less, which is nice.

As a bonus, this approach usually results in a better performing query, since most databases can cache the query plan associated with a prepared statement.

Please note that this approach only works for placeholders used as values. For instance, we can’t use placeholders to dynamically change the name of a table:

// This WILL NOT WORK !!!
PreparedStatement p = c.prepareStatement("select count(*) from ?");
p.setString(1, tableName);

Here, JPA won’t help either:

// This WILL NOT WORK EITHER !!!
String jql = "select count(*) from :tableName";
TypedQuery q = em.createQuery(jql,Long.class)
  .setParameter("tableName", tableName);
return q.getSingleResult();

In both cases, we’ll get a runtime error.

The main reason behind this is the very nature of a prepared statement: database servers use them to cache the query plan required to pull the result set, which usually is the same for any possible value. This is not true for table names and other constructs available in the SQL language such as columns used in an order by clause.

3.2. JPA Criteria API

Since explicit JQL query building is the main source of SQL Injections, we should favor the use of the JPA’s Query API, when possible.

For a quick primer on this API, please refer to the article on Hibernate Criteria queries. Also worth reading is our article about JPA Metamodel, which shows how to generate metamodel classes that will help us to get rid of string constants used for column names – and the runtime bugs that arise when they change.

Let’s rewrite our JPA query method to use the Criteria API:

CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Account> cq = cb.createQuery(Account.class);
Root<Account> root = cq.from(Account.class);
cq.select(root).where(cb.equal(root.get(Account_.customerId), customerId));

TypedQuery<Account> q = em.createQuery(cq);
// Execute query and return mapped results (omitted)

Here, we’ve used more code lines to get the same result, but the upside is that now we don’t have to worry about JQL syntax.

Another important point: despite its verbosity, the Criteria API makes creating complex query services more straightforward and safer. For a complete example that shows how to do it in practice, please take a look at the approach used by JHipster-generated applications.

3.3. User Data Sanitization

Data Sanitization is a technique of applying a filter to user supplied-data so it can be safely used by other parts of our application. A filter’s implementation may vary a lot, but we can generally classify them in two types: whitelists and blacklists.

Blacklists, which consist of filters that try to identify an invalid pattern, are usually of little value in the context of SQL Injection prevention – but not for the detection! More on this later.

Whitelists, on the other hand, work particularly well when we can define exactly what is a valid input.

Let’s enhance our safeFindAccountsByCustomerId method so now the caller can also specify the column used to sort the result set. Since we know the set of possible columns, we can implement a whitelist using a simple set and use it to sanitize the received parameter:

private static final Set<String> VALID_COLUMNS_FOR_ORDER_BY
  = Collections.unmodifiableSet(Stream
      .of("acc_number","branch_id","balance")
      .collect(Collectors.toCollection(HashSet::new)));

public List<AccountDTO> safeFindAccountsByCustomerId(
  String customerId,
  String orderBy) throws Exception { 
    String sql = "select "
      + "customer_id,acc_number,branch_id,balance from Accounts"
      + "where customer_id = ? ";
    if (VALID_COLUMNS_FOR_ORDER_BY.contains(orderBy)) {
        sql = sql + " order by " + orderBy;
    } else {
        throw new IllegalArgumentException("Nice try!");
    }
    Connection c = dataSource.getConnection();
    PreparedStatement p = c.prepareStatement(sql);
    p.setString(1,customerId);
    // ... result set processing omitted
}

Here, we’re combining the prepared statement approach and a whitelist used to sanitize the orderBy argument. The final result is a safe string with the final SQL statement. In this simple example, we’re using a static set, but we could also have used database metadata functions to create it.

We can use the same approach for JPA, also taking advantage of the Criteria API and Metadata to avoid using String constants in our code:

// Map of valid JPA columns for sorting
final Map<String,SingularAttribute<Account,?>> VALID_JPA_COLUMNS_FOR_ORDER_BY = Stream.of(
  new AbstractMap.SimpleEntry<>(Account_.ACC_NUMBER, Account_.accNumber),
  new AbstractMap.SimpleEntry<>(Account_.BRANCH_ID, Account_.branchId),
  new AbstractMap.SimpleEntry<>(Account_.BALANCE, Account_.balance))
  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

SingularAttribute<Account,?> orderByAttribute = VALID_JPA_COLUMNS_FOR_ORDER_BY.get(orderBy);
if (orderByAttribute == null) {
    throw new IllegalArgumentException("Nice try!");
}

CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Account> cq = cb.createQuery(Account.class);
Root<Account> root = cq.from(Account.class);
cq.select(root)
  .where(cb.equal(root.get(Account_.customerId), customerId))
  .orderBy(cb.asc(root.get(orderByAttribute)));

TypedQuery<Account> q = em.createQuery(cq);
// Execute query and return mapped results (omitted)

This code has the same basic structure as in the plain JDBC. First, we use a whitelist to sanitize the column name, then we proceed to create a CriteriaQuery to fetch the records from the database.

3.4. Are We Safe Now?

Let’s assume that we’ve used parameterized queries and/or whitelists everywhere. Can we now go to our manager and guarantee we’re safe?

Well… not so fast. Without even considering Turing’s halting problem, there are other aspects we must consider:

  1. Stored Procedures: These are also prone to SQL Injection issues; whenever possible please apply sanitation even to values that will be sent to the database via prepared statements
  2. Triggers: Same issue as with procedure calls, but even more insidious because sometimes we have no idea they’re there…
  3. Insecure Direct Object References: Even if our application is SQL-Injection free, there’s still a risk that associated with this vulnerability category – the main point here is related to different ways an attacker can trick the application, so it returns records he or she was not supposed to have access to – there’s a good cheat sheet on this topic available at OWASP’s GitHub repository

In short, our best option here is caution. Many organizations nowadays use a “red team” exactly for this. Let them do their job, which is exactly to find any remaining vulnerabilities.

4. Damage Control Techniques

As a good security practice, we should always implement multiple defense layers – a concept known as defense in depth. The main idea is that even if we’re unable to find all possible vulnerabilities in our code – a common scenario when dealing with legacy systems – we should at least try to limit the damage an attack would inflict.

Of course, this would be a topic for a whole article or even a book but let’s name a few measures:

  1. Apply the principle of least privilege: Restrict as much as possible the privileges of the account used to access the database
  2. Use database-specific methods available in order to add an additional protection layer; for example, the H2 Database has a session-level option that disables all literal values on SQL Queries
  3. Use short-lived credentials: Make the application rotate database credentials often; a good way to implement this is by using Spring Cloud Vault
  4. Log everything: If the application stores customer data, this is a must; there are many solutions available that integrate directly to the database or work as a proxy, so in case of an attack we can at least assess the damage
  5. Use WAFs or similar intrusion detection solutions: those are the typical blacklist examples – usually, they come with a sizeable database of known attack signatures and will trigger a programmable action upon detection. Some also include in-JVM agents that can detect intrusions by applying some instrumentation – the main advantage of this approach is that an eventual vulnerability becomes much easier to fix since we’ll have a full stack trace available.

5. Conclusion

In this article, we’ve covered SQL Injection vulnerabilities in Java applications –  a very serious threat to any organization that depends on data for their business – and how to prevent them using simple techniques.

As usual, full code for this article is available on Github.