Skip Select Before Insert in Spring Data JPA | Baeldung

1. Overview

In some cases, when we’re saving an entity using a Spring Data JPA Repository, we may encounter an additional SELECT in the logs. This may cause performance issues due to numerous extra calls.

In this tutorial, we’ll explore a few methods to skip SELECT in logs and improve performance.

2. Setup

Before diving into Spring Data JPA and testing it, there are a few preparatory steps we need to take.

2.1. Dependencies

To create our test repositories we’ll use Spring Data JPA dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>

As a test database, we’ll use the H2 Database. Let’s add its dependency:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
</dependency>

In our integrational tests, we’ll use a test Spring Context. Let’s add the spring-boot-starter-test dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

2.2. Configuration

Here is the JPA configuration we’ll use in our example:

spring.jpa.hibernate.dialect=org.hibernate.dialect.H2Dialect
spring.jpa.hibernate.ddl-auto=create-drop
spring.jpa.hibernate.show_sql=true
spring.jpa.hibernate.hbm2ddl.auto=create-drop

According to this configuration, we’ll let Hibernate generate the schema and log all the SQL queries into the log.

3. The Reason for the SELECT Query

First things first, let’s create an entity:

@Entity
public class Task {

    @Id
    private Integer id;
    private String description;

    //getters and setters
}

Now, let’s create a repository for this entity:

@Repository
public interface TaskRepository extends JpaRepository<Task, Integer> {
}

Now, let’s save a new Task specifying the ID:

@Autowired
private TaskRepository taskRepository;

@Test
void givenRepository_whenSaveNewTaskWithPopulatedId_thenExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(1);
    taskRepository.saveAndFlush(task);
}

When we call the saveAndFlush() – the behavior for the save() method will be the same – method of our repository, internally we use this code:

public<S extends T> S save(S entity){
    if(isNew(entity)){
        entityManager.persist(entity);
        return entity;
    } else {
        return entityManager.merge(entity);
    }
}

So, if our entity is considered as not new, we’ll call the merge() method of the entity manager. Inside merge() JPA checks if our entity is present in a cache and persistence context. Since our object is new it’ll not be found there. Finally, it tries to load the entity from the data source.

This is the point where we come across the SELECT query in the logs. Since we don’t have such an item in the database, we invoke the INSERT query after that:

Hibernate: select task0_.id as id1_1_0_, task0_.description as descript2_1_0_ from task task0_ where task0_.id=?
Hibernate: insert into task (id, description) values (default, ?)

If we specify the ID on the application side, our entity will be considered new. An extra SELECT query will be sent to the database in that case.

4. Use @GeneratedValue

One of the possible solutions is to not specify the ID on the application side. We can use @GeneratedValue annotation and specify a strategy that’ll be used to generate ID on the database side.

Let’s specify the generation strategy for our TaskWithGeneratedId ID*:*

@Entity
public class TaskWithGeneratedId {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;
}

Then, we save an instance of the TaskWithGeneratedId entity, but now we don’t set the ID:

@Autowired
private TaskWithGeneratedIdRepository taskWithGeneratedIdRepository;

@Test
void givenRepository_whenSaveNewTaskWithGeneratedId_thenNoExtraSelectIsExpected() {
    TaskWithGeneratedId task = new TaskWithGeneratedId();
    TaskWithGeneratedId saved = taskWithGeneratedIdRepository.saveAndFlush(task);
    assertNotNull(saved.getId());
}

As we can see in the logs, there are no SELECT queries in the logs and a new ID was generated for the entity.

5. Implement Persistable

Another option we have is to implement the Persistable interface in our entity:

@Entity
public class PersistableTask implements Persistable<Integer> {
    @Id
    private int id;

    @Transient
    private boolean isNew = true;

    @Override
    public Integer getId() {
        return id;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }
    
    //getters and setters
}

Here we’ve added a new field isNew and annotated it as @Transient to not create a column in the base. Using the overridden isNew() method we can consider our entity as new even though we have an ID specified.

Now, under the hood, JPA uses another logic to consider if an entity is new or not:

public class JpaPersistableEntityInformation {
    public boolean isNew(T entity) {
        return entity.isNew();
    }
}

Let’s save our PersistableTask using the PersistableTaskRepository:

As we can see, we’ll have only the INSERT log message and the entity contains the ID we specified.

If we try to save a few new entities with the same ID, we encounter an exception:

@Test
void givenRepository_whenSaveNewPersistableTasksWithSameId_thenExceptionIsExpected() {
    PersistableTask persistableTask = new PersistableTask();
    persistableTask.setId(3);
    persistableTask.setNew(true);
    persistableTaskRepository.saveAndFlush(persistableTask);

    PersistableTask duplicateTask = new PersistableTask();
    duplicateTask.setId(3);
    duplicateTask.setNew(true);

    assertThrows(DataIntegrityViolationException.class,
      () -> persistableTaskRepository.saveAndFlush(duplicateTask));
}

So, if we take the responsibility to generate the IDs, we also should take care of their uniqueness.

6. Use persist() Method Directly

As we saw in previous examples, all the actions we did led us to call the persist() method. We also can create an extension for our repository that allows us to call this method directly.

Let’s create an interface with the persist() method:

public interface TaskRepositoryExtension {
    Task persistAndFlush(Task task);
}

Then, let’s make an implementation bean of this interface:

@Component
public class TaskRepositoryExtensionImpl implements TaskRepositoryExtension {
    @PersistenceContext
    private EntityManager entityManager;

    @Override
    public Task persistAndFlush(Task task) {
        entityManager.persist(task);
        entityManager.flush();
        return task;
    }
}

Now, we extend our TaskRepository using a new interface:

@Repository
public interface TaskRepository extends JpaRepository<Task, Integer>, TaskRepositoryExtension {
}

Let’s call our custom persistAndFlush() method to save the Task instance:

@Test
void givenRepository_whenPersistNewTaskUsingCustomPersistMethod_thenNoExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(4);
    Task saved = taskRepository.persistAndFlush(task);

    assertEquals(4, saved.getId());
}

We can see the log message with an INSERT call and no extra SELECT calls.

7. Use BaseJpaRepository From Hypersistence Utils

The idea from the previous section was already implemented in the Hypersistence Utils project. This project provides us a BaseJpaRepository where we have the persistAndFlush() method implementation as well as its batch analog.

To use it, we have to specify additional dependencies. We should choose a correct Maven artifact based on our Hibernate version:

<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-utils-hibernate-55</artifactId>
</dependency>

Let’s implement another repository, that extends both BaseJpaRepository from Hypersistence Utils and JpaRepository from Spring Data JPA:

@Repository
public interface TaskJpaRepository extends JpaRepository<Task, Integer>, BaseJpaRepository<Task, Integer> {
}

Also, we have to enable the implementation of BaseJpaRepository using @EnableJpaRepositories annotation:

@EnableJpaRepositories(
    repositoryBaseClass = BaseJpaRepositoryImpl.class
)

Now, let’s save our Task using our new repository:

@Autowired
private TaskJpaRepository taskJpaRepository;

@Test
void givenRepository_whenPersistNewTaskUsingPersist_thenNoExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(5);
    Task saved = taskJpaRepository.persistAndFlush(task);

    assertEquals(5, saved.getId());
}

We have our Task saved and there are no SELECT queries in the log.

Like in all the examples where we specified ID on the application side, there can be unique constraints violations:

@Test
void givenRepository_whenPersistTaskWithTheSameId_thenExceptionIsExpected() {
    Task task = new Task();
    task.setId(5);
    taskJpaRepository.persistAndFlush(task);

    Task secondTask = new Task();
    secondTask.setId(5);

    assertThrows(DataIntegrityViolationException.class,
      () ->  taskJpaRepository.persistAndFlush(secondTask));
}

8. Use @Query Annotated Method

We also can avoid extra calls using modifying native queries directly. Let’s specify a such method in our TaskRepository:

@Repository
public interface TaskRepository extends JpaRepository<Task, Integer> {

    @Modifying
    @Query(value = "insert into task(id, description) values(:#{#task.id}, :#{#task.description})", 
      nativeQuery = true)
    void insert(@Param("task") Task task);
}

This method calls the INSERT query directly avoiding the work with persistence context. The ID will be taken from the Task object sent in the method parameters.

Now let’s save our Task using this method:

@Test
void givenRepository_whenPersistNewTaskUsingNativeQuery_thenNoExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(6);
    taskRepository.insert(task);

    assertTrue(taskRepository.findById(6).isPresent());
}

The entity was successfully saved using the ID without extra SELECT queries before INSERT. We should consider, that by using this method we avoid a JPA context and Hibernate cache.

9. Conclusion

When implementing ID generation on the application side using Spring Data JPA, we may encounter occurrences of additional SELECT queries in the logs, leading to performance degradation. In this article, we’ve discussed various strategies to address this issue.

In some cases, it makes sense to move this logic to the database side or fine-tune the persistence logic according to our needs. We should take into account the pros, cons, and potential issues of each strategy before making a decision.

As usual, the full source code can be found over on GitHub.

Persistence

REST

Security