1. Overview
In this article, we’ll discuss the basics of Hibernate Search, how to configure it, and how to implement some simple queries.
2. Basics of Hibernate Search
Whenever we have to implement full-text search functionality, using tools we’re already well-versed in is always a plus.
If we’re already using Hibernate and JPA for ORM, we’re only one step away from Hibernate Search.
Hibernate Search integrates Apache Lucene, a high-performance and extensible full-text search-engine library written in Java. This combines the power of Lucene with the simplicity of Hibernate and JPA.
Simply put, we just have to add some additional annotations to our domain classes, and the tool will take care of things like database/index synchronization.
Hibernate Search also provides an Elasticsearch integration; however, as it’s still in an experimental stage, we’ll focus on Lucene here.
3. Configuration
3.1. Maven Dependencies
Before getting started, we first need to add the necessary dependencies to our pom.xml:
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.8.2.Final</version>
</dependency>
For the sake of simplicity, we’ll use H2 as our database:
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.4.196</version>
</dependency>
3.2. Configuration Properties
We also have to specify where Lucene should store the index.
This can be done via the property hibernate.search.default.directory_provider.
We’ll choose filesystem, which is the most straightforward option for our use case. More options are listed in the official documentation. Filesystem-master/filesystem-slave and infinispan are noteworthy for clustered applications, where the index has to be synchronized between nodes.
We also have to define a default base directory where indexes will be stored:
hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /data/index/default
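These two properties normally go into persistence.xml or hibernate.properties. The following is a minimal sketch, assuming we bootstrap JPA ourselves. It collects the same properties in a Map, which can then be passed to the standard call Persistence.createEntityManagerFactory(String, Map). The class SearchConfigSketch and the persistence unit name in the comment are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

final class SearchConfigSketch {

    // Collects the Hibernate Search properties from above. Hypothetical usage:
    // Persistence.createEntityManagerFactory("product-unit", searchProperties("/data/index/default"))
    static Map<String, String> searchProperties(String indexBase) {
        Map<String, String> props = new HashMap<>();
        // Store the Lucene index on the local filesystem
        props.put("hibernate.search.default.directory_provider", "filesystem");
        // Base directory under which the index files are created
        props.put("hibernate.search.default.indexBase", indexBase);
        return props;
    }
}
```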
4. The Model Classes
After the configuration, we’re now ready to specify our model.
In addition to the JPA annotations @Entity and @Table, we have to add an @Indexed annotation. It tells Hibernate Search that the Product entity should be indexed.
After that, we have to mark the attributes we want to search as searchable by adding a @Field annotation:
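import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TermVector;

@Entity
@Indexed
@Table(name = "product")
public class Product {

    @Id
    private int id;

    @Field(termVector = TermVector.YES)
    private String productName;

    @Field(termVector = TermVector.YES)
    private String description;

    @Field
    private int memory;

    // getters, setters, and constructors
}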
The termVector = TermVector.YES attribute is required for the “More Like This” query later on.
5. Building the Lucene Index
Before running the actual queries, we have to trigger an initial build of the Lucene index:
FullTextEntityManager fullTextEntityManager
    = Search.getFullTextEntityManager(entityManager);
// startAndWait() blocks until indexing has finished and may throw InterruptedException
fullTextEntityManager.createIndexer().startAndWait();
After this initial build, Hibernate Search takes care of keeping the index up to date. That is, we can create, update, and delete entities via the EntityManager as usual.
Note: entities have to be fully committed to the database before Lucene can discover and index them. This is also the reason why the initial test data import in our example code lives in a dedicated JUnit test case annotated with @Commit.
6. Building and Executing Queries
Now we’re ready to create our first query.
In the following section, we’ll show the general workflow for preparing and executing a query.
After that, we’ll create some example queries for the most important query types.
6.1. General Workflow for Creating and Executing a Query
In general, preparing and executing a query consists of four steps.
In step 1, we obtain a FullTextEntityManager from our JPA EntityManager and, from that, a QueryBuilder:
FullTextEntityManager fullTextEntityManager
= Search.getFullTextEntityManager(entityManager);
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder()
.forEntity(Product.class)
.get();
In step 2, we create a Lucene query via the Hibernate Search query DSL:
org.apache.lucene.search.Query query = queryBuilder
.keyword()
.onField("productName")
.matching("iphone")
.createQuery();
In step 3, we wrap the Lucene query in a Hibernate Search FullTextQuery:
org.hibernate.search.jpa.FullTextQuery jpaQuery
= fullTextEntityManager.createFullTextQuery(query, Product.class);
Finally, in step 4, we execute the query:
List<Product> results = jpaQuery.getResultList();
Note: by default, Lucene sorts the results by relevance.
Steps 1, 3, and 4 are the same for all query types.
In the following sections, we’ll focus on step 2, i.e., how to create different types of queries.
6.2. Keyword Queries
The most basic use case is searching for a specific word.
This is what we already did in the previous section:
Query keywordQuery = queryBuilder
.keyword()
.onField("productName")
.matching("iphone")
.createQuery();
Here, keyword() specifies that we’re looking for one specific word, onField() tells Lucene where to look, and matching() tells it what to look for.
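A toy model of the structure a full-text index is built around can make the keyword query more concrete. That structure is an inverted index, which maps each token of a field to the entities containing it. Everything below is a hypothetical, in-memory illustration: the ToyInvertedIndex class, its naive lowercase-and-split tokenizer, and the sample product names. Lucene’s real index and analyzers are far more elaborate. In this model, a keyword query boils down to normalizing the search word and looking up its token:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyInvertedIndex {

    // token -> ids of the entities whose field value contains that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Indexing": record the entity id under every token of the field value
    void add(int id, String fieldValue) {
        for (String token : analyze(fieldValue)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    // Keyword query for one word: normalize it like the indexed text,
    // then return the ids stored under that token
    Set<Integer> keyword(String word) {
        String token = word.toLowerCase(Locale.ROOT);
        Set<Integer> ids = postings.getOrDefault(token, Collections.<Integer>emptySet());
        return new TreeSet<>(ids);
    }
}
```

For example, after indexing “iPhone X” under id 1 and “iPhone 8 Plus” under id 3, keyword("iphone") returns both ids, because both the indexed text and the search word are lowercased.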
6.3. Fuzzy Queries
Fuzzy queries work like keyword queries, except that we can define a limit of “fuzziness” up to which Lucene still accepts two terms as matching.
With withEditDistanceUpTo(), we define how far a term may deviate from the other, measured as an edit distance. It can be set to 0, 1, or 2, and the default value is 2 (note: this limit comes from Lucene’s implementation).
With withPrefixLength(), we define the length of a prefix that must match exactly and is therefore excluded from the fuzziness:
Query fuzzyQuery = queryBuilder
.keyword()
.fuzzy()
.withEditDistanceUpTo(2)
.withPrefixLength(0)
.onField("productName")
.matching("iPhaen")
.createQuery();
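The fuzziness check can be modeled in plain Java. The class and method names below are hypothetical. We assume that Lucene’s FuzzyQuery counts a swap of two adjacent characters as a single edit by default. This is the optimal string alignment variant of the Damerau-Levenshtein distance. We also lowercase both terms as a stand-in for the analyzer:

```java
import java.util.Locale;

final class FuzzyMatchSketch {

    // Optimal string alignment distance: insertions, deletions, substitutions,
    // and swaps of two adjacent characters each cost one edit
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,          // deletion
                        d[i][j - 1] + 1),         // insertion
                        d[i - 1][j - 1] + cost);  // substitution or match
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // The first prefixLength characters must be identical, and the whole
    // terms may differ by at most maxEdits edits
    static boolean matches(String query, String indexedTerm, int maxEdits, int prefixLength) {
        String q = query.toLowerCase(Locale.ROOT);
        String t = indexedTerm.toLowerCase(Locale.ROOT);
        String prefix = q.substring(0, Math.min(prefixLength, q.length()));
        if (!t.startsWith(prefix)) {
            return false;
        }
        return osaDistance(q, t) <= maxEdits;
    }
}
```

With a plain Levenshtein distance, turning “iphaen” into “iphone” takes three edits, which would exceed the limit of 2. If the “en”/“ne” swap counts as one edit, the distance drops to 2. Under that assumption, the example query above can still find “iPhone”.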
6.4. Wildcard Queries
Hibernate Search also enables us to execute wildcard queries, i.e., queries in which part of a word is unknown.
For this, we can use “?” for a single character and “*” for any character sequence:
Query wildcardQuery = queryBuilder
.keyword()
.wildcard()
.onField("productName")
.matching("Z*")
.createQuery();
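A small matcher for a single term can illustrate the semantics of the two wildcard characters. The following is a hypothetical plain-Java sketch based on dynamic programming. It is not how Lucene evaluates wildcard queries internally, and it compares characters exactly, so case matters:

```java
final class WildcardSketch {

    // '?' matches exactly one character, '*' matches any sequence (even an empty one)
    static boolean matches(String pattern, String term) {
        int p = pattern.length();
        int t = term.length();
        // m[i][j]: do the first i pattern characters match the first j term characters?
        boolean[][] m = new boolean[p + 1][t + 1];
        m[0][0] = true;
        for (int i = 1; i <= p; i++) {
            char c = pattern.charAt(i - 1);
            for (int j = 0; j <= t; j++) {
                if (c == '*') {
                    // '*' either matches nothing or absorbs one more term character
                    m[i][j] = m[i - 1][j] || (j > 0 && m[i][j - 1]);
                } else if (j > 0) {
                    m[i][j] = m[i - 1][j - 1] && (c == '?' || c == term.charAt(j - 1));
                }
            }
        }
        return m[p][t];
    }
}
```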
6.5. Phrase Queries
If we want to search for more than one word, we can use phrase queries. Using phrase() and, if necessary, withSlop(), we can look for either exact or approximate sentences. The slop factor roughly defines how many other words are permitted between the words of the sentence:
Query phraseQuery = queryBuilder
.phrase()
.withSlop(1)
.onField("description")
.sentence("with wireless charging")
.createQuery();
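A simplified model that only handles words in their original order can illustrate what the slop allows. The phrase matches if its words occur in sequence with at most slop extra words between them in total. The class name and its whitespace tokenizer are hypothetical. Lucene’s sloppy phrase matching also tolerates reordered words at an additional cost, which this sketch ignores:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class PhraseSlopSketch {

    // Naive tokenizer: lowercase and split on whitespace
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // In-order only: the phrase matches if its words appear in sequence with
    // at most `slop` extra words between them in total
    static boolean matches(String text, String phrase, int slop) {
        List<String> doc = tokens(text);
        List<String> words = tokens(phrase);
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(words.get(0))) {
                continue;
            }
            int pos = start;
            boolean found = true;
            // Take the earliest following occurrence of each next word, which
            // keeps the matched span as short as possible for this start
            for (int w = 1; w < words.size() && found; w++) {
                int offset = doc.subList(pos + 1, doc.size()).indexOf(words.get(w));
                if (offset < 0) {
                    found = false;
                } else {
                    pos += 1 + offset;
                }
            }
            // Extra words = words inside the span that are not part of the phrase
            int extraWords = (pos - start) - (words.size() - 1);
            if (found && extraWords <= slop) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, “with fast wireless charging” matches the phrase “with wireless charging” with a slop of 1, but not with a slop of 0.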
6.6. Simple Query String Queries
With the previous query types, we had to specify the query type explicitly.
If we want to give users more power, we can use simple query string queries, which let them define their own queries at runtime.
The following query types are supported:
- boolean (AND using “+”, OR using “|”, NOT using “-”)
- prefix (prefix*)
- phrase (“some phrase”)
- precedence (using parentheses)
- fuzzy (fuzy~2)
- near operator for phrase queries (“some phrase”~3)
The following example combines fuzzy, phrase, and boolean queries:
Query simpleQueryStringQuery = queryBuilder
.simpleQueryString()
.onFields("productName", "description")
.matching("Aple~2 + \"iPhone X\" + (256 | 128)")
.createQuery();
6.7. Range Queries
Range queries search for a value between given boundaries. They can be applied to numbers, dates, timestamps, and strings:
Query rangeQuery = queryBuilder
.range()
.onField("memory")
.from(64).to(256)
.createQuery();
6.8. More Like This Queries
Our last query type is the “More Like This” query. For this, we provide an entity, and Hibernate Search returns a list of similar entities, each with a similarity score.
As mentioned before, the termVector = TermVector.YES attribute in our model class is required in this case: it tells Lucene to store the frequency of each term during indexing.
Based on this, the similarity is calculated at query execution time:
Query moreLikeThisQuery = queryBuilder
.moreLikeThis()
.comparingField("productName").boostedTo(10f)
.andField("description").boostedTo(1f)
.toEntity(entity)
.createQuery();
List<Object[]> results = (List<Object[]>) fullTextEntityManager
.createFullTextQuery(moreLikeThisQuery, Product.class)
.setProjection(ProjectionConstants.THIS, ProjectionConstants.SCORE)
.getResultList();
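The sketch below gives a rough intuition for how stored term frequencies can drive a similarity score. It builds term-frequency vectors for both fields, compares them with cosine similarity, and weights the fields with the boosts from the query above (10 for productName, 1 for description). Cosine similarity is our simplified stand-in here. Lucene’s MoreLikeThis works differently: it selects the most characteristic terms of the given entity and runs them as a query. All names and sample texts are hypothetical:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class MoreLikeThisSketch {

    // Term-frequency vector of a text: token -> number of occurrences
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity of two term-frequency vectors (0 if either is empty)
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Weighted score using the boosts from the query: productName x10, description x1
    static double similarity(String nameA, String descA, String nameB, String descB) {
        return 10.0 * cosine(termFrequencies(nameA), termFrequencies(nameB))
                + 1.0 * cosine(termFrequencies(descA), termFrequencies(descB));
    }
}
```

In this model, “iPhone X” and “iPhone 8” share one of the two tokens in each name, which gives a name similarity of 0.5. The boost of 10 then amplifies that value relative to the description.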
6.9. Searching More Than One Field
So far, we’ve only created queries that search one attribute, using onField().
Depending on the use case, we can also search two or more attributes:
Query luceneQuery = queryBuilder
.keyword()
.onFields("productName", "description")
.matching(text)
.createQuery();
Moreover, we can specify each attribute to be searched separately, e.g., if we want to define a boost for one attribute:
Query boostedQuery = queryBuilder
    .keyword()
    .onField("productName").boostedTo(10f)
    .andField("description")
    .matching(text)
    .createQuery();
6.10. Combining Queries
Finally, Hibernate Search also supports combining queries using various strategies:
- SHOULD: the query should contain the matching elements of the subquery
- MUST: the query must contain the matching elements of the subquery
- MUST NOT: the query must not contain the matching elements of the subquery
These clauses are similar to the boolean operators AND, OR, and NOT. However, the names are different to emphasize that they also have an impact on relevance.
For example, a SHOULD between two queries is similar to a boolean OR: if either of the two queries matches, the match is returned.
However, if both queries match, the match gets a higher relevance than if only one of them matches:
Query combinedQuery = queryBuilder
    .bool()
    .must(queryBuilder.keyword()
        .onField("productName").matching("apple")
        .createQuery())
    .must(queryBuilder.range()
        .onField("memory").from(64).to(256)
        .createQuery())
    .should(queryBuilder.phrase()
        .onField("description").sentence("face id")
        .createQuery())
    // the not() below negates this clause, turning it into MUST NOT
    .must(queryBuilder.keyword()
        .onField("productName").matching("samsung")
        .createQuery())
    .not()
    .createQuery();
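The clause semantics can be sketched in plain Java with predicates over in-memory documents. The BoolQuerySketch class and its one-point-per-clause score are hypothetical simplifications, since Lucene computes real scores with its similarity model. The matching rules follow the description above:

- all MUST clauses have to match
- no MUST NOT clause may match
- while at least one MUST clause is present, SHOULD clauses only add to the score
- without any MUST clause, at least one SHOULD clause has to match

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BoolQuerySketch<T> {

    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();

    BoolQuerySketch<T> must(Predicate<T> p) { must.add(p); return this; }
    BoolQuerySketch<T> should(Predicate<T> p) { should.add(p); return this; }
    BoolQuerySketch<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }

    // Returns -1 if the document does not match, otherwise a toy score:
    // one point per matching MUST or SHOULD clause
    int score(T doc) {
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) return -1;
        }
        int score = 0;
        for (Predicate<T> p : must) {
            if (!p.test(doc)) return -1;
            score++;
        }
        int shouldHits = 0;
        for (Predicate<T> p : should) {
            if (p.test(doc)) shouldHits++;
        }
        // Without MUST clauses, at least one SHOULD clause has to match
        if (must.isEmpty() && !should.isEmpty() && shouldHits == 0) return -1;
        return score + shouldHits;
    }
}
```

In this model, the “face id” clause from the example never excludes a product. It only ranks products that mention Face ID higher.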
7. Conclusion
In this article, we discussed the basics of Hibernate Search and showed how to implement the most important query types. More advanced topics can be found in the official documentation.
As always, the full source code of the examples is available over on GitHub.