Adding an Aggregation to an Elasticsearch Query

1. Overview

Elasticsearch is a search and analytics engine well suited to scenarios that require flexible filtering. Sometimes, we need to retrieve both the matching documents and aggregated information about them. In this tutorial, we’ll explore how to do this.

2. Elasticsearch Search With Aggregation

Let’s begin by exploring Elasticsearch’s aggregation functionality.

Once we have an Elasticsearch instance running on localhost, let’s create an index named store-items with a few documents in it:

POST http://localhost:9200/store-items/_doc
{
    "type": "Multimedia",
    "name": "PC Monitor",
    "price": 1000
}
...
POST http://localhost:9200/store-items/_doc
{
    "type": "Pets",
    "name": "Dog Toy",
    "price": 10
}

Now, let’s query it without applying any filters:

GET http://localhost:9200/store-items/_search

Let’s take a look at the response:

{
...
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "store-items",
                "_type": "_doc",
                "_id": "J49VVI8B6ADL84Kpbm8A",
                "_score": 1.0,
                "_source": {
                    "_class": "com.baeldung.model.StoreItem",
                    "type": "Multimedia",
                    "name": "PC Monitor",
                    "price": 1000
                }
            },
            {
                "_index": "store-items",
                "_type": "_doc",
                "_id": "KI9VVI8B6ADL84Kpbm8A",
                "_score": 1.0,
                "_source": {
                    "type": "Pets",
                    "name": "Dog Toy",
                    "price": 10
                }
            },
 ...
        ]
    }
}

The response contains our store item documents, each of which has a type.

Next, let’s say we want to know how many items we have for each type. Let’s add the aggregation section to the request body and search the index again:

GET http://localhost:9200/store-items/_search
{
    "aggs": {
        "type_aggregation": {
            "terms": {
                "field": "type"
            }
        }
    }
}

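We’ve added an aggregation named type_aggregation, which uses the terms aggregation to group documents by the distinct values of the type field. This assumes type is mapped as a keyword field, since terms aggregations aren’t enabled on analyzed text fields by default.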
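For readers who want to send the same search from plain Java before moving on to Spring Data, here is a minimal sketch using the JDK's built-in HTTP client. It assumes an unsecured Elasticsearch instance on localhost:9200, as in the rest of this section. The class and method names are hypothetical, and the request uses POST, which the search API accepts in addition to GET.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class, not part of the article's project.
class AggregationRequestSketch {

    // Same body as the search request above: one terms aggregation on "type".
    static final String BODY = """
        {
            "aggs": {
                "type_aggregation": {
                    "terms": {
                        "field": "type"
                    }
                }
            }
        }
        """;

    // Builds the search request against the store-items index without sending it.
    static HttpRequest buildSearchRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/store-items/_search"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(BODY))
          .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Elasticsearch is reachable at this address without authentication.
        HttpRequest request = buildSearchRequest("http://localhost:9200");
        HttpResponse<String> response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON, including the "aggregations" section.
        System.out.println(response.body());
    }
}
```

Keeping the request construction separate from the network call makes the sketch easy to check without a running cluster.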

As we can see in the response, there is a new aggregations section where we can find information about the number of documents for each type:

{
...
    "aggregations": {
        "type_aggregation": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Multimedia",
                    "doc_count": 2
                },
                {
                    "key": "Pets",
                    "doc_count": 2
                },
                {
                    "key": "Home tech",
                    "doc_count": 1
                }
            ]
        }
    }
}
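To make the meaning of the buckets concrete, the following sketch reproduces the same counts in memory with Java streams. It only illustrates what the terms aggregation reports and doesn't show how Elasticsearch computes it. The Item record and the class name are hypothetical, and the sample data is the five-item data set used in the test later in this article.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory model; it is not the StoreItem entity used with Spring Data.
record Item(String type, String name, long price) {}

class TermsAggregationSketch {

    // Groups items by type and counts each group, mirroring the
    // key/doc_count pairs in the buckets of a terms aggregation.
    static Map<String, Long> countByType(List<Item> items) {
        return items.stream()
          .collect(Collectors.groupingBy(Item::type, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
          new Item("Multimedia", "PC Monitor", 1000),
          new Item("Multimedia", "Headphones", 200),
          new Item("Home tech", "Barbecue Grill", 2000),
          new Item("Pets", "Dog Toy", 10),
          new Item("Pets", "Cat shampoo", 5));

        // prints {Home tech=1, Multimedia=2, Pets=2}
        System.out.println(countByType(items));
    }
}
```

The sketch sorts keys alphabetically because it collects into a TreeMap, whereas the response above lists buckets with higher document counts first.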

3. Spring Data Elasticsearch Search With Aggregation

Let’s implement the functionality from the previous section using Spring Data Elasticsearch. Let’s begin by adding the dependency:

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
</dependency>

In the next step, we provide an Elasticsearch configuration class:

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.baeldung.spring.data.es.aggregation.repository")
@ComponentScan(basePackages = "com.baeldung.spring.data.es.aggregation")
public class ElasticSearchConfig {

    @Bean
    public RestClient elasticsearchRestClient() {
        return RestClient.builder(HttpHost.create("localhost:9200"))
          .setHttpClientConfigCallback(httpClientBuilder -> {
              httpClientBuilder.addInterceptorLast((HttpResponseInterceptor) (response, context) ->
                  response.addHeader("X-Elastic-Product", "Elasticsearch"));
              return httpClientBuilder;
            })
          .build();
    }

    @Bean
    public ElasticsearchClient elasticsearchClient(RestClient restClient) {
        return ElasticsearchClients.createImperative(restClient);
    }

    @Bean(name = { "elasticsearchOperations", "elasticsearchTemplate" })
    public ElasticsearchOperations elasticsearchOperations(
        ElasticsearchClient elasticsearchClient) {

        ElasticsearchTemplate template = new ElasticsearchTemplate(elasticsearchClient);
        template.setRefreshPolicy(null);

        return template;
    }
}

Here we’ve specified a low-level Elasticsearch REST client and its wrapper bean implementing the ElasticsearchOperations interface. Now, let’s create a StoreItem entity:

@Document(indexName = "store-items")
public class StoreItem {
    @Id
    private String id;

    @Field(type = Keyword)
    private String type;
    @Field(type = Keyword)
    private String name;

    // price is numeric, so we map it as a long rather than a keyword
    @Field(type = FieldType.Long)
    private Long price;

    //getters and setters
}

We’ve used the same store-items index as in the previous section. Since the built-in Spring Data repository methods can’t retrieve aggregations, we’ll need to create a repository extension. Let’s create an extension interface:

public interface StoreItemRepositoryExtension {
    SearchPage<StoreItem> findAllWithAggregations(Pageable pageable);
}

Here we have the findAllWithAggregations() method, which accepts a Pageable and returns a SearchPage with our items. Next, let’s create an implementation of this interface:

@Component
public class StoreItemRepositoryExtensionImpl implements StoreItemRepositoryExtension {

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    @Override
    public SearchPage<StoreItem> findAllWithAggregations(Pageable pageable) {
        Query query = NativeQuery.builder()
          .withAggregation("type_aggregation",
            Aggregation.of(b -> b.terms(t -> t.field("type"))))
          .build();
        SearchHits<StoreItem> response = elasticsearchOperations.search(query, StoreItem.class);
        return SearchHitSupport.searchPageFor(response, pageable);
    }
}

We’ve constructed a native query that includes the aggregation section. Following the pattern from the previous section, we use type_aggregation as the aggregation name and the terms aggregation type, which counts the documents for each distinct value of the type field. Note that the query is built without the Pageable, so the argument only shapes the returned SearchPage and doesn’t limit the hits requested from Elasticsearch.

Finally, let’s create a Spring Data repository that extends both ElasticsearchRepository, for the generic Spring Data functionality, and StoreItemRepositoryExtension, for our custom method implementation:

@Repository
public interface StoreItemRepository extends ElasticsearchRepository<StoreItem, String>,
  StoreItemRepositoryExtension {
}

After that, let’s create a test for our aggregation functionality:

@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = ElasticSearchConfig.class)
public class ElasticSearchAggregationManualTest {

    private static final List<StoreItem> EXPECTED_ITEMS = List.of(
      new StoreItem("Multimedia", "PC Monitor", 1000L),
      new StoreItem("Multimedia", "Headphones", 200L), 
      new StoreItem("Home tech", "Barbecue Grill", 2000L), 
      new StoreItem("Pets", "Dog Toy", 10L),
      new StoreItem("Pets", "Cat shampoo", 5L));
...

    @BeforeEach
    public void before() {
        repository.saveAll(EXPECTED_ITEMS);
    }

...
}

We’ve created a test data set of five items spread across three types. We save this data to Elasticsearch before each test case runs. Moving on, let’s call our findAllWithAggregations() method and see what it returns:

@Test
void givenStoreItems_whenFindAllWithAggregations_thenHitsAndAggregationsAreReturned() {
    SearchHits<StoreItem> searchHits = repository.findAllWithAggregations(Pageable.ofSize(2))
      .getSearchHits();
    List<StoreItem> data = searchHits.getSearchHits()
      .stream()
      .map(SearchHit::getContent)
      .toList();

    Assertions.assertThat(data).containsAll(EXPECTED_ITEMS);

    Map<String, Long> aggregatedData = ((ElasticsearchAggregations) searchHits
      .getAggregations())
      .get("type_aggregation")
      .aggregation()
      .getAggregate()
      .sterms()
      .buckets()
      .array()
      .stream()
      .collect(Collectors.toMap(bucket -> bucket.key()
        .stringValue(), MultiBucketBase::docCount));

    Assertions.assertThat(aggregatedData).containsExactlyInAnyOrderEntriesOf(
      Map.of("Multimedia", 2L, "Home tech", 1L, "Pets", 2L));
}

As the test shows, the search hits give us the documents matched by the query, while getAggregations() gives us the aggregation data. Because type is a keyword field, we read the result through sterms() as a string terms aggregate, and its buckets contain the expected document count for each type.
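The last step of the test, collecting the buckets into a map, can be tried in isolation. The sketch below uses a hypothetical TypeBucket record as a stand-in for the client's string terms bucket, modelling only its key and document count, so it runs without Elasticsearch or Spring on the classpath.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a string terms bucket; only key and docCount are modelled.
record TypeBucket(String key, long docCount) {}

class BucketMappingSketch {

    // Collects buckets into a key -> document count map, as the test does with Collectors.toMap.
    static Map<String, Long> toCountsByKey(List<TypeBucket> buckets) {
        return buckets.stream()
          .collect(Collectors.toMap(TypeBucket::key, TypeBucket::docCount));
    }

    public static void main(String[] args) {
        // Bucket values copied from the aggregation response shown earlier.
        List<TypeBucket> buckets = List.of(
          new TypeBucket("Multimedia", 2),
          new TypeBucket("Pets", 2),
          new TypeBucket("Home tech", 1));

        Map<String, Long> counts = toCountsByKey(buckets);

        // prints true
        System.out.println(counts.equals(Map.of("Multimedia", 2L, "Home tech", 1L, "Pets", 2L)));
    }
}
```

Collectors.toMap throws an IllegalStateException on duplicate keys, which is acceptable here because a terms aggregation returns one bucket per distinct key.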

4. Conclusion

In this article, we’ve explored how to integrate Elasticsearch aggregation functionality into Spring Data repositories. We utilized the terms aggregation to do this. However, there are many other types of aggregations available that we can employ to cover a wide range of aggregation functionality.

As usual, the full source code can be found over on GitHub.
