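1. Overview

This article focuses on how to list all the objects in an S3 bucket using Java. We'll look at how to interact with S3 through the AWS SDK for Java and provide sample code for several scenarios.

We'll mainly use the AWS SDK for Java V2, which brings significant improvements over its predecessor, including:

  • ✅ Enhanced performance
  • ✅ Non-blocking I/O
  • ✅ A more user-friendly API design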

2. Prerequisites

To list the objects in an S3 bucket, we need the S3Client class provided by the AWS SDK. First, we create a Java project and add the Maven dependency:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.24.9</version>
</dependency>

⚠️ The snippet above pins version 2.24.9; the latest version can be found on Maven Central.

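We also need to complete the following setup:

  1. Set up an AWS account
  2. Install the AWS CLI
  3. Configure AWS credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
  4. Create an S3 bucket and upload some files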

The bucket used in the examples is named baeldung-tutorial-s3 and contains 1,060 files:

[Figure: objects in the S3 bucket]

3. Listing Objects in an S3 Bucket

Using the AWS SDK V2, we create a method that reads the objects in the bucket:

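import java.util.List;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;
import software.amazon.awssdk.services.s3.model.S3Object;

String AWS_BUCKET = "baeldung-tutorial-s3";
Region AWS_REGION = Region.EU_CENTRAL_1;

void listObjectsInBucket() {
    // try-with-resources closes the client even if the request throws
    try (S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build()) {

        ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
          .bucket(AWS_BUCKET)
          .build();
        ListObjectsV2Response listObjectsV2Response = s3Client.listObjectsV2(listObjectsV2Request);

        // A single response holds at most 1,000 objects
        List<S3Object> contents = listObjectsV2Response.contents();

        System.out.println("Number of objects in the bucket: " + contents.size());
        contents.forEach(System.out::println);
    }
}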

Running it prints:

Number of objects in the bucket: 1000
S3Object(Key=file_0.txt, LastModified=2023-06-06T11:35:06Z, ETag="b9ece18c950afbfa6b0fdbfa4ff731d3", Size=1, StorageClass=STANDARD)
S3Object(Key=file_1.txt, LastModified=2023-06-06T11:35:07Z, ETag="97a6dd4c45b23db9c5d603ce161b8cab", Size=1, StorageClass=STANDARD)
[...]

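Pitfall: a single call returns at most 1,000 objects. If the bucket holds more than 1,000 objects, we have to paginate.

4. Pagination With a Continuation Token

When the bucket holds more than 1,000 objects, we use nextContinuationToken() to page through the results: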

void listAllObjectsInBucket() {
    // try-with-resources closes the client even if a request throws
    try (S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build()) {
        String nextContinuationToken = null;
        long totalObjects = 0;

        do {
            // The token is null on the first request, so listing starts from the beginning
            ListObjectsV2Request.Builder requestBuilder = ListObjectsV2Request.builder()
              .bucket(AWS_BUCKET)
              .continuationToken(nextContinuationToken);

            ListObjectsV2Response response = s3Client.listObjectsV2(requestBuilder.build());
            // Null once the last page has been returned, which ends the loop
            nextContinuationToken = response.nextContinuationToken();

            response.contents().forEach(System.out::println);
            totalObjects += response.contents().size();
        } while (nextContinuationToken != null);
        System.out.println("Number of objects in the bucket: " + totalObjects);
    }
}

Output:

Number of objects in the bucket: 1060

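Advantage: we keep full control over the pagination process and can handle each page however we like.

⚠️ Note: each page holds up to 1,000 objects by default. The maxKeys() method can lower the page size, but S3 never returns more than 1,000 keys in a single response.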
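
To make the token mechanics easier to follow without an AWS account, here is a minimal in-memory sketch of the same do/while loop. Everything in it is an assumption made for illustration: the class TokenPagingSketch, the Page type, and the listPage() helper are hypothetical and are not part of the AWS SDK, and the token here is simply the next start index, whereas real S3 continuation tokens are opaque strings.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a paginated listing API; not part of the AWS SDK.
class TokenPagingSketch {

    // One page of results plus the token for the next page (null when done).
    static final class Page {
        final List<String> keys;
        final String nextToken;

        Page(List<String> keys, String nextToken) {
            this.keys = keys;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical "server side": the token is just the next start index as text.
    // maxKeys must be positive, otherwise the listing would never advance.
    static Page listPage(List<String> allKeys, String token, int maxKeys) {
        int start = (token == null) ? 0 : Integer.parseInt(token);
        int end = Math.min(start + maxKeys, allKeys.size());
        String next = (end < allKeys.size()) ? String.valueOf(end) : null;
        return new Page(new ArrayList<>(allKeys.subList(start, end)), next);
    }

    // "Client side": keep requesting pages until no token comes back.
    static int countAll(List<String> allKeys, int maxKeys) {
        String token = null;
        int total = 0;
        do {
            Page page = listPage(allKeys, token, maxKeys);
            total += page.keys.size();
            token = page.nextToken;
        } while (token != null);
        return total;
    }

    // Builds n fake keys: file_0.txt ... file_(n-1).txt
    static List<String> fakeKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add("file_" + i + ".txt");
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = fakeKeys(1060);
        // A single page is capped at maxKeys
        System.out.println("First page: " + listPage(keys, null, 1000).keys.size()); // prints First page: 1000
        System.out.println("All pages: " + countAll(keys, 1000));                     // prints All pages: 1060
    }
}
```

The sketch mirrors the numbers in this article: one request against 1,060 keys stops at 1,000, and following the token picks up the remaining 60.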

5. Pagination With ListObjectsV2Iterable

The AWS SDK offers a more concise way to paginate: the ListObjectsV2Iterable class, which we obtain by calling listObjectsV2Paginator():

import software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable;

void listAllObjectsInBucketPaginated(int pageSize) {
    // try-with-resources closes the client even if a request throws
    try (S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build()) {

        ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
          .bucket(AWS_BUCKET)
          .maxKeys(pageSize) // controls the size of each page
          .build();

        ListObjectsV2Iterable listObjectsV2Iterable = s3Client.listObjectsV2Paginator(listObjectsV2Request);
        long totalObjects = 0;

        // Each iteration yields one page; the SDK requests the next page as needed
        for (ListObjectsV2Response page : listObjectsV2Iterable) {
            page.contents().forEach(System.out::println);
            long retrievedPageSize = page.contents().size();
            totalObjects += retrievedPageSize;
            System.out.println("Page size: " + retrievedPageSize);
        }
        System.out.println("Total objects in the bucket: " + totalObjects);
    }
}

Output with pageSize = 500:

S3Object(Key=file_0.txt, LastModified=2023-06-06T11:35:06Z, ETag="b9ece18c950afbfa6b0fdbfa4ff731d3", Size=1, StorageClass=STANDARD)
[...]
Page size: 500
S3Object(Key=file_495.txt, LastModified=2023-04-29T18:53:57Z, ETag="83acb6e67e50e31db6ed341dd2de1595", Size=1, StorageClass=STANDARD)
[...]
Page size: 500
S3Object(Key=file_945.txt, LastModified=2023-04-29T18:54:27Z, ETag="55a54008ad1ba589aa210d2629c1df41", Size=1, StorageClass=STANDARD)
[...]
Page size: 60
Total objects in the bucket: 1060

Advantage: the SDK handles pagination for us, the code is shorter, and pages are loaded on demand as we iterate.
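
The "loaded on demand" behavior can be illustrated with a small plain-Java iterable that only produces a page when next() is called. This is a simplified sketch and not the SDK's implementation: the class LazyPagesSketch, its fetches counter, and the integer "objects" are all hypothetical, and unlike a real paginator it makes no request at all for an empty listing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java illustration of a lazy page iterator; not the SDK's implementation.
class LazyPagesSketch implements Iterable<List<Integer>> {
    private final int totalItems;
    private final int pageSize;
    int fetches = 0; // how many pages have actually been "requested"

    LazyPagesSketch(int totalItems, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.totalItems = totalItems;
        this.pageSize = pageSize;
    }

    @Override
    public Iterator<List<Integer>> iterator() {
        return new Iterator<List<Integer>>() {
            private int offset = 0;

            @Override
            public boolean hasNext() {
                return offset < totalItems;
            }

            @Override
            public List<Integer> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                fetches++; // a real paginator would call the service here
                int end = Math.min(offset + pageSize, totalItems);
                List<Integer> page = new ArrayList<>();
                for (int i = offset; i < end; i++) {
                    page.add(i);
                }
                offset = end;
                return page;
            }
        };
    }

    // Collects the size of every page, in order.
    static List<Integer> pageSizes(int totalItems, int pageSize) {
        List<Integer> sizes = new ArrayList<>();
        for (List<Integer> page : new LazyPagesSketch(totalItems, pageSize)) {
            sizes.add(page.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(pageSizes(1060, 500)); // prints [500, 500, 60]

        LazyPagesSketch pages = new LazyPagesSketch(1060, 500);
        pages.iterator().next();            // consume only the first page
        System.out.println(pages.fetches);  // prints 1
    }
}
```

With 1,060 items and a page size of 500, the sketch yields pages of 500, 500, and 60, matching the output above, and consuming a single page triggers a single fetch.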

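6. Listing Objects by Prefix

Sometimes we only want the objects whose keys share a given prefix, for example everything starting with "backup". The example bucket is laid out as follows:

[Figure: S3 objects sharing a common prefix]

We modify the method to filter by prefix: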

void listAllObjectsInBucketPaginatedWithPrefix(int pageSize, String prefix) {
    // try-with-resources closes the client even if a request throws
    try (S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build()) {
        ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
          .bucket(AWS_BUCKET)
          .maxKeys(pageSize)
          .prefix(prefix) // only keys starting with this prefix are returned
          .build();

        ListObjectsV2Iterable listObjectsV2Iterable = s3Client.listObjectsV2Paginator(listObjectsV2Request);
        long totalObjects = 0;

        for (ListObjectsV2Response page : listObjectsV2Iterable) {
            // Print each object so the output shows which keys matched
            page.contents().forEach(System.out::println);
            long retrievedPageSize = page.contents().size();
            totalObjects += retrievedPageSize;
            System.out.println("Page size: " + retrievedPageSize);
        }
        System.out.println("Total objects in the bucket: " + totalObjects);
    }
}

Example call:

listAllObjectsInBucketPaginatedWithPrefix(10, "backup");

Output:

S3Object(Key=backup/, LastModified=2023-04-30T17:47:33Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD)
S3Object(Key=backup/file_0.txt, LastModified=2023-04-30T17:48:13Z, ETag="a87ff679a2f3e71d9181a67b7542122c", Size=1, StorageClass=STANDARD)
[...]
Page size: 7
Total objects in the bucket: 7

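⚠️ Note: the prefix is matched against the full object key, directory structure included, so "backup" also matches a root-level object whose name merely starts with "backup". To exclude such root-level objects, add a trailing slash: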

listAllObjectsInBucketPaginatedWithPrefix(10, "backup/");

The output is now:

S3Object(Key=backup/, LastModified=2023-04-30T17:47:33Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD)
S3Object(Key=backup/file_0.txt, LastModified=2023-04-30T17:48:13Z, ETag="a87ff679a2f3e71d9181a67b7542122c", Size=1, StorageClass=STANDARD)
[...]
Page size: 6
Total objects in the bucket: 6
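
Since a prefix filter is a plain "starts with" comparison on the object key, the effect of the trailing slash can be reproduced locally. The following is a minimal sketch with invented data: the class PrefixMatchSketch and the sample keys (including backup_notes.txt) are hypothetical and are not the actual contents of the example bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local illustration of prefix matching on object keys; makes no AWS calls.
class PrefixMatchSketch {

    // Keeps the keys that start with the given prefix (plain string comparison).
    static List<String> withPrefix(List<String> keys, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }

    // Hypothetical bucket contents, invented for this illustration.
    static List<String> sampleKeys() {
        return Arrays.asList(
            "backup/",
            "backup/file_0.txt",
            "backup/file_1.txt",
            "backup_notes.txt",   // root-level object whose name starts with "backup"
            "file_0.txt");
    }

    public static void main(String[] args) {
        System.out.println(withPrefix(sampleKeys(), "backup"));
        // prints [backup/, backup/file_0.txt, backup/file_1.txt, backup_notes.txt]
        System.out.println(withPrefix(sampleKeys(), "backup/"));
        // prints [backup/, backup/file_0.txt, backup/file_1.txt]
    }
}
```

The zero-byte "backup/" key still matches "backup/", which is why it remains in the second output above; only root-level keys that merely begin with "backup" drop out.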

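7. Conclusion

In this article, we looked at several ways to list the objects in an S3 bucket with the AWS SDK for Java V2:

  1. Basic listing: suitable for buckets with up to 1,000 objects
  2. Continuation-token pagination: manual control over the paging logic
  3. Automatic paginator: a concise and efficient way to paginate
  4. Prefix filtering: selecting objects by directory structure

Each approach has its own use cases, and we can pick the one that fits our requirements. The complete code examples are available on GitHub.


Original title: Listing All AWS S3 Objects in a Bucket Using Java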