1. Overview
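This article focuses on how to list all objects in an S3 bucket using Java. We'll look at how to interact with S3 through the AWS SDK for Java and provide sample code for different scenarios.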
We'll mainly use the AWS SDK for Java V2, which improves significantly on its predecessor, including:
- ✅ better performance
- ✅ non-blocking I/O
- ✅ a friendlier API design
2. Prerequisites
To list the objects in an S3 bucket, we need the S3Client class provided by the AWS SDK. First, let's create a Java project and add the Maven dependency:
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.24.9</version>
</dependency>
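⚠️ The examples in this article were originally written against version 2.20.52. The APIs they use are also present in later 2.x releases, such as the 2.24.9 declared above. The latest version is available on Maven Central.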
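We also need to complete the following setup:
- set up an AWS account
- install the AWS CLI
- configure AWS credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
- create an S3 bucket and upload some files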
The bucket used in the examples is named baeldung-tutorials-s3 and contains 1,060 files.
3. Listing Objects in an S3 Bucket
Let's create a method that reads the objects in the bucket using AWS SDK V2:
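import java.util.List;

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;
import software.amazon.awssdk.services.s3.model.S3Object;

// Bucket and region used throughout the article
static final String AWS_BUCKET = "baeldung-tutorials-s3";
static final Region AWS_REGION = Region.EU_CENTRAL_1;

void listObjectsInBucket() {
    // try-with-resources closes the client even if the request fails
    try (S3Client s3Client = S3Client.builder()
            .region(AWS_REGION)
            .build()) {
        ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
                .bucket(AWS_BUCKET)
                .build();
        // A single call returns at most 1,000 objects
        ListObjectsV2Response listObjectsV2Response = s3Client.listObjectsV2(listObjectsV2Request);
        List<S3Object> contents = listObjectsV2Response.contents();
        System.out.println("Number of objects in the bucket: " + contents.size());
        contents.forEach(System.out::println);
    }
}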
Output:
Number of objects in the bucket: 1000
S3Object(Key=file_0.txt, LastModified=2023-06-06T11:35:06Z, ETag="b9ece18c950afbfa6b0fdbfa4ff731d3", Size=1, StorageClass=STANDARD)
S3Object(Key=file_1.txt, LastModified=2023-06-06T11:35:07Z, ETag="97a6dd4c45b23db9c5d603ce161b8cab", Size=1, StorageClass=STANDARD)
[...]
❌ Gotcha: a single listObjectsV2() call returns at most 1,000 objects. If the bucket holds more than 1,000 objects, we have to paginate.
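To estimate how many requests a full listing takes, here is a small stdlib-only calculation. It assumes every full page carries exactly min(maxKeys, 1000) keys. S3 only documents "up to 1,000" per response, so treat the result as a lower bound rather than a guarantee. ListRequestEstimate and requestsNeeded() are hypothetical names.

```java
public class ListRequestEstimate {

    // S3 returns at most 1,000 keys per ListObjectsV2 call, whatever maxKeys asks for
    static final int S3_MAX_KEYS_PER_PAGE = 1000;

    // Estimates the ListObjectsV2 calls needed to list objectCount keys
    // when each request asks for requestedPageSize keys (assumes full pages)
    static int requestsNeeded(long objectCount, int requestedPageSize) {
        if (requestedPageSize <= 0) {
            throw new IllegalArgumentException("page size must be positive");
        }
        int effectivePageSize = Math.min(requestedPageSize, S3_MAX_KEYS_PER_PAGE);
        // Ceiling division; even an empty bucket costs one request (an empty page)
        long pages = Math.max(1, (objectCount + effectivePageSize - 1) / effectivePageSize);
        return (int) pages;
    }

    public static void main(String[] args) {
        System.out.println(requestsNeeded(1060, 1000)); // 2
        System.out.println(requestsNeeded(1060, 500));  // 3
        System.out.println(requestsNeeded(1060, 5000)); // 2, pages are capped at 1,000
        System.out.println(requestsNeeded(0, 1000));    // 1
    }
}
```

For the 1,060-object bucket, this gives two requests at the default page size and three at a page size of 500. That matches the outputs in the next two sections.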
4. Paginating with a Continuation Token
When the bucket holds more than 1,000 objects, we paginate by passing the token returned by nextContinuationToken() into the next request:
void listAllObjectsInBucket() {
    try (S3Client s3Client = S3Client.builder()
            .region(AWS_REGION)
            .build()) {
        String nextContinuationToken = null;
        long totalObjects = 0;
        do {
            // The first request has no token; later ones resume where the previous page ended
            ListObjectsV2Request request = ListObjectsV2Request.builder()
                    .bucket(AWS_BUCKET)
                    .continuationToken(nextContinuationToken)
                    .build();
            ListObjectsV2Response response = s3Client.listObjectsV2(request);
            // null once the last page has been returned
            nextContinuationToken = response.nextContinuationToken();
            List<S3Object> contents = response.contents();
            contents.forEach(System.out::println);
            totalObjects += contents.size();
        } while (nextContinuationToken != null);
        System.out.println("Number of objects in the bucket: " + totalObjects);
    }
}
Output:
Number of objects in the bucket: 1060
✅ Advantage: we have full control over the pagination process and can adapt the paging logic as needed.
⚠️ Note: each page holds at most 1,000 objects by default. The maxKeys() method can request smaller pages, but not larger ones.
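The control flow of this loop can be exercised without an AWS account. The sketch below is a stdlib-only stand-in, and FakeBucket, Page, and pageSizes() are hypothetical names. The fake uses the last key of a page as its continuation token, whereas real S3 tokens are opaque strings that should just be passed back unchanged. Like S3, the fake caps each page at 1,000 keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class ContinuationTokenSketch {

    // Like S3, never return more than 1,000 keys in one page
    static final int MAX_KEYS_PER_PAGE = 1000;

    // Hypothetical stand-in for a ListObjectsV2 response
    static final class Page {
        final List<String> keys;
        final String nextContinuationToken; // null when the listing is complete

        Page(List<String> keys, String nextContinuationToken) {
            this.keys = keys;
            this.nextContinuationToken = nextContinuationToken;
        }
    }

    // Hypothetical in-memory bucket; keys are kept sorted, as S3 returns them
    static final class FakeBucket {
        private final TreeSet<String> keys = new TreeSet<>();

        FakeBucket(int objectCount) {
            for (int i = 0; i < objectCount; i++) {
                keys.add("file_" + i + ".txt");
            }
        }

        Page listPage(String continuationToken, int maxKeys) {
            if (maxKeys <= 0) {
                throw new IllegalArgumentException("maxKeys must be positive");
            }
            int limit = Math.min(maxKeys, MAX_KEYS_PER_PAGE);
            // Resume strictly after the key encoded in the (fake) token
            NavigableSet<String> remaining = continuationToken == null
                    ? keys
                    : keys.tailSet(continuationToken, false);
            List<String> page = new ArrayList<>();
            for (String key : remaining) {
                if (page.size() == limit) {
                    break;
                }
                page.add(key);
            }
            boolean truncated = remaining.size() > page.size();
            return new Page(page, truncated ? page.get(page.size() - 1) : null);
        }
    }

    // Same do/while shape as listAllObjectsInBucket(): request, consume, follow the token
    static List<Integer> pageSizes(FakeBucket bucket, int maxKeys) {
        List<Integer> sizes = new ArrayList<>();
        String token = null;
        do {
            Page page = bucket.listPage(token, maxKeys);
            sizes.add(page.keys.size());
            token = page.nextContinuationToken;
        } while (token != null);
        return sizes;
    }

    public static void main(String[] args) {
        FakeBucket bucket = new FakeBucket(1060);
        System.out.println(pageSizes(bucket, 1000)); // [1000, 60]
        System.out.println(pageSizes(bucket, 500));  // [500, 500, 60]
    }
}
```

The loop terminates only because the last page comes back without a token. Checking the token, rather than the page size, avoids guessing whether a short page is the final one.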
5. Paginating with ListObjectsV2Iterable
The AWS SDK offers a more concise option: the ListObjectsV2Iterable returned by listObjectsV2Paginator():
import software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable;

void listAllObjectsInBucketPaginated(int pageSize) {
    try (S3Client s3Client = S3Client.builder()
            .region(AWS_REGION)
            .build()) {
        ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
                .bucket(AWS_BUCKET)
                .maxKeys(pageSize) // controls the page size (at most 1,000)
                .build();
        // No request is sent yet; pages are fetched as we iterate
        ListObjectsV2Iterable listObjectsV2Iterable = s3Client.listObjectsV2Paginator(listObjectsV2Request);
        long totalObjects = 0;
        for (ListObjectsV2Response page : listObjectsV2Iterable) {
            List<S3Object> contents = page.contents();
            contents.forEach(System.out::println);
            long retrievedPageSize = contents.size();
            totalObjects += retrievedPageSize;
            System.out.println("Page size: " + retrievedPageSize);
        }
        System.out.println("Total objects in the bucket: " + totalObjects);
    }
}
Output with pageSize=500:
S3Object(Key=file_0.txt, LastModified=2023-06-06T11:35:06Z, ETag="b9ece18c950afbfa6b0fdbfa4ff731d3", Size=1, StorageClass=STANDARD)
[...]
Page size: 500
S3Object(Key=file_495.txt, LastModified=2023-04-29T18:53:57Z, ETag="83acb6e67e50e31db6ed341dd2de1595", Size=1, StorageClass=STANDARD)
[...]
Page size: 500
S3Object(Key=file_945.txt, LastModified=2023-04-29T18:54:27Z, ETag="55a54008ad1ba589aa210d2629c1df41", Size=1, StorageClass=STANDARD)
[...]
Page size: 60
Total objects in the bucket: 1060
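Notice that the second page starts with file_495.txt rather than file_500.txt. S3 returns the keys of a general purpose bucket in ascending UTF-8 binary order, so file_10.txt sorts before file_2.txt. Assuming the bucket holds exactly file_0.txt through file_1059.txt, which is consistent with the output above, sorting those keys as plain strings reproduces the page boundaries. KeyOrderSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeyOrderSketch {

    // Builds file_0.txt ... file_{count-1}.txt and sorts them as strings.
    // For these ASCII keys, Java's String order matches S3's UTF-8 binary order.
    static List<String> sortedKeys(int count) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            keys.add("file_" + i + ".txt");
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sortedKeys(1060);
        int pageSize = 500;
        // Print the first key of each page of 500
        for (int start = 0; start < keys.size(); start += pageSize) {
            System.out.println("Page starting at index " + start + ": " + keys.get(start));
        }
        // Page starting at index 0: file_0.txt
        // Page starting at index 500: file_495.txt
        // Page starting at index 1000: file_945.txt
    }
}
```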
✅ Advantage: the SDK handles pagination for us, the code is more concise, and pages are fetched only as we iterate.
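According to the SDK's Javadoc, calling listObjectsV2Paginator() does not send a request by itself. Pages are loaded lazily as the iterable is traversed. The stdlib-only sketch below mimics that behavior with a hypothetical paginate() helper and an in-memory fetch function, and it counts how many "requests" are issued. It is not the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class LazyPaginatorSketch {

    // Hypothetical page: keys plus the token for the next page (null when done)
    static final class Page {
        final List<String> keys;
        final String nextToken;

        Page(List<String> keys, String nextToken) {
            this.keys = keys;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical helper: wraps "fetch the page for this token" in an Iterable
    // that sends the next request only when the caller asks for the next page
    static Iterable<Page> paginate(Function<String, Page> fetchPage) {
        return () -> new Iterator<Page>() {
            private String token = null;
            private boolean finished = false;

            @Override
            public boolean hasNext() {
                return !finished;
            }

            @Override
            public Page next() {
                if (finished) {
                    throw new NoSuchElementException();
                }
                Page page = fetchPage.apply(token); // one "request" per page
                token = page.nextToken;
                finished = token == null;
                return page;
            }
        };
    }

    // In-memory fetch: the token is just the index of the next page's first key
    static Page fakeFetch(List<String> allKeys, int pageSize, String token) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int start = token == null ? 0 : Integer.parseInt(token);
        int end = Math.min(start + pageSize, allKeys.size());
        String next = end < allKeys.size() ? String.valueOf(end) : null;
        return new Page(new ArrayList<>(allKeys.subList(start, end)), next);
    }

    // Reads at most maxPages pages and returns how many fetches were issued
    static int requestsForFirstPages(int objectCount, int pageSize, int maxPages) {
        List<String> allKeys = new ArrayList<>();
        for (int i = 0; i < objectCount; i++) {
            allKeys.add("file_" + i + ".txt");
        }
        int[] requests = {0};
        int pagesRead = 0;
        for (Page page : paginate(token -> {
            requests[0]++;
            return fakeFetch(allKeys, pageSize, token);
        })) {
            System.out.println("Page size: " + page.keys.size());
            if (++pagesRead == maxPages) {
                break;
            }
        }
        return requests[0];
    }

    public static void main(String[] args) {
        System.out.println("Requests: " + requestsForFirstPages(1060, 500, 1));   // 1
        System.out.println("Requests: " + requestsForFirstPages(1060, 500, 100)); // 3
    }
}
```

Stopping after the first page costs a single fetch. This is the practical benefit of lazy loading when only the first few results are needed.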
6. Listing Objects by Prefix
Sometimes we only want the objects whose keys start with a specific prefix, for example all files whose keys begin with "backup".
Let's modify the method to filter by prefix:
void listAllObjectsInBucketPaginatedWithPrefix(int pageSize, String prefix) {
    try (S3Client s3Client = S3Client.builder()
            .region(AWS_REGION)
            .build()) {
        ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
                .bucket(AWS_BUCKET)
                .maxKeys(pageSize)
                .prefix(prefix) // only keys starting with this prefix are returned
                .build();
        ListObjectsV2Iterable listObjectsV2Iterable = s3Client.listObjectsV2Paginator(listObjectsV2Request);
        long totalObjects = 0;
        for (ListObjectsV2Response page : listObjectsV2Iterable) {
            List<S3Object> contents = page.contents();
            contents.forEach(System.out::println);
            long retrievedPageSize = contents.size();
            totalObjects += retrievedPageSize;
            System.out.println("Page size: " + retrievedPageSize);
        }
        System.out.println("Total objects in the bucket: " + totalObjects);
    }
}
Example call:
listAllObjectsInBucketPaginatedWithPrefix(10, "backup");
Output:
S3Object(Key=backup/, LastModified=2023-04-30T17:47:33Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD)
S3Object(Key=backup/file_0.txt, LastModified=2023-04-30T17:48:13Z, ETag="a87ff679a2f3e71d9181a67b7542122c", Size=1, StorageClass=STANDARD)
[...]
Page size: 7
Total objects in the bucket: 7
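⚠️ Note: the prefix is matched against the full object key as a plain string. This means "backup" also matches root-level objects whose keys merely begin with "backup", not only the objects inside the backup/ folder. To restrict the listing to that folder, add a trailing slash: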
listAllObjectsInBucketPaginatedWithPrefix(10, "backup/");
This time, the output is:
S3Object(Key=backup/, LastModified=2023-04-30T17:47:33Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD)
S3Object(Key=backup/file_0.txt, LastModified=2023-04-30T17:48:13Z, ETag="a87ff679a2f3e71d9181a67b7542122c", Size=1, StorageClass=STANDARD)
[...]
Page size: 6
Total objects in the bucket: 6
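Because S3 has no real directories, prefix filtering amounts to a "key starts with" test on the full key. The sketch below applies the same test to a hypothetical list of keys; backup.txt and backups/old.txt are invented for illustration. It also shows that the zero-byte backup/ placeholder object still matches "backup/", which is why it appears in both outputs above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PrefixMatchSketch {

    // Keeps the keys that start with the prefix, in their original order
    static List<String> matching(List<String> keys, String prefix) {
        return keys.stream()
                .filter(key -> key.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical keys, for illustration only
        List<String> keys = List.of(
                "backup/", "backup/file_0.txt", "backup/file_1.txt",
                "backup.txt", "backups/old.txt", "file_0.txt");
        System.out.println(matching(keys, "backup"));
        // [backup/, backup/file_0.txt, backup/file_1.txt, backup.txt, backups/old.txt]
        System.out.println(matching(keys, "backup/"));
        // [backup/, backup/file_0.txt, backup/file_1.txt]
    }
}
```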
7. Conclusion
In this article, we explored several ways to list the objects in an S3 bucket with the AWS SDK for Java V2:
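- Basic listing: suitable for buckets with up to 1,000 objects
- Continuation-token pagination: manual control over the paging logic
- Auto-paginating iterator: a concise approach that loads pages lazily
- Prefix filtering: narrowing the listing to keys that start with a given prefix, such as a "folder"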
Each approach fits different situations, so choose the one that matches your needs. The complete code is available on GitHub.