1. Overview

Amazon S3 has cemented itself as the most widely used cloud storage backend due to its scalability, durability, and extensive feature set. This is evidenced by the fact that many other storage backends aim to be compatible with the S3 API, the programming interface used to interact with Amazon S3.

However, applications that rely on the S3 API may face challenges when migrating to alternative storage backends that are not fully compatible. This can lead to significant development effort and vendor lock-in.

This is where S3Proxy comes to the rescue. S3Proxy is an open-source library that addresses this challenge by providing a compatibility layer between the S3 API and various storage backends. It allows us to interact with different storage backends through the already familiar S3 API, without extensive code modifications.

In this tutorial, we’ll explore how to integrate S3Proxy in a Spring Boot application and configure it to work with Azure Blob Storage and Google Cloud Storage. We’ll also look at how to set up a file system as a storage backend for local development and testing.

2. How S3Proxy Works

Before we dive into the implementation, let’s take a closer look at how S3Proxy works.

S3Proxy sits between the application and the storage backend, acting as a proxy server. When the application sends a request using the S3 API, S3Proxy intercepts it and translates it into the corresponding API call for the configured storage backend. The backend's response is then translated back into the S3 format and returned to the application.

Diagram showing how S3Proxy works to translate S3 API calls to other storage backends.

S3Proxy runs on an embedded Jetty server and relies on Apache jclouds, a multi-cloud toolkit, to perform the translation and interact with the various storage backends.

3. Setting up the Project

Before we can use S3Proxy to access various storage backends, we’ll need to include the necessary SDK dependencies and configure our application correctly.

3.1. Dependencies

Let’s start by adding the necessary dependencies to our project’s pom.xml file:

<dependency>
    <groupId>org.gaul</groupId>
    <artifactId>s3proxy</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.28.23</version>
</dependency>

The S3Proxy dependency provides us with the proxy server and the necessary Apache jclouds components that we’ll configure later in the tutorial.

Meanwhile, the Amazon S3 dependency provides us with the S3Client class, a Java wrapper around the S3 API.

3.2. Defining Cloud-Agnostic Storage Properties

Now, we’ll define a set of cloud-agnostic storage properties that can be used across different storage backends.

We’ll store these properties in our project’s application.yaml file and use @ConfigurationProperties to map the values to a POJO, which we’ll reference when defining our jclouds components and S3Client bean:

@ConfigurationProperties(prefix = "com.baeldung.storage")
class StorageProperties {

    private String identity;

    private String credential;

    private String region;

    private String bucketName;

    private String proxyEndpoint;

    // standard setters and getters

}

The above properties represent the common configuration parameters required by most storage backends, such as the security credentials, region, and bucket name. In addition, we also declare the proxyEndpoint property, which specifies the URL where our embedded S3Proxy server will be running.

Let’s have a look at a snippet of our application.yaml file that defines the required properties that’ll be mapped to our StorageProperties class automatically:

com:
  baeldung:
    storage:
      identity: ${STORAGE_BACKEND_IDENTITY}
      credential: ${STORAGE_BACKEND_CREDENTIAL}
      region: ${STORAGE_BACKEND_REGION}
      bucket-name: ${STORAGE_BACKEND_BUCKET_NAME}
      proxy-endpoint: ${S3PROXY_ENDPOINT}

We use the ${} property placeholder to load the values of our properties from environment variables.

This setup allows us to externalize our storage backend properties and access them easily in our application.

3.3. Initializing S3Proxy at Application Startup

To ensure that the embedded S3Proxy server is up and running when our application starts, we’ll create an S3ProxyInitializer class that implements the ApplicationRunner interface:

@Component
class S3ProxyInitializer implements ApplicationRunner {

    private final S3Proxy s3Proxy;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) throws Exception {
        s3Proxy.start();
    }

}

Using constructor injection, we inject an instance of S3Proxy and use it to start the embedded proxy server inside the run() method.
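
Optionally, we can also stop the proxy gracefully when the application shuts down. Here's a minimal sketch, assuming we add a @PreDestroy callback to the same component:

@PreDestroy
void stopS3Proxy() throws Exception {
    s3Proxy.stop();
}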

It’s important to note that we haven’t yet defined the S3Proxy bean itself; we’ll do that in the next section.

4. Accessing Azure Blob Storage

Now, to access Azure Blob Storage using S3Proxy, we’ll create a StorageConfiguration class and inject our cloud-agnostic StorageProperties that we created earlier. We’ll define all the necessary beans in this new class.
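
As a starting point, here's a sketch of the class skeleton we're assuming, with the properties injected through the constructor and registered via @EnableConfigurationProperties:

@Configuration
@EnableConfigurationProperties(StorageProperties.class)
public class StorageConfiguration {

    private final StorageProperties storageProperties;

    // standard constructor

    // bean definitions discussed below

}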

First, let’s start by creating a BlobStore bean. This bean represents the underlying storage backend that we’ll be interacting with:

@Bean
public BlobStore azureBlobStore() {
    return ContextBuilder
      .newBuilder("azureblob")
      .credentials(storageProperties.getIdentity(), storageProperties.getCredential())
      .build(BlobStoreContext.class)
      .getBlobStore();
}

We use Apache jclouds’ ContextBuilder to create a BlobStoreContext instance configured with the azureblob provider. Then, we obtain the BlobStore instance from this context.

We also pass the security credentials from our injected StorageProperties instance. For Azure Blob Storage, the name of the storage account will be our identity, and its corresponding access key will be our credential.

With our BlobStore configured, let’s define the S3Proxy bean:

@Bean
public S3Proxy s3Proxy(BlobStore blobStore) {
    return S3Proxy
      .builder()
      .blobStore(blobStore)
      .endpoint(URI.create(storageProperties.getProxyEndpoint()))
      .build();
}

We create our S3Proxy bean using the BlobStore instance and the proxyEndpoint configured in our application.yaml file. This bean is responsible for translating S3 API calls into calls to the underlying storage backend.

Finally, let’s create our S3Client bean:

@Bean
public S3Client s3Client() {
    S3Configuration s3Configuration = S3Configuration
      .builder()
      .checksumValidationEnabled(false)
      .build();
    AwsCredentials credentials = AwsBasicCredentials.create(
        storageProperties.getIdentity(),
        storageProperties.getCredential()
    );
    return S3Client
      .builder()
      .region(Region.of(storageProperties.getRegion()))
      .endpointOverride(URI.create(storageProperties.getProxyEndpoint()))
      .credentialsProvider(StaticCredentialsProvider.create(credentials))
      .serviceConfiguration(s3Configuration)
      .build();
}

We should note that we disable checksum validation in the S3Configuration. This is necessary because Azure returns a non-MD5 ETag, which would cause an error when using the default configuration.

For simplicity, we’ll use the same S3Client bean for the other storage backends in this tutorial. However, if we’re not using Azure Blob Storage, we can remove this checksum configuration.

With these beans in place, our application can now interact with Azure Blob Storage using the familiar S3 API.
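
As a quick sanity check, here's a minimal usage sketch that uploads an object through the proxy; the key and content are purely illustrative:

s3Client.putObject(request -> request
    .bucket(storageProperties.getBucketName())
    .key("hello.txt")
    .contentType("text/plain"),
  RequestBody.fromString("Hello from S3Proxy!"));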

5. Accessing GCP Cloud Storage

Now, to access Google Cloud Storage, we’ll only need to make changes to our BlobStore bean.

First, let’s create a new BlobStore bean for Google Cloud Storage. We’ll use Spring profiles to conditionally create either the Azure or GCP BlobStore bean based on the active profile:

@Bean
@Profile("azure")
public BlobStore azureBlobStore() {
    // ... same as above
}

@Bean
@Profile("gcp")
public BlobStore gcpBlobStore() {
    return ContextBuilder
      .newBuilder("google-cloud-storage")
      .credentials(storageProperties.getIdentity(), storageProperties.getCredential())
      .build(BlobStoreContext.class)
      .getBlobStore();
}

Here, we create a BlobStore instance using the google-cloud-storage provider when the gcp profile is active.

For Google Cloud Storage, the identity will be our service account’s email address, and the credential will be its corresponding RSA private key.
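
Since the RSA private key is a multi-line PEM value, it can be awkward to pass around as a plain environment variable. One option, sketched here under the assumption that we store the key in a file whose path we provide ourselves, is to read it from disk before passing it to the ContextBuilder:

// assumption: the PEM file path is supplied by us, e.g. through another property
String credential = Files.readString(Path.of("/path/to/service-account-key.pem"));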

With this configuration change, our application can now interact with Google Cloud Storage using the S3 API.

6. Local Development and Testing Using File System

6.1. Setting up Our Local Configuration

First, let’s add a new property to our StorageProperties class to specify the base directory for our local file system storage:

private String localFileBaseDirectory;

// standard setters and getters

Next, we’ll create a new LocalStorageConfiguration class. We’ll use @Profile to activate this class for the local and test profiles. In this class, we’ll update our beans as needed to work with the local file system:

@Configuration
@Profile("local | test")
@EnableConfigurationProperties(StorageProperties.class)
public class LocalStorageConfiguration {
    
    private final StorageProperties storageProperties;

    // standard constructor
    
    @Bean
    public BlobStore blobStore() {
        Properties properties = new Properties();
        String fileSystemDir = storageProperties.getLocalFileBaseDirectory();
        properties.setProperty("jclouds.filesystem.basedir", fileSystemDir);
        return ContextBuilder
          .newBuilder("filesystem")
          .overrides(properties)
          .build(BlobStoreContext.class)
          .getBlobStore();
    }

    @Bean
    public S3Proxy s3Proxy(BlobStore blobStore) {
        return S3Proxy
          .builder()
          .awsAuthentication(AuthenticationType.NONE, null, null)
          .blobStore(blobStore)
          .endpoint(URI.create(storageProperties.getProxyEndpoint()))
          .build();
    }

}

Here, we create a BlobStore bean using the filesystem provider and configure our base directory.

Then, we create an S3Proxy bean for our file system BlobStore. Notice that we set the authentication type to NONE since we don’t need any authentication for local file system storage.

Finally, let’s create a simplified S3Client bean that doesn’t require any credentials:

@Bean
public S3Client s3Client() {
    return S3Client
      .builder()
      .region(Region.US_EAST_1)
      .endpointOverride(URI.create(storageProperties.getProxyEndpoint()))
      .build();
}

Here, we hardcode the US_EAST_1 region; however, the region doesn’t really matter for this configuration, since the endpoint is overridden to point to our local proxy.

With this setup, our application is now configured to use the local file system as its storage backend. This eliminates the need to connect to a real cloud storage service, which reduces cost and speeds up our development and testing cycles.
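
To run the application against the file system, we can activate the local profile, for example by setting it in our application.yaml (one of several ways to activate a Spring profile):

spring:
  profiles:
    active: local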

6.2. Testing Interactions With S3Client

Now, let’s write a test to verify that we can, in fact, use the S3Client to interact with our local file system storage.

We’ll start by defining the necessary properties in our application-local.yaml file:

com:
  baeldung:
    storage:
      proxy-endpoint: http://127.0.0.1:8080
      bucket-name: baeldungbucket
      local-file-base-directory: tmp-store

Next, let’s set up our test class:

@SpringBootTest
@TestInstance(Lifecycle.PER_CLASS)
@ActiveProfiles({ "local", "test" })
@EnableConfigurationProperties(StorageProperties.class)
class LocalFileSystemStorageIntegrationTest {

    @Autowired
    private S3Client s3Client;

    @Autowired
    private StorageProperties storageProperties;

    @BeforeAll
    void setup() {
        File directory = new File(storageProperties.getLocalFileBaseDirectory());
        directory.mkdir();

        String bucketName = storageProperties.getBucketName();
        try {
            s3Client.createBucket(request -> request.bucket(bucketName));
        } catch (BucketAlreadyOwnedByYouException exception) {
            // do nothing
        }
    }
    
    @AfterAll
    void teardown() throws IOException {
        File directory = new File(storageProperties.getLocalFileBaseDirectory());
        FileUtils.forceDelete(directory);
    }

}

In our setup() method annotated with @BeforeAll, we create the base directory and the bucket if they don’t exist. And, in our teardown() method, we delete the base directory to clean up after our tests.

Finally, let’s write a test to verify that we can upload a file using the S3Client class:

@Test
void whenFileUploaded_thenFileSavedInFileSystem() throws IOException {
    // Prepare test file to upload
    String key = RandomString.make(10) + ".txt";
    String fileContent = RandomString.make(50);
    MultipartFile fileToUpload = createTextFile(key, fileContent);
    
    // Save file to file system
    s3Client.putObject(request -> 
        request
          .bucket(storageProperties.getBucketName())
          .key(key)
          .contentType(fileToUpload.getContentType()),
        RequestBody.fromBytes(fileToUpload.getBytes()));
    
    // Verify that the file is saved successfully by checking if it exists in the file system
    List<S3Object> savedObjects = s3Client.listObjects(request -> 
        request.bucket(storageProperties.getBucketName())
    ).contents();
    assertThat(savedObjects)
      .anyMatch(savedObject -> savedObject.key().equals(key));
}

private MultipartFile createTextFile(String fileName, String content) throws IOException {
    byte[] fileContentBytes = content.getBytes();
    InputStream inputStream = new ByteArrayInputStream(fileContentBytes);
    return new MockMultipartFile(fileName, fileName, "text/plain", inputStream);
}

In our test method, we first prepare a MultipartFile with a random name and content. We then use the S3Client to upload this file to our test bucket.

Finally, we verify that the file was saved successfully by listing all objects in the bucket and asserting that the file with the random key is present.

7. Conclusion

In this article, we’ve explored integrating S3Proxy in our Spring Boot application.

We walked through the necessary configurations and set up cloud-agnostic storage properties to use across different storage backends.

Then, we looked at how we can access Azure Blob Storage and GCP Cloud Storage using the Amazon S3 API.

Finally, we set up an environment using a file system for local development and testing.

As always, all the code examples used in this article are available over on GitHub.