1. Overview

Artificial Intelligence is changing the way we build web applications. One exciting application of AI is generating an image from a text-based description. OpenAI’s DALL·E 3 is a popular text-to-image model that helps us achieve this.

In this tutorial, we’ll explore how to generate images with OpenAI’s DALL·E 3 model using Spring AI.

To follow this tutorial, we’ll need an OpenAI API key.

2. Setting up the Project

Before we can start generating AI images, we’ll need to include a Spring Boot starter dependency and configure our application correctly.

2.1. Dependencies

Let’s start by adding the spring-ai-openai-spring-boot-starter dependency to our project’s pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M3</version>
</dependency>

Since the current version is a milestone release, we'll also need to add the Spring Milestones repository to our pom.xml:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

This repository is where milestone versions are published, as opposed to the standard Maven Central repository.

The above starter dependency provides us with the necessary classes to interact with the OpenAI service from our application and generate AI images using its DALL·E 3 model.

2.2. Configuring an OpenAI API Key

Now, to interact with the OpenAI service, we need to configure our API key in our application.yaml file:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}

We use the ${} property placeholder to load the value of our property from an environment variable.

Once we configure a valid API key, Spring AI automatically creates an ImageModel bean for us. We'll autowire it into our service layer and use it to send image generation requests.

2.3. Configuring Default Image Options

Next, let’s also configure a few default image options that’ll be used to generate our images:

spring:
  ai:
    openai:
      image:
        options:
          model: dall-e-3
          size: 1024x1024
          style: vivid
          quality: standard
          response-format: url

We first configure dall-e-3 as the model to use for image generation.

To get a perfectly square image, we specify 1024×1024 as the size. The other two supported size options are 1792×1024 and 1024×1792.

Next, we set the style to vivid, which tells the AI model to generate hyper-realistic and dramatic images. The other available option is natural, which we can use to generate more natural and less hyper-realistic images.

For the quality option, we set it to standard, which works for most use cases. However, if we need images with enhanced detail and better consistency, we can set the value to hd. Note that hd images take longer to generate.

Finally, we set the response-format option to url. The generated image will be accessible via a URL that remains valid for 60 minutes. Alternatively, we can set its value to b64_json to receive the image as a Base64-encoded string.
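
If we opt for b64_json, decoding the payload is up to us. Here's a minimal sketch of a helper that writes such a response to an image file using only the JDK's java.util.Base64 and java.nio.file APIs; the saveBase64Image() method and its parameters are our own, not part of Spring AI:

// Decodes the Base64 payload returned by the API and writes the raw image bytes to disk
private void saveBase64Image(String b64Json, Path target) throws IOException {
    byte[] imageBytes = Base64.getDecoder().decode(b64Json);
    Files.write(target, imageBytes);
}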

We’ll look at how to override these default image options later in the tutorial.

3. Generating AI Images With DALL·E 3

Now that we’ve set up our application, let’s create an ImageGenerator class. We’ll autowire the ImageModel bean and reference it to generate AI images:

@Service
public class ImageGenerator {

    @Autowired
    private ImageModel imageModel;

    public String generate(String prompt) {
        ImagePrompt imagePrompt = new ImagePrompt(prompt);
        ImageResponse imageResponse = imageModel.call(imagePrompt);
        return resolveImageContent(imageResponse);
    }

    private String resolveImageContent(ImageResponse imageResponse) {
        Image image = imageResponse.getResult().getOutput();
        return Optional
          .ofNullable(image.getUrl())
          .orElseGet(image::getB64Json);
    }
}

Here, we create a generate() method that takes a prompt String representing the text description of the image we want to generate.

Next, we create an ImagePrompt object with our prompt parameter. Then, we pass it to the call() method of our ImageModel bean to send the image generation request.

The imageResponse will contain either a URL or a Base64-encoded String representation of the image, depending on the response-format option we configured earlier in our application.yaml file. To extract whichever of the two is present, we create a resolveImageContent() helper method and return its result.

4. Overriding Default Image Options

In some cases, we may want to override the default image options we configured in our application.yaml file.

Let’s take a look at how we can do this by overloading our generate() method:

public String generate(ImageGenerationRequest request) {
    ImageOptions imageOptions = OpenAiImageOptions
      .builder()
      .withUser(request.username())
      .withHeight(request.height())
      .withWidth(request.width())
      .build();
    ImagePrompt imagePrompt = new ImagePrompt(request.prompt(), imageOptions);
    
    ImageResponse imageResponse = imageModel.call(imagePrompt);
    return resolveImageContent(imageResponse);
}

record ImageGenerationRequest(
    String prompt,
    String username,
    Integer height,
    Integer width
) {}

We first create an ImageGenerationRequest record, which, in addition to holding our prompt, contains the username and the desired image height and width.

We use these additional values to create an ImageOptions instance and pass it to the ImagePrompt constructor. It’s important to note that the OpenAiImageOptions class doesn’t have a size property, hence we provide the height and width values separately.

The user option helps us link the image generation request with a specific end user. This is recommended as a security best practice to prevent abuse.

Depending on our requirements, we can also override other image options, such as style, quality, and response-format, in our ImageOptions object.
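
For example, here's a rough sketch of what that could look like; we're assuming the OpenAiImageOptions builder exposes withStyle(), withQuality(), and withResponseFormat() methods alongside the ones we used above:

// Per-request overrides for style, quality, and response format (builder methods assumed)
ImageOptions imageOptions = OpenAiImageOptions
  .builder()
  .withStyle("natural")
  .withQuality("hd")
  .withResponseFormat("b64_json")
  .build();
ImagePrompt imagePrompt = new ImagePrompt("A watercolor painting of a lighthouse at dawn.", imageOptions);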

5. Putting Our ImageGenerator Class to the Test

Now that we’ve implemented our ImageGenerator class, let’s test it out:

String prompt = """
    A cartoon depicting a gangster donkey wearing 
    sunglasses and eating grapes in a city street.
""";
String response = imageGenerator.generate(prompt);

Here, we pass our prompt to our ImageGenerator’s generate() method. After a short processing time, we’ll receive a response containing the URL or Base64-encoded string of our generated image, depending on the configured response-format property.

Let’s take a look at what DALL·E 3 generated for us:

[Generated image: a cartoon of a gangster donkey wearing sunglasses and eating grapes on a city street]

As we can see, the generated image accurately matches our prompt. This demonstrates the power of DALL·E 3 in understanding natural language descriptions and turning them into images.
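
Finally, if we'd rather exercise the ImageGenerator class from an automated test instead of calling it manually, a Spring Boot integration test along these lines is one option. The test class and assertion below are our own sketch, and since it makes a real, billable call to OpenAI, we'd normally exclude it from the regular build:

@SpringBootTest
class ImageGeneratorLiveTest {

    @Autowired
    private ImageGenerator imageGenerator;

    @Test
    void whenPromptIsGiven_thenImageReferenceIsReturned() {
        // Expect either a URL or a Base64 string, depending on the configured response-format
        String response = imageGenerator.generate("A pencil sketch of a lighthouse at dawn.");

        assertThat(response).isNotBlank();
    }
}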

6. Conclusion

In this article, we explored how to generate AI images from textual descriptions using Spring AI. We used OpenAI's DALL·E 3 model under the hood.

We walked through the necessary configurations and developed a service class to generate AI images. Additionally, we looked at the default image options and how to override them dynamically.

By integrating DALL·E 3 into our Java applications via Spring AI, we can easily add image generation capabilities without the overhead of training and hosting our own models.

As always, all the code examples used in this article are available over on GitHub.