1. Introduction

Typically, when using large language models (LLMs), we don’t expect a structured response. Moreover, we’ve grown used to their unpredictable behavior, which often leads to outputs that don’t meet our expectations. However, there are ways to increase the likelihood of getting structured responses (though not a 100% guarantee) and even to parse those responses into usable code structures.

In this tutorial, we’ll explore the tools Spring AI provides to make this process simpler and more reliable.

2. Brief Introduction To the Chat Model

The basic abstraction that allows us to send prompts to AI models is the ChatModel interface:

public interface ChatModel extends Model<Prompt, ChatResponse> {
    default String call(String message) {
        // implementation is skipped
    }

    @Override
    ChatResponse call(Prompt prompt);
}

The call() method functions as a mechanism for sending a message to the model and receiving a response, nothing more. It’s natural to expect the prompt and the response to be of a String type. However, modern model implementations often feature more complex structures that enable finer tuning and enhance the model’s predictability. For example, while the default call() method accepting a String parameter is available, it’s more practical to use a Prompt. A Prompt can contain multiple messages or include options, such as temperature, to regulate the model’s apparent creativity.

We can autowire ChatModel and call it directly. For example, if we have the spring-ai-openai-spring-boot-starter for the OpenAI API in our dependencies, the OpenAiChatModel implementation will be autowired.
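For illustration, here’s a minimal sketch of calling the autowired model with a Prompt that carries options. We’re assuming the OpenAI starter here, and the exact option-builder method names vary slightly between Spring AI versions:

@Autowired
private ChatModel chatModel;

public String askWithOptions(String question) {
    // assumption: OpenAiChatOptions comes with the OpenAI starter; other providers have their own options classes
    Prompt prompt = new Prompt(
      List.of(new SystemMessage("You are a concise assistant."), new UserMessage(question)),
      OpenAiChatOptions.builder().withTemperature(0.2F).build());

    return chatModel.call(prompt).getResult().getOutput().getContent();
}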

3. Structured Output API

To get an output in the form of a data structure, Spring AI provides tools to wrap the ChatModel’s call using the Structured Output API. The core interface for this API is StructuredOutputConverter:

public interface StructuredOutputConverter<T> extends Converter<String, T>, FormatProvider {}

It combines two other interfaces. The first one is FormatProvider:

public interface FormatProvider {
    String getFormat();
}

Before the ChatModel’s call(), getFormat() prepares the prompt: it populates it with the required data schema and describes exactly how the data should be formatted to avoid inconsistencies in the response. For example, to get a response in JSON format, it uses this prompt:

public String getFormat() {
    String template = "Your response should be in JSON format.\n"
      + "Do not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\n"
      + "Do not include markdown code blocks in your response.\n
      + "Remove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```%s```\n";
    return String.format(template, this.jsonSchema);
}

These instructions are usually appended after the user’s input.

The second interface is Converter:

@FunctionalInterface
public interface Converter<S, T> {
    @Nullable
    T convert(S source);
 
    // default method
}

After call() returns the response, the converter parses it into the required data structure of type T. Here is a simple diagram of how StructuredOutputConverter works:

[Diagram: Structured Output Converter]

4. Available Converters

In this section, we’ll explore the available implementations of StructuredOutputConverter with examples. We’ll demonstrate them by generating characters for a Dungeons & Dragons game.
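The exact bean lives in the article’s repository; here’s a minimal sketch of a Character class, with fields inferred from the sample response shown later in this section:

public class Character {
    private String name;
    private int age;
    private String race;
    private String characterClass;
    private String cityOfOrigin;
    private String favoriteWeapon;
    private String bio;

    // Jackson needs a no-argument constructor to instantiate the bean
    public Character() {
    }

    // standard getters and setters
}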

Please note that since Jackson’s ObjectMapper is used behind the scenes, we need empty constructors for our beans.

5. BeanOutputConverter for Beans

The BeanOutputConverter produces an instance of the specified class from the model’s response. It constructs a prompt that instructs the model to generate RFC8259-compliant JSON. Let’s see how to use it with the ChatClient API:

@Override
public Character generateCharacterChatClient(String race) {
    return ChatClient.create(chatModel).prompt()
      .user(spec -> spec.text("Generate a D&D character with race {race}")
        .param("race", race))
      .call()
      .entity(Character.class); // the underlying ChatModel.call() happens here, not on the line above
}

In this method, ChatClient.create(chatModel) instantiates a ChatClient. The prompt() method initiates the builder chain with the request (ChatClientRequest). In our case, we only add the user’s text. Once the request is created, the call() method is invoked, returning a new CallResponseSpec with ChatModel and ChatClientRequest inside. The entity() method then creates a converter based on the provided type, completes the prompt, and invokes the AI model.

We may notice that we didn’t use BeanOutputConverter directly. Since we passed a class as the parameter to the entity() method, a BeanOutputConverter is created under the hood to handle the format instructions and the conversion.
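If we want to configure the converter ourselves, we can also pass an instance to entity() explicitly; here’s a quick sketch of the same call with an explicit BeanOutputConverter:

BeanOutputConverter<Character> converter = new BeanOutputConverter<>(Character.class);

Character character = ChatClient.create(chatModel).prompt()
  .user(spec -> spec.text("Generate a D&D character with race {race}")
    .param("race", race))
  .call()
  .entity(converter);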

For more control, we can write a lower-level version of this approach. Here, we use the ChatModel we autowired beforehand and call it ourselves:

@Override
public Character generateCharacterChatModel(String race) {
    BeanOutputConverter<Character> beanOutputConverter = new BeanOutputConverter<>(Character.class);

    String format = beanOutputConverter.getFormat();

    String template = """
                Generate a D&D character with race {race}
                {format}
                """;

    PromptTemplate promptTemplate = new PromptTemplate(template, Map.of("race", race, "format", format));
    Prompt prompt = new Prompt(promptTemplate.createMessage());
    Generation generation = chatModel.call(prompt).getResult();

    return beanOutputConverter.convert(generation.getOutput().getContent());
}

In the example above, we created a BeanOutputConverter, extracted the formatting guidelines for the model, and added these guidelines to a custom prompt. We produced the final prompt using PromptTemplate, Spring AI’s core prompt-templating component, which uses the StringTemplate engine under the hood. Then, we call the model to get a Generation as a result. A Generation represents the model’s response: we extract its content and convert it into a Java object using the converter.

Here’s an example of a real response we got from OpenAI using our converter:

{
    "name": "Thoren Ironbeard",
    "age": 150,
    "race": "Dwarf",
    "characterClass": "Wizard",
    "cityOfOrigin": "Sundabar",
    "favoriteWeapon": "Magic Staff",
    "bio": "Born and raised in the city of Sundabar, he is known for his skills in crafting and magic."
}

Dwarven wizard, what a rare sight!

6. MapOutputConverter and ListOutputConverter for Collections

MapOutputConverter and ListOutputConverter allow us to create responses structured as maps and lists, respectively. Here are high-level and low-level code examples with MapOutputConverter:

@Override
public Map<String, Object> generateMapOfCharactersChatClient(int amount) {
    return ChatClient.create(chatModel).prompt()
      .user(u -> u.text("Generate {amount} D&D characters, where key is a character's name")
        .param("amount", String.valueOf(amount)))
        .call()
        .entity(new ParameterizedTypeReference<Map<String, Object>>() {});
}
    
@Override
public Map<String, Object> generateMapOfCharactersChatModel(int amount) {
    MapOutputConverter outputConverter = new MapOutputConverter();
    String format = outputConverter.getFormat();
    String template = """
            "Generate {amount} of key-value pairs, where key is a "Dungeons and Dragons" character name and value (String) is his bio.
            {format}
            """;
    Prompt prompt = new Prompt(new PromptTemplate(template, Map.of("amount", String.valueOf(amount), "format", format)).createMessage());
    Generation generation = chatModel.call(prompt).getResult();

    return outputConverter.convert(generation.getOutput().getContent());
}

The reason we used Object in Map<String, Object> is that, for now, MapOutputConverter doesn’t support generic values. But worry not: later, we’ll build a custom converter to support that. For now, let’s check the examples for ListOutputConverter, where we’re free to use generics:

@Override
public List<String> generateListOfCharacterNamesChatClient(int amount) {
    return ChatClient.create(chatModel).prompt()
      .user(u -> u.text("List {amount} D&D character names")
        .param("amount", String.valueOf(amount)))
        .call()
        .entity(new ListOutputConverter(new DefaultConversionService()));
}

@Override
public List<String> generateListOfCharacterNamesChatModel(int amount) {
    ListOutputConverter listOutputConverter = new ListOutputConverter(new DefaultConversionService());
    String format = listOutputConverter.getFormat();
    String userInputTemplate = """
            List {amount} D&D character names
            {format}
            """;
    PromptTemplate promptTemplate = new PromptTemplate(userInputTemplate,
      Map.of("amount", amount, "format", format));
    Prompt prompt = new Prompt(promptTemplate.createMessage());
    Generation generation = chatModel.call(prompt).getResult();
    return listOutputConverter.convert(generation.getOutput().getContent());
}

7. Anatomy of the Converter or How To Build Our Own

Let’s create a converter that converts data from the AI model into a Map<String, V>, where V is a generic type. Like the converters provided by Spring, our converter will implement StructuredOutputConverter, which requires us to implement the convert() and getFormat() methods.
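The full implementation is available in the repository linked at the end of the article. Here’s a minimal sketch of what such a converter could look like, assuming a plain Jackson ObjectMapper and the default victools SchemaGenerator settings:

public class GenericMapOutputConverter<V> implements StructuredOutputConverter<Map<String, V>> {

    private final ObjectMapper objectMapper = new ObjectMapper();
    private final Class<V> valueType;
    private final String jsonSchema;

    public GenericMapOutputConverter(Class<V> valueType) {
        this.valueType = valueType;
        this.jsonSchema = generateJsonSchemaFor(valueType);
    }

    @Override
    public String getFormat() {
        // instructions appended after the user's prompt: a JSON object whose values follow the schema
        return """
          Your response should be in JSON format.
          The response must be a JSON object with string keys, and every value must adhere to this JSON Schema:
          %s
          Do not include any explanations or markdown code blocks in your response.
          """.formatted(this.jsonSchema);
    }

    @Override
    public Map<String, V> convert(String text) {
        try {
            // trim the markdown fences the model may still add despite our instructions
            String json = text.replace("```json", "").replace("```", "").trim();
            MapType mapType = objectMapper.getTypeFactory()
              .constructMapType(HashMap.class, String.class, valueType);
            return objectMapper.readValue(json, mapType);
        } catch (JsonProcessingException e) {
            throw new IllegalStateException("Could not parse the model's response", e);
        }
    }

    private String generateJsonSchemaFor(Class<V> clazz) {
        // the victools generator is already on the classpath as a Spring AI dependency
        SchemaGeneratorConfigBuilder configBuilder =
          new SchemaGeneratorConfigBuilder(SchemaVersion.DRAFT_2020_12, OptionPreset.PLAIN_JSON);
        return new SchemaGenerator(configBuilder.build()).generateSchema(clazz).toPrettyString();
    }
}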

As we know, getFormat() provides instructions for the AI model; in the final request, they follow the user’s prompt. These instructions specify the map structure and provide our custom object’s schema for the values. We generate the schema using the com.github.victools.jsonschema library. Spring AI already uses this library internally for its converters, which means we don’t need to import it explicitly.

Since we request the response in JSON format, we use Jackson’s ObjectMapper for the parsing in convert(). Before parsing, we trim the markdown, just like Spring’s BeanOutputConverter implementation does: AI models often wrap code snippets in markdown, and removing it lets us avoid exceptions from the ObjectMapper.

After that, we can use our implementation like this:

@Override
public Map<String, Character> generateMapOfCharactersCustomConverter(int amount) {
    GenericMapOutputConverter<Character> outputConverter = new GenericMapOutputConverter<>(Character.class);
    String format = outputConverter.getFormat();
    String template = """
            "Generate {amount} of key-value pairs, where key is a "Dungeons and Dragons" character name and value is character object.
            {format}
            """;
    Prompt prompt = new Prompt(new PromptTemplate(template, Map.of("amount", String.valueOf(amount), "format", format)).createMessage());
    Generation generation = chatModel.call(prompt).getResult();

    return outputConverter.convert(generation.getOutput().getContent());
}

@Override
public Map<String, Character> generateMapOfCharactersCustomConverterChatClient(int amount) {
    return ChatClient.create(chatModel).prompt()
      .user(u -> u.text("Generate {amount} D&D characters, where key is a character's name")
        .param("amount", String.valueOf(amount)))
        .call()
        .entity(new GenericMapOutputConverter<>(Character.class));
}

8. Conclusion

In this article, we explored how to work with large language models (LLMs) to generate structured responses. By leveraging StructuredOutputConverter, we can efficiently convert the model’s output into usable data structures. After that, we discussed the use cases of BeanOutputConverter, MapOutputConverter, and ListOutputConverter, providing practical examples for each. Additionally, we delved into creating a custom converter to handle more complex data types. With these tools, integrating AI-driven structured outputs into Java applications becomes more accessible and manageable, enhancing the reliability and predictability of LLM responses.

As always, the examples are available over on GitHub.