Response Augmenter

A Response Augmenter allows you to post-process and extend the output generated by the LLM. A typical use case is to add the sources that the LLM used to compute the response.

The augmenter runs after the LLM has generated its response and, when the AI method returns a structured object, after the output has been mapped to that object.

This page covers response augmentation: modifying or enriching the LLM’s output after generation.

It does not cover retrieval-augmented generation (RAG) or, more broadly, prompt augmentation, which enriches the LLM’s input before generation.

Implementing a Response Augmenter

A response augmenter is a CDI bean that implements the io.quarkiverse.langchain4j.response.AiResponseAugmenter interface. The interface declares two augment methods: one for imperative (non-streaming) responses and one for streaming responses. Both have default implementations that return the response unchanged, so override the one that matches your AI method's return type, or both:

package io.quarkiverse.langchain4j.response;

import io.smallrye.mutiny.Multi;

/**
 * A CDI bean willing to manipulate the response of the AI model needs to implement this interface.
 * An AI method that wants to use an augmenter should be annotated with {@link ResponseAugmenter}, indicating the
 * augmenter implementation class name.
 * <p>
 * The default implementation keeps the response unchanged.
 *
 * @param <T> the type of the response
 */
public interface AiResponseAugmenter<T> {

    /**
     * Augment the response.
     *
     * @param response the response to augment
     * @param params   the parameters to use for the augmentation
     * @return the augmented response
     */
    default T augment(T response, ResponseAugmenterParams params) {
        return response;
    }

    /**
     * Augment a streamed response.
     *
     * @param stream the stream to augment
     * @param params the parameters to use for the augmentation
     * @return the augmented stream
     */
    default Multi<T> augment(Multi<T> stream, ResponseAugmenterParams params) {
        return stream;
    }
}

The ResponseAugmenterParams object contains the following information:

  • The user message

  • The chat memory

  • The augmentation result (RAG text segments)

  • The user message template

  • The variables used to compute the user message from the template
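As a rough illustration of how these fields can feed an augmentation, the sketch below appends a footer built from the template variables and the number of retrieved segments. To keep it runnable without Quarkus on the classpath, Params is a hypothetical stand-in that mirrors the list above; it is not the real ResponseAugmenterParams, and its field names and types are assumptions.

```java
import java.util.List;
import java.util.Map;

public class ParamsSketch {

    // Hypothetical stand-in mirroring the fields listed above.
    // NOT the real ResponseAugmenterParams: names and types are assumptions.
    public record Params(String userMessage,
                         List<String> chatMemory,
                         List<String> ragSegments,
                         String userMessageTemplate,
                         Map<String, Object> variables) {
    }

    // Appends a footer built from one template variable and the number of RAG segments.
    public static String augment(String response, Params params) {
        Object language = params.variables().getOrDefault("language", "unknown");
        return response + " [language=" + language
                + ", segments=" + params.ragSegments().size() + "]";
    }

    public static void main(String[] args) {
        Params params = new Params(
                "What is the cancellation policy?",
                List.of(),
                List.of("segment one", "segment two"),
                "Answer in {language}: {question}",
                Map.of("language", "en"));
        System.out.println(augment("Hello", params));
    }
}
```

In a real augmenter the same logic would live in the augment method of an AiResponseAugmenter bean and read from the ResponseAugmenterParams instance it receives.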

The implementation is free to transform the response, append metadata, or enrich it based on external logic. For streamed responses, the augmentation logic runs on the event loop, so avoid blocking operations.

Using a Response Augmenter

Once you have implemented a response augmenter, you can use it in your AI service by annotating the method with @ResponseAugmenter:

@SessionScoped
@RegisterAiService
public interface CustomerSupportAgent {

    @SystemMessage("""
            ...
            """)
    @InputGuardrails(PromptInjectionGuard.class)
    @ToolBox(BookingRepository.class)
    @ResponseAugmenter(SourceAugmenter.class) // <--- here
    String chat(String userMessage);
}

In this example, the response of the chat method is passed to the SourceAugmenter bean, and the caller receives the augmented result.

Example

Here is an example of a response augmenter that adds the sources used to compute the response:

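import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.CosineSimilarity;
import io.quarkiverse.langchain4j.response.AiResponseAugmenter;
import io.quarkiverse.langchain4j.response.ResponseAugmenterParams;

@ApplicationScoped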
public class SourceAugmenter implements AiResponseAugmenter<String> {

    @Inject
    EmbeddingModel embeddingModel;

    record SourceEmbedding(TextSegment textSegment, String file, Embedding embedding) {}

    @Override
    public String augment(String response, ResponseAugmenterParams params) {
        // Only add sources that are similar to the computed response.
        // First, embed the response so it can be compared with each retrieved segment.
        var embeddingOfTheResponse = embeddingModel.embed(response).content();

        // Embed each retrieved segment and record the file it came from
        List<SourceEmbedding> sources = params.augmentationResult()
            .contents().stream().map(c -> {
                var embedding = embeddingModel.embed(c.textSegment().text()).content();
                // Extract the "source" of the content from the metadata:
                return new SourceEmbedding(c.textSegment(),
                    c.textSegment().metadata().getString("file"), embedding);
            }).toList();

        // Ignore segments not similar enough
        Set<SourceEmbedding> filtered = filter(embeddingOfTheResponse, sources);

        // Remove duplicates
        Set<String> names = new LinkedHashSet<>();
        for (var source : filtered) {
            names.add(source.file());
        }

        // Append the sources to the response
        return response + " (Sources: "
                + String.join(", ", names) + ")";
    }

    private Set<SourceEmbedding> filter(Embedding embeddingOfTheResponse, List<SourceEmbedding> contents) {
        Set<SourceEmbedding> filtered = new LinkedHashSet<>();
        for (SourceEmbedding content : contents) {
            double similarity = CosineSimilarity.between(embeddingOfTheResponse, content.embedding());
            if (similarity > 0.85) {
                filtered.add(content);
            }
        }

        return filtered;
    }
}
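The filter step relies on CosineSimilarity.between from LangChain4j, which is assumed here to compute the standard cosine similarity between two embedding vectors. The following dependency-free sketch re-implements that computation on plain float arrays to show which candidates pass a 0.85 threshold; the class and method names are illustrative only and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class SimilarityFilterSketch {

    // Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumption: treat zero vectors as dissimilar
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of the candidates whose similarity to the reference
    // is strictly greater than the threshold, in their original order.
    public static List<Integer> filter(float[] reference, List<float[]> candidates, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            if (cosine(reference, candidates.get(i)) > threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] response = {1f, 0f};
        List<float[]> segments = List.of(
                new float[] {1f, 0f},     // same direction as the response
                new float[] {0f, 1f},     // orthogonal to the response
                new float[] {0.9f, 0.1f}, // close to the response
                new float[] {1f, 1f});    // 45 degrees away from the response
        System.out.println(filter(response, segments, 0.85));
    }
}
```

The threshold is a tuning knob: a higher value cites fewer, more closely related sources, while a lower value cites more of the retrieved segments.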

Summary

  • A ResponseAugmenter enables post-processing of LLM outputs.

  • It can be applied to both regular and streaming responses.

  • Use it to append metadata (e.g., source files), compute confidence scores, or inject context-specific data.

  • It runs after structured output mapping and has access to the RAG augmentation result.

Going Further