Infinispan Embedding Store for Retrieval Augmented Generation (RAG)

When implementing Retrieval Augmented Generation (RAG), you need a document store that can hold embeddings and search them by similarity. This guide explains how to use Infinispan Server as the embedding store.

Leveraging the Infinispan Embeddings Store

To use Infinispan as the embedding store, add the quarkus-langchain4j-infinispan extension (group ID io.quarkiverse.langchain4j) as a dependency of your project.


This extension relies on the Quarkus Infinispan client. Ensure the default Infinispan client is configured appropriately. For detailed guidance, refer to the Quarkus Infinispan Client Quickstart and the Quarkus Infinispan Client Reference.

The Infinispan embedding store requires the vector dimension to be set. Add the quarkus.langchain4j.infinispan.dimension property to your application.properties file and set it to the dimension of the vectors produced by your embedding model. For example, AllMiniLmL6V2QuantizedEmbeddingModel produces vectors of dimension 384. OpenAI’s text-embedding-ada-002 produces vectors of dimension 1536.
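A mismatch between the configured dimension and the model's output is easy to miss, so it can help to check it early. The sketch below is a plain-Java illustration and not part of the extension: the class DimensionCheck and the method requireDimension are hypothetical names, and the float[] stands in for whatever vector your embedding model returns.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not an API of quarkus-langchain4j-infinispan.
final class DimensionCheck {

    private DimensionCheck() {
    }

    /**
     * Returns the vector unchanged if its length equals the configured dimension,
     * otherwise throws an IllegalArgumentException describing the mismatch.
     */
    static float[] requireDimension(float[] vector, int configuredDimension) {
        Objects.requireNonNull(vector, "vector");
        if (vector.length != configuredDimension) {
            throw new IllegalArgumentException(
                    "Embedding has dimension " + vector.length
                            + " but quarkus.langchain4j.infinispan.dimension is " + configuredDimension);
        }
        return vector;
    }

    public static void main(String[] args) {
        // 384 is the dimension quoted above for AllMiniLmL6V2QuantizedEmbeddingModel.
        float[] vector = new float[384];
        requireDimension(vector, 384); // matches, returns normally

        try {
            // 1536 is the dimension quoted above for text-embedding-ada-002.
            requireDimension(vector, 1536);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running such a check once at startup, against a single test embedding, is usually enough to catch a wrong property value before any documents are ingested.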

Once the extension is installed, you can use the Infinispan embedding store with the following code:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithInfinispan {

    /**
     * The embedding store (Infinispan).
     * The bean is provided by the quarkus-langchain4j-infinispan extension.
     */
    @Inject
    InfinispanEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM (like openai) extension.
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        // Split each document into segments of at most 500 characters, with no overlap.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
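To make the recursive(500, 0) arguments concrete: they are the maximum segment size and the maximum overlap between consecutive segments, both in characters. The sketch below is a deliberately simplified fixed-window splitter in plain Java. It is not the LangChain4j algorithm, which tries to respect natural text boundaries before cutting, and the names SimpleSplitter and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of "max segment size" and "max overlap"; not the LangChain4j splitter.
final class SimpleSplitter {

    private SimpleSplitter() {
    }

    /**
     * Cuts the text into windows of at most maxSegmentSize characters.
     * Each window starts maxSegmentSize - maxOverlap characters after the previous one,
     * so consecutive windows share up to maxOverlap characters.
     */
    static List<String> split(String text, int maxSegmentSize, int maxOverlap) {
        if (maxSegmentSize <= 0 || maxOverlap < 0 || maxOverlap >= maxSegmentSize) {
            throw new IllegalArgumentException("need 0 <= maxOverlap < maxSegmentSize");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSegmentSize - maxOverlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSegmentSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String text = "x".repeat(1200);
        for (String segment : split(text, 500, 0)) {
            System.out.println(segment.length());
        }
    }
}
```

Each resulting segment is embedded and stored separately, which is why the ingestion time grows with the number of segments rather than the number of documents.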

Configuration Settings

By default, the extension uses the default Infinispan client to store and index the documents. The following configuration options customize its behavior:

Some configuration properties are fixed at build time; all other configuration properties are overridable at runtime.

Client name: The name of the Infinispan client to use. Clients are configured through the infinispan-client extension. If unspecified, the default Infinispan client is used.

Dimension (quarkus.langchain4j.infinispan.dimension): The dimension of the embedding vectors. It must match the dimension of the vectors produced by the embedding model that you use. For example, AllMiniLmL6V2QuantizedEmbeddingModel produces vectors of dimension 384. OpenAI’s text-embedding-ada-002 produces vectors of dimension 1536.

Cache name: The name of the Infinispan cache used when searching for related embeddings. If this cache doesn’t exist, it is created.

Distance: The maximum distance. The distance between vectors measures how close or far apart two embeddings are.




Under the Hood

The extension creates and registers, on both the client and the server, the Protobuf schema needed to serialize and store the indexable embeddings in Infinispan. For example, for dimension 384, the schema registers the following entity:

/**
 * @Indexed
 */
message LangchainItem384 {

   /**
    * @Keyword
    */
   optional string id = 1;

   /**
    * @Vector(dimension=384, similarity=COSINE)
    */
   repeated float floatVector = 2;

   optional string text = 3;

   repeated string metadataKeys = 4;

   repeated string metadataValues = 5;
}
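Two parts of this schema are worth illustrating: the COSINE similarity declared on the vector field, and the metadata stored as two repeated string fields. The plain-Java sketch below shows the standard cosine similarity formula and one way to rebuild a map from two parallel lists. It assumes that metadataKeys[i] pairs with metadataValues[i], which the schema suggests but does not state; the class SchemaSketch and its methods are hypothetical and do not reflect how Infinispan or the extension implement either step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; not code from Infinispan or quarkus-langchain4j-infinispan.
final class SchemaSketch {

    private SchemaSketch() {
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Vectors pointing the same way score 1.0. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Assumption: keys.get(i) belongs to values.get(i), as in two parallel repeated fields. */
    static Map<String, String> toMetadata(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            metadata.put(keys.get(i), values.get(i));
        }
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(new float[] { 1f, 2f, 3f }, new float[] { 2f, 4f, 6f }));
        System.out.println(toMetadata(List.of("source", "page"), List.of("guide.pdf", "12")));
    }
}
```

Cosine similarity ignores vector length and compares direction only, which is why both vectors must have the same dimension as the one declared in the schema.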

Infinispan Cache

The Infinispan cache must be an indexed cache. If the cache does not exist, it is created with the configuration below. Note that the indexing configuration references the schema entity that matches the configured dimension (LangchainItem384 for dimension 384).

{
  "embeddings-cache": {
    "distributed-cache": {
      "mode": "SYNC",
      "remote-timeout": "17500",
      "statistics": true,
      "locking": {
        "concurrency-level": "1000",
        "acquire-timeout": "15000",
        "striping": false
      },
      "indexing": {
        "enabled": true,
        "storage": "local-heap",
        "indexed-entities": [
          "LangchainItem384"
        ]
      },
      "state-transfer": {
        "timeout": "60000"
      }
    }
  }
}