Infinispan Embedding Store

Quarkus LangChain4j integrates with Infinispan Server to provide a scalable, distributed vector store for Retrieval-Augmented Generation (RAG). This extension enables you to persist and query embedding vectors for document retrieval.

This extension uses Infinispan 16.0 capabilities, including embedded metadata objects with typed fields for metadata filtering.

Prerequisites

To use Infinispan as a vector-capable embedding store:

  • An Infinispan Server 16.0+ must be running and accessible

  • The Quarkus Infinispan client must be configured

  • Vector embeddings must have a fixed dimension that matches your embedding model

This extension requires Infinispan Server with Protobuf indexing enabled. It automatically registers the required schema on startup.

Dependency

To enable Infinispan support in your Quarkus project, add the following dependency:

<dependency>
  <groupId>io.quarkiverse.langchain4j</groupId>
  <artifactId>quarkus-langchain4j-infinispan</artifactId>
  <version>1.8.4</version>
</dependency>

Even better, if you use the Quarkus platform BOM (the default for generated projects), add the Quarkus LangChain4j BOM and all dependency versions will align:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>${quarkus.platform.artifact-id}</artifactId>
                <version>${quarkus.platform.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>quarkus-langchain4j-bom</artifactId> (1)
                <version>${quarkus.platform.version}</version> (2)
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-infinispan</artifactId>
        (3)
      </dependency>
    </dependencies>
1 In your dependencyManagement section, add the quarkus-langchain4j-bom
2 Inherit the version from your platform version
3 Voilà, no need for version alignment anymore

This extension builds upon the Quarkus Infinispan client. Ensure that the default Infinispan client is correctly configured. For more details, see the Quarkus Infinispan client documentation.
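For example, assuming a local server on the default port with the placeholder credentials below (adjust these to your own setup), the default Infinispan client can be configured in application.properties:

```properties
# Connection settings for the default Infinispan client
quarkus.infinispan-client.hosts=localhost:11222
quarkus.infinispan-client.username=admin
quarkus.infinispan-client.password=password
```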

Embedding Dimension

You must configure the dimension of the embedding vectors to match your embedding model:

quarkus.langchain4j.infinispan.dimension=384

Common model dimensions:

  • AllMiniLmL6V2QuantizedEmbeddingModel → 384

  • OpenAI text-embedding-ada-002 → 1536

If the embedding dimension is missing or mismatched, ingestion and retrieval will fail or produce inaccurate results.

If you switch to a different embedding model, ensure the dimension value is updated accordingly.
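For instance, switching from the local quantized model to OpenAI's text-embedding-ada-002 requires raising the dimension to match:

```properties
# text-embedding-ada-002 produces 1536-dimensional vectors
quarkus.langchain4j.infinispan.dimension=1536
```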

Usage Example

Once installed and configured, you can use the Infinispan embedding store as follows:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithInfinispan {

    /**
     * The embedding store (Infinispan).
     * The bean is provided by the quarkus-langchain4j-infinispan extension.
     */
    @Inject
    InfinispanEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (e.g., OpenAI).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}

This demonstrates how to store and retrieve embedded documents using Infinispan as the backend.

Metadata Filtering

The Infinispan embedding store supports metadata filtering when searching for embeddings. Metadata is stored as embedded Protobuf objects with typed fields (String, Long, Double), enabling efficient server-side filtering using Infinispan’s Ickle query language.

Supported filter types:

  • IsEqualTo, IsNotEqualTo — equality comparisons

  • IsGreaterThan, IsGreaterThanOrEqualTo — greater-than comparisons

  • IsLessThan, IsLessThanOrEqualTo — less-than comparisons

  • IsIn, IsNotIn — membership checks

  • And, Or, Not — logical operators

Example usage:

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(embedding)
    .filter(metadataKey("category").isEqualTo("science"))
    .maxResults(10)
    .build();

EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);

You can also remove embeddings by filter:

embeddingStore.removeAll(metadataKey("category").isEqualTo("outdated"));

Configuration

By default, the extension uses the default Infinispan client and cache. You can customize its behavior via the following configuration options:

quarkus.langchain4j.infinispan.dimension=384 (1)
quarkus.langchain4j.infinispan.cache-name=my-cache (2)
quarkus.langchain4j.infinispan.distance=5 (3)
quarkus.langchain4j.infinispan.similarity=COSINE (4)
quarkus.langchain4j.infinispan.create-cache=true (5)
quarkus.langchain4j.infinispan.cache-config=<your-config> (6)
1 Required: vector dimension matching your embedding model
2 Cache name (default: embeddings-cache)
3 Maximum distance for knn query (default: 3)
4 Vector similarity metric (default: COSINE). Supported values: COSINE, L2, INNER_PRODUCT, MAX_INNER_PRODUCT
5 Whether to create the cache on startup if it doesn’t exist (default: true). Set to false if using a pre-configured cache on the server
6 Provide a full XML/JSON cache configuration. When set, overrides the default cache configuration
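To illustrate what the default COSINE similarity metric measures, the standalone sketch below computes cosine similarity between two vectors in plain Java. This is only an illustration of the math, not the server-side implementation:

```java
public class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    // A value of 1 means the vectors point in the same direction.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 0f, 1f};
        float[] v3 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v2)); // identical vectors -> 1.0
        System.out.println(cosine(v1, v3)); // orthogonal vectors -> 0.0
    }
}
```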

Configuration property fixed at build time - All other configuration properties are overridable at runtime

quarkus.langchain4j.infinispan.client-name
The name of the Infinispan client to use. These clients are configured by means of the infinispan-client extension. If unspecified, the default Infinispan client is used.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CLIENT_NAME
Type: string

quarkus.langchain4j.infinispan.dimension
The dimension of the embedding vectors. This has to be the same as the dimension of vectors produced by the embedding model that you use. For example, AllMiniLmL6V2QuantizedEmbeddingModel produces vectors of dimension 384. OpenAI’s text-embedding-ada-002 produces vectors of dimension 1536.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_DIMENSION
Type: long
Default: required

quarkus.langchain4j.infinispan.cache-name
Name of the cache that will be used in Infinispan when searching for related embeddings. If this cache doesn’t exist, it will be created.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CACHE_NAME
Type: string
Default: embeddings-cache

quarkus.langchain4j.infinispan.distance
The maximum distance for the kNN query; it controls how far apart two embeddings may be while still being considered related.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_DISTANCE
Type: int
Default: 3

quarkus.langchain4j.infinispan.similarity
The similarity metric to use for vector search. Supported values: COSINE, L2, INNER_PRODUCT, MAX_INNER_PRODUCT.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_SIMILARITY
Type: string
Default: COSINE

quarkus.langchain4j.infinispan.create-cache
Whether to create the cache on startup if it does not exist. Set to false if the cache is pre-configured on the Infinispan server.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CREATE_CACHE
Type: boolean
Default: true

quarkus.langchain4j.infinispan.cache-config
Optional full XML or JSON cache configuration. When provided, this overrides the default cache configuration generated by the extension.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CACHE_CONFIG
Type: string

How It Works

The Infinispan extension registers a Protobuf schema to define indexable entities with vector and metadata fields. For example, for a dimension of 384, the following schema is generated and registered:

/**
 * @Indexed
 */
message LangchainMetadata384 {

   /**
    * @Basic(projectable=true)
    */
   optional string name = 1;

   /**
    * @Basic(projectable=true)
    */
   optional string value = 2;

   /**
    * @Basic(projectable=true)
    */
   optional int64 value_int = 3;

   /**
    * @Basic(projectable=true)
    */
   optional double value_float = 4;
}

/**
 * @Indexed
 */
message LangchainItem384 {

   /**
    * @Keyword
    */
   optional string id = 1;

   /**
    * @Vector(dimension=384, similarity=COSINE) (1)
    */
   repeated float floatVector = 2;

   optional string text = 3;

   /**
    * @Embedded
    */
   repeated LangchainMetadata384 metadata = 4;
}
1 The similarity metric is configurable via quarkus.langchain4j.infinispan.similarity.

Each metadata entry is stored as an embedded LangchainMetadata object with three typed value fields:

  • value — for String values

  • value_int — for Integer and Long values

  • value_float — for Float and Double values

This typed approach ensures that numeric comparisons (greater-than, less-than, etc.) work correctly during filtering.
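As a rough illustration of this mapping (a hypothetical sketch, not the extension's actual code), a metadata value could be routed into the typed field matching its Java type like this:

```java
import java.util.Optional;

public class MetadataFieldMapper {

    // Simplified stand-in for the generated LangchainMetadata message:
    // exactly one of value / valueInt / valueFloat is populated.
    record TypedMetadata(String name, Optional<String> value,
                         Optional<Long> valueInt, Optional<Double> valueFloat) {}

    // Route a metadata value into the typed field matching its Java type,
    // so numeric comparisons operate on numbers rather than strings.
    static TypedMetadata map(String name, Object raw) {
        if (raw instanceof Integer i) {
            return new TypedMetadata(name, Optional.empty(), Optional.of(i.longValue()), Optional.empty());
        }
        if (raw instanceof Long l) {
            return new TypedMetadata(name, Optional.empty(), Optional.of(l), Optional.empty());
        }
        if (raw instanceof Float f) {
            return new TypedMetadata(name, Optional.empty(), Optional.empty(), Optional.of(f.doubleValue()));
        }
        if (raw instanceof Double d) {
            return new TypedMetadata(name, Optional.empty(), Optional.empty(), Optional.of(d));
        }
        return new TypedMetadata(name, Optional.of(String.valueOf(raw)), Optional.empty(), Optional.empty());
    }

    public static void main(String[] args) {
        System.out.println(map("year", 2024));      // routed to value_int
        System.out.println(map("score", 0.92));     // routed to value_float
        System.out.println(map("category", "sci")); // routed to value
    }
}
```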

Infinispan Cache Configuration

The extension will create an indexed cache if one is not already defined. Below is the default configuration that may be used or customized:

{
  "embeddings-cache": {
    "distributed-cache": {
      "mode": "SYNC",
      "remote-timeout": "17500",
      "statistics": true,
      "locking": {
        "concurrency-level": "1000",
        "acquire-timeout": "15000",
        "striping": false
      },
      "indexing": {
        "enabled": true,
        "storage": "local-heap",
        "indexed-entities": [
          "LangchainItem384"
        ]
      },
      "state-transfer": {
        "timeout": "60000"
      }
    }
  }
}
The name of the indexed entity (LangchainItem384) changes depending on the configured embedding dimension. The metadata type is embedded within the item and does not need to be listed separately.

Summary

To use Infinispan as a distributed vector store for RAG with Quarkus LangChain4j:

  • Ensure Infinispan Server 16.0+ is running with indexing enabled

  • Add the required extension dependency

  • Set the embedding vector dimension

  • Configure or allow the extension to create an indexed cache

  • Use InfinispanEmbeddingStore to ingest and retrieve documents for similarity search

  • Use metadata filters for fine-grained search and removal