Infinispan Embedding Store

Quarkus LangChain4j integrates with Infinispan Server to provide a scalable, distributed vector store for Retrieval-Augmented Generation (RAG). This extension enables you to persist and query embedding vectors for document retrieval.

This extension uses Infinispan 16.0 capabilities, including embedded metadata objects with typed fields for metadata filtering.

Prerequisites

To use Infinispan as a vector-capable embedding store:

  • An Infinispan Server 16.0+ must be running and accessible

  • The Quarkus Infinispan client must be configured

  • Vector embeddings must have a fixed dimension that matches your embedding model

This extension requires Infinispan Server with Protobuf indexing enabled. It automatically registers the required schema on startup.

Dependency

To enable Infinispan support in your Quarkus project, add the following dependency:

<dependency>
  <groupId>io.quarkiverse.langchain4j</groupId>
  <artifactId>quarkus-langchain4j-infinispan</artifactId>
  <version>1.8.4</version>
</dependency>

Even better, if you use the Quarkus platform BOM (the default for generated projects), add the Quarkus LangChain4j BOM and all dependency versions will align:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>${quarkus.platform.artifact-id}</artifactId>
                <version>${quarkus.platform.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>quarkus-langchain4j-bom</artifactId> (1)
                <version>${quarkus.platform.version}</version> (2)
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-infinispan</artifactId>
        (3)
      </dependency>
    </dependencies>
1 In your dependencyManagement section, add the quarkus-langchain4j-bom
2 Inherit the version from your platform version
3 Voilà, no need for version alignment anymore

This extension builds upon the Quarkus Infinispan client. Ensure that the default Infinispan client is correctly configured. For more details, see the Quarkus Infinispan client documentation.
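For example, assuming a local server on the default port with the placeholder credentials below (adjust these to your own setup), the default Infinispan client can be configured in application.properties:

```properties
# Connection settings for the default Infinispan client
quarkus.infinispan-client.hosts=localhost:11222
quarkus.infinispan-client.username=admin
quarkus.infinispan-client.password=password
```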

Embedding Dimension

You must configure the dimension of the embedding vectors to match your embedding model:

quarkus.langchain4j.infinispan.dimension=384

Common model dimensions:

  • AllMiniLmL6V2QuantizedEmbeddingModel → 384

  • OpenAI text-embedding-ada-002 → 1536

If the embedding dimension is missing or mismatched, ingestion and retrieval will fail or produce inaccurate results.

If you switch to a different embedding model, ensure the dimension value is updated accordingly.
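For instance, switching from the local quantized model to OpenAI's text-embedding-ada-002 requires raising the dimension to match:

```properties
# text-embedding-ada-002 produces 1536-dimensional vectors
quarkus.langchain4j.infinispan.dimension=1536
```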

Usage Example

Once installed and configured, you can use the Infinispan embedding store as follows:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithInfinispan {

    /**
     * The embedding store (Infinispan).
     * The bean is provided by the quarkus-langchain4j-infinispan extension.
     */
    @Inject
    InfinispanEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (e.g., OpenAI).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}

This demonstrates how to store and retrieve embedded documents using Infinispan as the backend.

Metadata Filtering

The Infinispan embedding store supports metadata filtering when searching for embeddings. Metadata is stored as embedded Protobuf objects with typed fields (String, Long, Double), enabling efficient server-side filtering using Infinispan’s Ickle query language.

Supported filter types:

  • IsEqualTo, IsNotEqualTo — equality comparisons

  • IsGreaterThan, IsGreaterThanOrEqualTo — greater-than comparisons

  • IsLessThan, IsLessThanOrEqualTo — less-than comparisons

  • IsIn, IsNotIn — membership checks

  • And, Or, Not — logical operators

Example usage:

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(embedding)
    .filter(metadataKey("category").isEqualTo("science"))
    .maxResults(10)
    .build();

EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);

You can also remove embeddings by filter:

embeddingStore.removeAll(metadataKey("category").isEqualTo("outdated"));

Configuration

By default, the extension uses the default Infinispan client and cache. You can customize its behavior via the following configuration options:

quarkus.langchain4j.infinispan.dimension=384 (1)
quarkus.langchain4j.infinispan.cache-name=my-cache (2)
quarkus.langchain4j.infinispan.distance=5 (3)
quarkus.langchain4j.infinispan.similarity=COSINE (4)
quarkus.langchain4j.infinispan.create-cache=true (5)
quarkus.langchain4j.infinispan.cache-config=<your-config> (6)
1 Required: vector dimension matching your embedding model
2 Cache name (default: embeddings-cache)
3 Maximum distance for knn query (default: 3)
4 Vector similarity metric (default: COSINE). Supported values: COSINE, L2, INNER_PRODUCT, MAX_INNER_PRODUCT
5 Whether to create the cache on startup if it doesn’t exist (default: true). Set to false if using a pre-configured cache on the server
6 Provide a full XML/JSON cache configuration. When set, overrides the default cache configuration
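To illustrate what the default COSINE similarity metric measures, the standalone sketch below computes cosine similarity between two vectors in plain Java. This is only an illustration of the math, not the server-side implementation:

```java
public class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    // A value of 1 means the vectors point in the same direction.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 0f, 1f};
        float[] v3 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v2)); // identical vectors -> 1.0
        System.out.println(cosine(v1, v3)); // orthogonal vectors -> 0.0
    }
}
```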

Configuration property fixed at build time - All other configuration properties are overridable at runtime

quarkus.langchain4j.infinispan.client-name
The name of the Infinispan client to use. These clients are configured by means of the infinispan-client extension. If unspecified, the default Infinispan client is used.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CLIENT_NAME
Type: string

quarkus.langchain4j.infinispan.dimension
The dimension of the embedding vectors. This has to be the same as the dimension of vectors produced by the embedding model that you use. For example, AllMiniLmL6V2QuantizedEmbeddingModel produces vectors of dimension 384. OpenAI’s text-embedding-ada-002 produces vectors of dimension 1536.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_DIMENSION
Type: long
Default: required

quarkus.langchain4j.infinispan.cache-name
Name of the cache that will be used in Infinispan when searching for related embeddings. If this cache doesn’t exist, it will be created.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CACHE_NAME
Type: string
Default: embeddings-cache

quarkus.langchain4j.infinispan.distance
The maximum distance for the kNN query; it controls how far apart two embeddings may be while still being considered related.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_DISTANCE
Type: int
Default: 3

quarkus.langchain4j.infinispan.similarity
The similarity metric to use for vector search. Supported values: COSINE, L2, INNER_PRODUCT, MAX_INNER_PRODUCT.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_SIMILARITY
Type: string
Default: COSINE

quarkus.langchain4j.infinispan.create-cache
Whether to create the cache on startup if it does not exist. Set to false if the cache is pre-configured on the Infinispan server.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CREATE_CACHE
Type: boolean
Default: true

quarkus.langchain4j.infinispan.cache-config
Optional full XML or JSON cache configuration. When provided, this overrides the default cache configuration generated by the extension.
Environment variable: QUARKUS_LANGCHAIN4J_INFINISPAN_CACHE_CONFIG
Type: string

How It Works

The Infinispan extension registers a Protobuf schema to define indexable entities with vector and metadata fields. For example, for a dimension of 384, the following schema is generated and registered:

/**
 * @Indexed
 */
message LangchainMetadata384 {

   /**
    * @Basic(projectable=true)
    */
   optional string name = 1;

   /**
    * @Basic(projectable=true)
    */
   optional string value = 2;

   /**
    * @Basic(projectable=true)
    */
   optional int64 value_int = 3;

   /**
    * @Basic(projectable=true)
    */
   optional double value_float = 4;
}

/**
 * @Indexed
 */
message LangchainItem384 {

   /**
    * @Keyword
    */
   optional string id = 1;

   /**
    * @Vector(dimension=384, similarity=COSINE) (1)
    */
   repeated float floatVector = 2;

   optional string text = 3;

   /**
    * @Embedded
    */
   repeated LangchainMetadata384 metadata = 4;
}
1 The similarity metric is configurable via quarkus.langchain4j.infinispan.similarity.

Each metadata entry is stored as an embedded LangchainMetadata object with three typed value fields:

  • value — for String values

  • value_int — for Integer and Long values

  • value_float — for Float and Double values

This typed approach ensures that numeric comparisons (greater-than, less-than, etc.) work correctly during filtering.
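As a rough illustration of this mapping (a hypothetical sketch, not the extension's actual code), a metadata value could be routed into the typed field matching its Java type like this:

```java
import java.util.Optional;

public class MetadataFieldMapper {

    // Simplified stand-in for the generated LangchainMetadata message:
    // exactly one of value / valueInt / valueFloat is populated.
    record TypedMetadata(String name, Optional<String> value,
                         Optional<Long> valueInt, Optional<Double> valueFloat) {}

    // Route a metadata value into the typed field matching its Java type,
    // so numeric comparisons operate on numbers rather than strings.
    static TypedMetadata map(String name, Object raw) {
        if (raw instanceof Integer i) {
            return new TypedMetadata(name, Optional.empty(), Optional.of(i.longValue()), Optional.empty());
        }
        if (raw instanceof Long l) {
            return new TypedMetadata(name, Optional.empty(), Optional.of(l), Optional.empty());
        }
        if (raw instanceof Float f) {
            return new TypedMetadata(name, Optional.empty(), Optional.empty(), Optional.of(f.doubleValue()));
        }
        if (raw instanceof Double d) {
            return new TypedMetadata(name, Optional.empty(), Optional.empty(), Optional.of(d));
        }
        return new TypedMetadata(name, Optional.of(String.valueOf(raw)), Optional.empty(), Optional.empty());
    }

    public static void main(String[] args) {
        System.out.println(map("year", 2024));      // routed to value_int
        System.out.println(map("score", 0.92));     // routed to value_float
        System.out.println(map("category", "sci")); // routed to value
    }
}
```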

Infinispan Cache Configuration

The extension will create an indexed cache if one is not already defined. Below is the default configuration that may be used or customized:

{
  "embeddings-cache": {
    "distributed-cache": {
      "mode": "SYNC",
      "remote-timeout": "17500",
      "statistics": true,
      "locking": {
        "concurrency-level": "1000",
        "acquire-timeout": "15000",
        "striping": false
      },
      "indexing": {
        "enabled": true,
        "storage": "local-heap",
        "indexed-entities": [
          "LangchainItem384"
        ]
      },
      "state-transfer": {
        "timeout": "60000"
      }
    }
  }
}
The name of the indexed entity (LangchainItem384) changes depending on the configured embedding dimension. The metadata type is embedded within the item and does not need to be listed separately.

Summary

To use Infinispan as a distributed vector store for RAG with Quarkus LangChain4j:

  • Ensure Infinispan Server 16.0+ is running with indexing enabled

  • Add the required extension dependency

  • Set the embedding vector dimension

  • Configure or allow the extension to create an indexed cache

  • Use InfinispanEmbeddingStore to ingest and retrieve documents for similarity search

  • Use metadata filters for fine-grained search and removal