Infinispan Embedding Store
Quarkus LangChain4j integrates with Infinispan Server to provide a scalable, distributed vector store for Retrieval-Augmented Generation (RAG). This extension enables you to persist and query embedding vectors for document retrieval.
| This extension uses Infinispan 16.0 capabilities, including embedded metadata objects with typed fields for metadata filtering. |
Prerequisites
To use Infinispan as a vector-capable embedding store:
- An Infinispan Server 16.0+ must be running and accessible
- The Quarkus Infinispan client must be configured
- Vector embeddings must have a fixed dimension that matches your embedding model
| This extension requires Infinispan Server with Protobuf indexing enabled. It automatically registers the required schema on startup. |
Dependency
To enable Infinispan support in your Quarkus project, add the following dependency:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-infinispan</artifactId>
<version>1.8.4</version>
</dependency>
Even better, if you use the Quarkus platform BOM (the default for generated projects), add the Quarkus LangChain4j BOM and all dependency versions will align:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>${quarkus.platform.group-id}</groupId>
<artifactId>${quarkus.platform.artifact-id}</artifactId>
<version>${quarkus.platform.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>${quarkus.platform.group-id}</groupId>
<artifactId>quarkus-langchain4j-bom</artifactId> (1)
<version>${quarkus.platform.version}</version> (2)
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-infinispan</artifactId>
(3)
</dependency>
</dependencies>
| 1 | In your dependencyManagement section, add the quarkus-langchain4j-bom |
| 2 | Inherit the version from your platform version |
| 3 | Voilà, no need for version alignment anymore |
This extension builds upon the Quarkus Infinispan client. Ensure that the default Infinispan client is correctly configured. For more details, see the Quarkus Infinispan client documentation.
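As an illustration, the default Infinispan client can be pointed at a local server with the standard `quarkus.infinispan-client.*` properties; the host, port, and credentials below are placeholders to adapt to your environment:

```properties
# Placeholder connection settings for the default Infinispan client
quarkus.infinispan-client.hosts=localhost:11222
quarkus.infinispan-client.username=admin
quarkus.infinispan-client.password=password
```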
Embedding Dimension
You must configure the dimension of the embedding vectors to match your embedding model:
quarkus.langchain4j.infinispan.dimension=384
Common model dimensions:
- AllMiniLmL6V2QuantizedEmbeddingModel → 384
- OpenAI text-embedding-ada-002 → 1536
| If the embedding dimension is missing or mismatched, ingestion and retrieval will fail or produce inaccurate results. If you switch to a different embedding model, ensure the configured dimension is updated to match. |
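One way to catch a mismatch early is to compare the configured dimension against the length of a vector the model actually produces at application startup. This is an illustrative sketch, not part of the extension: the `DimensionCheck` bean name and the startup-event observer are assumptions.

```java
package io.quarkiverse.langchain4j.samples;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;

import org.eclipse.microprofile.config.inject.ConfigProperty;

import dev.langchain4j.model.embedding.EmbeddingModel;
import io.quarkus.runtime.StartupEvent;

@ApplicationScoped
public class DimensionCheck {

    @Inject
    EmbeddingModel embeddingModel;

    @ConfigProperty(name = "quarkus.langchain4j.infinispan.dimension")
    long configuredDimension;

    void onStart(@Observes StartupEvent event) {
        // Embed a probe string and compare the vector length with the configuration
        int actual = embeddingModel.embed("probe").content().vector().length;
        if (actual != configuredDimension) {
            throw new IllegalStateException(
                    "Embedding dimension mismatch: model produces " + actual
                            + " but quarkus.langchain4j.infinispan.dimension=" + configuredDimension);
        }
    }
}
```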
Usage Example
Once installed and configured, you can use the Infinispan embedding store as follows:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithInfinispan {

    /**
     * The embedding store (Infinispan).
     * The bean is provided by the quarkus-langchain4j-infinispan extension.
     */
    @Inject
    InfinispanEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM (like openai) extension.
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
This demonstrates how to store and retrieve embedded documents using Infinispan as the backend.
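For the retrieval side, a similarity search over the same store can be sketched as follows; the service class name and the choice of five results are illustrative, not prescribed by the extension:

```java
package io.quarkiverse.langchain4j.samples;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class RetrieverExampleWithInfinispan {

    @Inject
    InfinispanEmbeddingStore store;

    @Inject
    EmbeddingModel embeddingModel;

    public List<EmbeddingMatch<TextSegment>> findRelevant(String query) {
        // Embed the query with the same model used at ingestion time
        Embedding queryEmbedding = embeddingModel.embed(query).content();
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(5)
                .build();
        EmbeddingSearchResult<TextSegment> result = store.search(request);
        return result.matches();
    }
}
```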
Metadata Filtering
The Infinispan embedding store supports metadata filtering when searching for embeddings.
Metadata is stored as embedded Protobuf objects with typed fields (String, Long, Double),
enabling efficient server-side filtering using Infinispan’s Ickle query language.
Supported filter types:
- IsEqualTo, IsNotEqualTo — equality comparisons
- IsGreaterThan, IsGreaterThanOrEqualTo — greater-than comparisons
- IsLessThan, IsLessThanOrEqualTo — less-than comparisons
- IsIn, IsNotIn — membership checks
- And, Or, Not — logical operators
Example usage:
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(embedding)
        .filter(metadataKey("category").isEqualTo("science"))
        .maxResults(10)
        .build();

EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
You can also remove embeddings by filter:
embeddingStore.removeAll(metadataKey("category").isEqualTo("outdated"));
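Filters can also be combined with the logical operators. The sketch below assumes the static `and`/`not` factory methods on LangChain4j's `Filter` interface; the `category` and `year` metadata keys are illustrative:

```java
import static dev.langchain4j.store.embedding.filter.Filter.and;
import static dev.langchain4j.store.embedding.filter.Filter.not;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

import dev.langchain4j.store.embedding.filter.Filter;

// Match documents in one of two categories, excluding anything older than 2020
Filter filter = and(
        metadataKey("category").isIn("science", "technology"),
        not(metadataKey("year").isLessThan(2020L)));
```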
Configuration
By default, the extension uses the default Infinispan client and cache. You can customize its behavior via the following configuration options:
quarkus.langchain4j.infinispan.dimension=384 (1)
quarkus.langchain4j.infinispan.cache-name=my-cache (2)
quarkus.langchain4j.infinispan.distance=5 (3)
quarkus.langchain4j.infinispan.similarity=COSINE (4)
quarkus.langchain4j.infinispan.create-cache=true (5)
quarkus.langchain4j.infinispan.cache-config=<your-config> (6)
| 1 | Required: vector dimension matching your embedding model |
| 2 | Cache name (default: embeddings-cache) |
| 3 | Maximum distance for kNN queries (default: 3) |
| 4 | Vector similarity metric (default: COSINE). Supported values: COSINE, L2, INNER_PRODUCT, MAX_INNER_PRODUCT |
| 5 | Whether to create the cache on startup if it doesn’t exist (default: true). Set to false if using a pre-configured cache on the server |
| 6 | Provide a full XML/JSON cache configuration. When set, overrides the default cache configuration |
Configuration property fixed at build time - All other configuration properties are overridable at runtime
| Configuration property | Type | Default |
|---|---|---|
| The name of the Infinispan client to use. These clients are configured by means of the Quarkus Infinispan client extension. | string | |
| The dimension of the embedding vectors. This has to be the same as the dimension of vectors produced by the embedding model that you use. For example, AllMiniLmL6V2QuantizedEmbeddingModel produces vectors of dimension 384, and OpenAI’s text-embedding-ada-002 produces vectors of dimension 1536. | long | required |
| Name of the cache that will be used in Infinispan when searching for related embeddings. If this cache doesn’t exist, it will be created. | string | embeddings-cache |
| The maximum distance for kNN queries. The distance between two vectors measures how close or far apart the corresponding embeddings are. | int | 3 |
| The similarity metric to use for vector search. Supported values: COSINE, L2, INNER_PRODUCT, MAX_INNER_PRODUCT. | string | COSINE |
| Whether to create the cache on startup if it does not exist. Set to false if the cache is pre-configured on the Infinispan server. | boolean | true |
| Optional full XML or JSON cache configuration. When provided, this overrides the default cache configuration generated by the extension. | string | |
How It Works
The Infinispan extension registers a Protobuf schema to define indexable entities with vector and metadata fields. For example, for a dimension of 384, the following schema is generated and registered:
/**
 * @Indexed
 */
message LangchainMetadata384 {

    /**
     * @Basic(projectable=true)
     */
    optional string name = 1;

    /**
     * @Basic(projectable=true)
     */
    optional string value = 2;

    /**
     * @Basic(projectable=true)
     */
    optional int64 value_int = 3;

    /**
     * @Basic(projectable=true)
     */
    optional double value_float = 4;
}

/**
 * @Indexed
 */
message LangchainItem384 {

    /**
     * @Keyword
     */
    optional string id = 1;

    /**
     * @Vector(dimension=384, similarity=COSINE) (1)
     */
    repeated float floatVector = 2;

    optional string text = 3;

    /**
     * @Embedded
     */
    repeated LangchainMetadata384 metadata = 4;
}
| 1 | The similarity metric is configurable via quarkus.langchain4j.infinispan.similarity. |
Each metadata entry is stored as an embedded LangchainMetadata object with three typed value fields:
- value — for String values
- value_int — for Integer and Long values
- value_float — for Float and Double values
This typed approach ensures that numeric comparisons (greater-than, less-than, etc.) work correctly during filtering.
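The routing of a metadata value into one of the three typed fields can be sketched in plain Java. The field names come from the generated schema above; the `targetField` helper and its class are purely illustrative, not part of the extension:

```java
// Illustrative sketch of how metadata values map onto the typed
// Protobuf fields of LangchainMetadata (value, value_int, value_float).
public class MetadataFieldRouting {

    static String targetField(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return "value_int";   // stored as int64
        }
        if (value instanceof Float || value instanceof Double) {
            return "value_float"; // stored as double
        }
        return "value";           // everything else is stored as a string
    }

    public static void main(String[] args) {
        System.out.println(targetField("science")); // value
        System.out.println(targetField(2024L));     // value_int
        System.out.println(targetField(0.95));      // value_float
    }
}
```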
Infinispan Cache Configuration
The extension will create an indexed cache if one is not already defined. Below is the default configuration that may be used or customized:
{
  "embeddings-cache": {
    "distributed-cache": {
      "mode": "SYNC",
      "remote-timeout": "17500",
      "statistics": true,
      "locking": {
        "concurrency-level": "1000",
        "acquire-timeout": "15000",
        "striping": false
      },
      "indexing": {
        "enabled": true,
        "storage": "local-heap",
        "indexed-entities": [
          "LangchainItem384"
        ]
      },
      "state-transfer": {
        "timeout": "60000"
      }
    }
  }
}
| The name of the indexed entity (LangchainItem384) changes depending on the configured embedding dimension. The metadata type is embedded within the item and does not need to be listed separately. |
Summary
To use Infinispan as a distributed vector store for RAG with Quarkus LangChain4j:
- Ensure Infinispan Server 16.0+ is running with indexing enabled
- Add the required extension dependency
- Set the embedding vector dimension
- Configure or allow the extension to create an indexed cache
- Use InfinispanEmbeddingStore to ingest and retrieve documents for similarity search
- Use metadata filters for fine-grained search and removal