Infinispan Embedding Store
Quarkus LangChain4j integrates with Infinispan Server to provide a scalable, distributed vector store for Retrieval-Augmented Generation (RAG). This extension enables you to persist and query embedding vectors for document retrieval.
Prerequisites
To use Infinispan as a vector-capable embedding store:
- An Infinispan Server must be running and accessible
- The Quarkus Infinispan client must be configured
- Vector embeddings must have a fixed dimension that matches your embedding model
This extension requires Infinispan Server with Protobuf indexing enabled. It automatically registers the required schema on startup.
Dependency
To enable Infinispan support in your Quarkus project, add the following dependency:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-infinispan</artifactId>
<version>1.0.2</version>
</dependency>
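For Gradle builds, the equivalent declaration (using the same coordinates as the Maven snippet above) is:

```gradle
implementation("io.quarkiverse.langchain4j:quarkus-langchain4j-infinispan:1.0.2")
```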
This extension builds upon the Quarkus Infinispan client. Ensure that the default Infinispan client is correctly configured. For more details, see the Quarkus Infinispan client guide.
Embedding Dimension
You must configure the dimension of the embedding vectors to match your embedding model:
quarkus.langchain4j.infinispan.dimension=384
Common model dimensions:

- AllMiniLmL6V2QuantizedEmbeddingModel → 384
- OpenAI text-embedding-ada-002 → 1536
If the embedding dimension is missing or mismatched, ingestion and retrieval will fail or produce inaccurate results. If you switch to a different embedding model, ensure the configured dimension matches the vectors that model produces.
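To catch a mismatch early, you can compare the model's reported dimension with the configured value at startup. This is a sketch, not part of the extension: it assumes a recent LangChain4j version where `EmbeddingModel` exposes a `dimension()` method, and the class name `DimensionCheck` is illustrative.

```java
package io.quarkiverse.langchain4j.samples;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;

import org.eclipse.microprofile.config.inject.ConfigProperty;

import dev.langchain4j.model.embedding.EmbeddingModel;
import io.quarkus.runtime.StartupEvent;

@ApplicationScoped
public class DimensionCheck {

    @Inject
    EmbeddingModel embeddingModel;

    @ConfigProperty(name = "quarkus.langchain4j.infinispan.dimension")
    long configuredDimension;

    // Fail fast at startup if the model and the store disagree on dimension.
    void onStart(@Observes StartupEvent event) {
        int modelDimension = embeddingModel.dimension();
        if (modelDimension != configuredDimension) {
            throw new IllegalStateException(
                    "Embedding dimension mismatch: model produces " + modelDimension
                            + ", store is configured for " + configuredDimension);
        }
    }
}
```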
Usage Example
Once installed and configured, you can use the Infinispan embedding store as follows:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithInfinispan {

    /**
     * The embedding store (Infinispan).
     * The bean is provided by the quarkus-langchain4j-infinispan extension.
     */
    @Inject
    InfinispanEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM (like OpenAI) extension.
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
This demonstrates how to store and retrieve embedded documents using Infinispan as the backend.
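The ingestion example has a natural counterpart on the query side. The following is a sketch of retrieval using the standard LangChain4j embedding-store search API (`EmbeddingSearchRequest`, available in recent LangChain4j versions); the class name `RetrieverExampleWithInfinispan` and the result limit are illustrative:

```java
package io.quarkiverse.langchain4j.samples;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import io.quarkiverse.langchain4j.infinispan.InfinispanEmbeddingStore;

@ApplicationScoped
public class RetrieverExampleWithInfinispan {

    @Inject
    InfinispanEmbeddingStore store;

    @Inject
    EmbeddingModel embeddingModel;

    public List<EmbeddingMatch<TextSegment>> findRelevant(String query) {
        // Embed the query with the same model used at ingestion time,
        // so the vector space and dimension match the stored embeddings.
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(5)
                .build();

        return store.search(request).matches();
    }
}
```

Each `EmbeddingMatch` carries the matched `TextSegment` together with its similarity score, which you can feed into your RAG prompt assembly.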
Configuration
By default, the extension uses the default Infinispan client and cache. You can customize its behavior via the following configuration options:
Configuration property fixed at build time - All other configuration properties are overridable at runtime

| Configuration property | Type | Default |
|---|---|---|
| The name of the Infinispan client to use. These clients are configured by means of the Quarkus Infinispan client extension. | string | |
| The dimension of the embedding vectors. This has to be the same as the dimension of vectors produced by the embedding model that you use. For example, AllMiniLmL6V2QuantizedEmbeddingModel produces vectors of dimension 384. OpenAI’s text-embedding-ada-002 produces vectors of dimension 1536. | long | required |
| Name of the cache that will be used in Infinispan when searching for related embeddings. If this cache doesn’t exist, it will be created. | string | |
| The maximum distance between vectors when searching for related embeddings; the distance measures how close or far apart two embeddings are. | int | |
How It Works
The Infinispan extension registers a Protobuf schema to define an indexable entity with a vector field. For example, for a dimension of 384, the following schema is generated and registered:
/**
* @Indexed
*/
message LangchainItem384 {
/**
* @Keyword
*/
optional string id = 1;
/**
* @Vector(dimension=384, similarity=COSINE)
*/
repeated float floatVector = 2;
optional string text = 3;
repeated string metadataKeys = 4;
repeated string metadataValues = 5;
}
The embedding vector is stored as a repeated float field and indexed for similarity search.
Infinispan Cache Configuration
The extension will create an indexed cache if one is not already defined. Below is the default configuration that may be used or customized:
{
"embeddings-cache": {
"distributed-cache": {
"mode": "SYNC",
"remote-timeout": "17500",
"statistics": true,
"locking": {
"concurrency-level": "1000",
"acquire-timeout": "15000",
"striping": false
},
"indexing": {
"enabled": true,
"storage": "local-heap",
"indexed-entities": [
"LangchainItem384"
]
},
"state-transfer": {
"timeout": "60000"
}
}
}
}
The name of the indexed entity (LangchainItem384 ) changes depending on the configured embedding dimension.
Summary
To use Infinispan as a distributed vector store for RAG with Quarkus LangChain4j:
- Ensure Infinispan Server is running with indexing enabled
- Add the required extension dependency
- Set the embedding vector dimension
- Configure or allow the extension to create an indexed cache
- Use InfinispanEmbeddingStore to ingest and retrieve documents for similarity search