Chroma Document Store
Chroma is a lightweight, open-source vector database designed for embedding-based search. It can be used as a document store in Retrieval-Augmented Generation (RAG) pipelines with Quarkus LangChain4j.
This guide explains how to configure and use Chroma as an embedding-aware document store.
Dependency
To enable Chroma integration in your Quarkus project, add the following Maven dependency:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-chroma</artifactId>
    <version>1.0.2</version>
</dependency>
Dev Services Support
The Chroma extension supports Dev Services: when running in development or test mode, a containerized Chroma instance is started automatically, with no manual configuration required.
If you wish to customize the container behavior, such as the image or the exposed port, you can use the quarkus.langchain4j.chroma.devservices.* properties.
For example:
quarkus.langchain4j.chroma.devservices.image-name=ghcr.io/chroma-core/chroma:latest
Refer to the configuration section below for more options.
Usage Example
Once the extension is installed and the dev service (or external Chroma instance) is available, you can use the Chroma document store as follows:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.chroma.ChromaEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithChroma {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-chroma extension.
     */
    @Inject
    ChromaEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (such as OpenAI).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        // Split each document into segments of at most 500 characters with no overlap,
        // embed every segment, and store the result in Chroma.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
This example demonstrates how to ingest content into Chroma, where it will be indexed and stored with its vector embedding.
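To give a feel for what the `recursive(500, 0)` setting in the example means (segments of at most 500 characters, with no overlap between consecutive segments), here is a deliberately simplified, self-contained sketch. It is not LangChain4j's algorithm, which prefers to break on paragraph, sentence, and word boundaries; it only cuts fixed-size windows. The class `FixedSizeSplitterSketch` and its `split` method are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of LangChain4j: cuts text into windows of at
// most maxChars characters, each window repeating the last `overlap`
// characters of the previous one.
class FixedSizeSplitterSketch {

    static List<String> split(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("require 0 <= overlap < maxChars");
        }
        List<String> segments = new ArrayList<>();
        int step = maxChars - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // A 1200-character input with maxChars = 500 and overlap = 0
        // is cut into windows of 500, 500, and 200 characters.
        List<String> segments = split("x".repeat(1200), 500, 0);
        for (String segment : segments) {
            System.out.println(segment.length());
        }
    }
}
```

Smaller segments generally mean more embeddings to compute and store, which is one reason ingestion can take a long time on large document sets.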
Configuration
The Chroma extension can be configured using the following options:
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Configuration property | Type | Default
---|---|---
Whether Dev Services has been explicitly enabled or disabled. Dev Services is generally enabled by default, unless there is an existing configuration present. When Dev Services is enabled, Quarkus attempts to automatically configure and start a database when running in dev or test mode and when Docker is running. | boolean |
The container image name to use, for container-based Dev Services providers. | string |
Optional fixed port the dev service will listen to. If not defined, the port is chosen randomly. | int |
Indicates whether the Chroma server managed by Quarkus Dev Services is shared. When shared, Quarkus looks for running containers using label-based service discovery. If a matching container is found, it is used, and a second one is not started. Otherwise, Dev Services for Chroma starts a new container. Container sharing is only used in dev mode. | boolean |
The value of the label attached to the started container and used for service discovery. This property is used when you need multiple shared Chroma servers. | string |
Environment variables that are passed to the container. | Map<String,String> |
URL where the Chroma database is listening for requests. | string | required
The collection name. | string |
The timeout duration for the Chroma client. If not specified, 5 seconds will be used. | Duration |
Whether requests to Chroma should be logged. | boolean |
Whether responses from Chroma should be logged. | boolean |
About the Duration format
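To write duration values, use the standard java.time.Duration format. You can also use a simplified format, starting with a number:
- If the value is only a number, it represents time in seconds.
- If the value is a number followed by ms, it represents time in milliseconds.
In other cases, the simplified format is translated to the java.time.Duration format for parsing:
- If the value is a number followed by h, m, or s, it is prefixed with PT.
- If the value is a number followed by d, it is prefixed with P.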
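As an illustration of how a simplified duration value might be translated before parsing, here is a minimal sketch. It assumes the usual Quarkus convention (a bare number is seconds, a number followed by `ms` is milliseconds, `h`/`m`/`s` suffixes are prefixed with `PT`, and a `d` suffix is prefixed with `P`). The class `DurationFormatSketch` is a hypothetical name, and this is not the converter that Quarkus itself uses.

```java
import java.time.Duration;

// Hypothetical helper for illustration; it mirrors the assumed simplified
// duration rules but is not the converter shipped with Quarkus.
class DurationFormatSketch {

    static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            // Only a number: interpreted as seconds.
            return Duration.ofSeconds(Long.parseLong(v));
        }
        if (v.matches("\\d+ms")) {
            // Number followed by "ms": interpreted as milliseconds.
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.matches("\\d+[hms]")) {
            // Number followed by h, m, or s: prefix with "PT" (time part).
            return Duration.parse("PT" + v);
        }
        if (v.matches("\\d+d")) {
            // Number followed by d: prefix with "P" (date part).
            return Duration.parse("P" + v);
        }
        // Anything else is assumed to already be in java.time.Duration format.
        return Duration.parse(v);
    }

    public static void main(String[] args) {
        for (String s : new String[] { "5", "250ms", "10m", "2d", "PT1.5S" }) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Under these assumptions, a timeout written as `10` and one written as `10s` denote the same duration.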
Notes
- Chroma supports metadata, but filtering capabilities may depend on the current Chroma version and API behavior.
- The embedding vector size must match the dimension of your embedding model.
- The Chroma backend is typically local (SQLite-based), but distributed setups may be available depending on your deployment.
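The note on vector size matters because similarity between two embeddings is only defined when both vectors have the same number of components. A minimal sketch of a cosine similarity that rejects mismatched dimensions follows; it uses plain arrays and a hypothetical `EmbeddingDimensionSketch` class rather than any Chroma or LangChain4j API, and it makes no claim about how Chroma computes distances internally.

```java
// Hypothetical helper for illustration; not part of Chroma or LangChain4j.
class EmbeddingDimensionSketch {

    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            // Vectors produced by models with different dimensions cannot be compared.
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = { 1f, 2f, 3f };
        float[] stored = { 2f, 4f, 6f };
        // Parallel vectors have a cosine similarity of (approximately) 1.
        System.out.println(cosineSimilarity(query, stored));
    }
}
```

In practice this means that switching to an embedding model with a different output dimension usually requires re-ingesting the documents into a fresh collection.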
Summary
To use Chroma as a vector store for RAG with Quarkus LangChain4j:
- Add the Chroma extension to your project.
- Ensure your embedding model’s vector dimension matches your configuration.
- Use Dev Services for a containerized Chroma instance in dev/test mode.
- Use the ChromaEmbeddingStore to ingest and retrieve vectorized documents.