Chroma Document Store

Chroma is a lightweight, open-source vector database designed for embedding-based search. It can be used as a document store in Retrieval-Augmented Generation (RAG) pipelines with Quarkus LangChain4j.

This guide explains how to configure and use Chroma as an embedding-aware document store.

Dependency

To enable Chroma integration in your Quarkus project, add the following Maven dependency:

<dependency>
  <groupId>io.quarkiverse.langchain4j</groupId>
  <artifactId>quarkus-langchain4j-chroma</artifactId>
  <version>1.10.0</version>
</dependency>

Alternatively, if you use the Quarkus platform BOM (the default for generated projects), import the Quarkus LangChain4j BOM as well so that all dependency versions stay aligned:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>${quarkus.platform.artifact-id}</artifactId>
                <version>${quarkus.platform.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>quarkus-langchain4j-bom</artifactId> <!-- (1) -->
                <version>${quarkus.platform.version}</version> <!-- (2) -->
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-chroma</artifactId>
        <!-- (3) -->
      </dependency>
    </dependencies>

(1) In your dependencyManagement section, add the quarkus-langchain4j-bom.
(2) Inherit the version from your platform version.
(3) No version is needed on the dependency; the BOM manages it.

Dev Services Support

The Chroma extension provides Dev Services support: when running in development or test mode, a containerized Chroma instance is started automatically, with no manual configuration required.

If you wish to customize the container behavior, such as the image or the exposed port, use the quarkus.langchain4j.chroma.devservices.* properties. For example:

quarkus.langchain4j.chroma.devservices.image-name=ghcr.io/chroma-core/chroma:latest

Refer to the configuration section below for more options.

Usage Example

Once the extension is installed and the dev service (or external Chroma instance) is available, you can use the Chroma document store as follows:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithChroma {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-chroma extension.
     */
    @Inject
    ChromaEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM (like openai) extension.
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}

This example demonstrates how to ingest content into Chroma, where it will be indexed and stored with its vector embedding.
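Retrieval is the other half of the pipeline. The following is a minimal sketch rather than part of the original sample: the class name RetrieverExampleWithChroma, the method findRelevant, and the limit of 3 results are illustrative assumptions. It uses the EmbeddingSearchRequest API from LangChain4j, embeds the question with the same model used for ingestion, and returns the text of the closest segments.

```java
package io.quarkiverse.langchain4j.samples;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

// Hypothetical companion to IngestorExampleWithChroma (name chosen for illustration).
@ApplicationScoped
public class RetrieverExampleWithChroma {

    @Inject
    ChromaEmbeddingStore store;

    @Inject
    EmbeddingModel embeddingModel;

    public List<String> findRelevant(String question) {
        // Embed the question with the same model that was used during ingestion.
        Embedding queryEmbedding = embeddingModel.embed(question).content();

        // Ask the store for the closest segments; 3 is an arbitrary example value.
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(3)
                .build();

        List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches();

        // Each match carries the stored segment and a relevance score.
        return matches.stream()
                .map(match -> match.embedded().text())
                .toList();
    }
}
```

The returned segments can then be added to the prompt sent to the chat model, which is the usual RAG pattern.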

Configuration

The Chroma extension can be configured using the following options:

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Whether the default (unnamed) Chroma embedding store should be enabled. Set to false when you only want to use named stores.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEFAULT_STORE_ENABLED
    Type: boolean
    Default: true

If DevServices has been explicitly enabled or disabled. DevServices is generally enabled by default, unless there is an existing configuration present.

When DevServices is enabled Quarkus will attempt to automatically configure and start a database when running in Dev or Test mode and when Docker is running.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEVSERVICES_ENABLED
    Type: boolean
    Default: true

The container image name to use, for container based DevServices providers.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEVSERVICES_IMAGE_NAME
    Type: string
    Default: ghcr.io/chroma-core/chroma:1.3.0

Optional fixed port the dev service will listen to.

If not defined, the port will be chosen randomly.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEVSERVICES_PORT
    Type: int

Indicates if the Chroma server managed by Quarkus Dev Services is shared. When shared, Quarkus looks for running containers using label-based service discovery. If a matching container is found, it is used, and so a second one is not started. Otherwise, Dev Services for Chroma starts a new container.

The discovery uses the quarkus-dev-service-chroma label. The value is configured using the service-name property.

Container sharing is only used in dev mode.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEVSERVICES_SHARED
    Type: boolean
    Default: true

The value of the quarkus-dev-service-chroma label attached to the started container. This property is used when shared is set to true. In this case, before starting a container, Dev Services for Chroma looks for a container with the quarkus-dev-service-chroma label set to the configured value. If found, it will use this container instead of starting a new one. Otherwise, it starts a new container with the quarkus-dev-service-chroma label set to the specified value.

This property is used when you need multiple shared Chroma servers.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEVSERVICES_SERVICE_NAME
    Type: string
    Default: chroma

Environment variables that are passed to the container.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_DEVSERVICES_CONTAINER_ENV__CONTAINER_ENV_
    Type: Map<String,String>

URL where the Chroma database is listening for requests

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_URL
    Type: string

The collection name.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_COLLECTION_NAME
    Type: string
    Default: default

The timeout duration for the Chroma client. If not specified, 5 seconds will be used.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_TIMEOUT
    Type: Duration

Whether requests to Chroma should be logged

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_LOG_REQUESTS
    Type: boolean
    Default: false

Whether responses from Chroma should be logged

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_LOG_RESPONSES
    Type: boolean
    Default: false

The Chroma API version to use. V1 is deprecated (Chroma 0.x) and its support will be removed in the future. Please use Chroma 1.x which uses the V2 API.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA_API_VERSION
    Type: one of v1, v2
    Default: v2

Named store configurations

The collection name for this named store. This property serves as the build-time key that enables named store discovery. If not set, the collection name from the runtime configuration will be used.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__COLLECTION_NAME
    Type: string

URL where the Chroma database is listening for requests

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__URL
    Type: string

The collection name.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__COLLECTION_NAME
    Type: string
    Default: default

The timeout duration for the Chroma client. If not specified, 5 seconds will be used.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__TIMEOUT
    Type: Duration

Whether requests to Chroma should be logged

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__LOG_REQUESTS
    Type: boolean
    Default: false

Whether responses from Chroma should be logged

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__LOG_RESPONSES
    Type: boolean
    Default: false

The Chroma API version to use. V1 is deprecated (Chroma 0.x) and its support will be removed in the future. Please use Chroma 1.x which uses the V2 API.

    Environment variable: QUARKUS_LANGCHAIN4J_CHROMA__STORE_NAME__API_VERSION
    Type: one of v1, v2
    Default: v2
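The environment variable names listed in this section follow the MicroProfile Config convention: every character that is not a letter, a digit, or an underscore is replaced by an underscore, and the result is upper-cased. The sketch below illustrates that rule only; it is not the code Quarkus uses. The class name EnvVarNames is hypothetical, and the property names in main are assumptions reconstructed from the environment variables above.

```java
import java.util.Locale;

public class EnvVarNames {

    /**
     * Derives an environment variable name from a property name:
     * non-alphanumeric characters (other than '_') become '_', then upper-case.
     */
    public static String toEnvVar(String propertyName) {
        return propertyName
                .replaceAll("[^A-Za-z0-9_]", "_")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Dots and hyphens each become a single underscore.
        System.out.println(toEnvVar("quarkus.langchain4j.chroma.devservices.image-name"));
        // A quoted map key contributes an extra underscore on each side,
        // which is why named-store variables contain double underscores.
        System.out.println(toEnvVar("quarkus.langchain4j.chroma.\"store-name\".url"));
    }
}
```

This explains the double underscores around STORE_NAME in the named store variables: the quotes around the store name are each mapped to an underscore.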

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
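The rules above can be sketched in plain Java as follows. This is an illustration of the described behavior, not the converter that Quarkus ships; the class name SimplifiedDuration and its parse method are hypothetical, and edge cases such as negative values are left out.

```java
import java.time.Duration;

public class SimplifiedDuration {

    /** Converts a simplified or ISO-8601 duration string into a Duration. */
    public static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            // Only a number: seconds.
            return Duration.ofSeconds(Long.parseLong(v));
        }
        if (v.matches("\\d+ms")) {
            // Number followed by ms: milliseconds.
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.matches("\\d+[hms]")) {
            // Number followed by h, m, or s: prefix with PT.
            return Duration.parse("PT" + v);
        }
        if (v.matches("\\d+d")) {
            // Number followed by d: prefix with P.
            return Duration.parse("P" + v);
        }
        // Anything else is expected to be in the standard java.time.Duration format.
        return Duration.parse(v);
    }

    public static void main(String[] args) {
        System.out.println(parse("10"));
        System.out.println(parse("250ms"));
        System.out.println(parse("5m"));
        System.out.println(parse("PT1M30S"));
    }
}
```

With these rules, a timeout can be written as 10, 10s, or PT10S interchangeably.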

Named Stores

You can configure multiple named Chroma stores, each using a different collection. This is useful when your application needs to manage embeddings for different domains or tenants in separate collections within the same Chroma instance.

To configure a named store:

quarkus.langchain4j.chroma.products.collection-name=product_embeddings
quarkus.langchain4j.chroma.products.url=http://chroma.example.com:8000

To inject a named store, use the @EmbeddingStoreName qualifier:

@Inject
@EmbeddingStoreName("products")
EmbeddingStore<TextSegment> productsStore;

The default store and named stores can coexist. If you only need named stores, disable the default store:

quarkus.langchain4j.chroma.default-store-enabled=false

Notes

  • Chroma supports metadata, but filtering capabilities may depend on the current Chroma version and API behavior.

  • Every vector written to or queried against a collection must have the dimension produced by your embedding model, so use the same model for ingestion and retrieval.

  • The Chroma backend is typically local (SQLite-based), but distributed setups may be available depending on your deployment.
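To make the dimension note concrete, here is a small fail-fast check written as a sketch. The class name DimensionGuard, the method requireDimension, and the value 384 are hypothetical; in a LangChain4j application the float array would typically come from Embedding.vector().

```java
public class DimensionGuard {

    /**
     * Throws if the vector does not have the expected number of components.
     * Intended to catch a model mismatch before the vector reaches the store.
     */
    public static void requireDimension(float[] vector, int expectedDimension) {
        if (vector.length != expectedDimension) {
            throw new IllegalArgumentException(
                    "Expected a vector of dimension " + expectedDimension
                            + " but got " + vector.length);
        }
    }

    public static void main(String[] args) {
        // 384 is an example dimension, not a value required by Chroma.
        requireDimension(new float[384], 384);
        System.out.println("dimension check passed");
    }
}
```

A check like this gives a clearer error than a failure reported later by the database.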

Summary

To use Chroma as a vector store for RAG with Quarkus LangChain4j:

  • Add the Chroma extension to your project.

  • Ensure the embeddings you store and the embeddings you query with have the same vector dimension, which in practice means using the same embedding model for both.

  • Use Dev Services for a containerized Chroma instance in dev/test mode.

  • Use the ChromaEmbeddingStore to ingest and retrieve vectorized documents.