Weaviate Embedding Store

Weaviate is a scalable vector-native database designed for semantic search and Retrieval-Augmented Generation (RAG) use cases. This guide explains how to use Weaviate as an embedding store in Quarkus LangChain4j.

Overview

Weaviate stores text segments together with their embeddings and supports similarity search over those embeddings. With Quarkus LangChain4j, you can ingest documents and run vector-based retrieval with minimal setup.

Dependency

To enable Weaviate support in your Quarkus application, add the following dependency:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-weaviate</artifactId>
    <version>1.0.2</version>
</dependency>

Dev Services Support

When running in development or test mode, the extension will automatically start a containerized Weaviate instance using Dev Services, unless a host is explicitly configured.

To connect to an existing Weaviate instance instead, set its host and port:

quarkus.langchain4j.weaviate.host=localhost
quarkus.langchain4j.weaviate.port=8080

When using a remote Weaviate instance, Dev Services are automatically disabled.
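For a remote cluster that requires HTTPS and an API key, a configuration along these lines should work. Treat it as a sketch: the host name is a hypothetical placeholder, port 443 is an assumption about a typical HTTPS endpoint, and the property names are derived from the environment variables in the configuration reference below using the usual Quarkus naming convention (for example, QUARKUS_LANGCHAIN4J_WEAVIATE_API_KEY becomes quarkus.langchain4j.weaviate.api-key).

```properties
# Hypothetical remote cluster: replace the host with your own cluster address.
quarkus.langchain4j.weaviate.scheme=https
quarkus.langchain4j.weaviate.host=my-cluster.example.weaviate.network
# Assumption: the HTTPS endpoint listens on the standard port 443.
quarkus.langchain4j.weaviate.port=443
# Read the API key from an environment variable instead of hard-coding it.
quarkus.langchain4j.weaviate.api-key=${WEAVIATE_API_KEY}

# To keep Dev Services from starting a container at all, disable it explicitly:
# quarkus.langchain4j.weaviate.devservices.enabled=false
```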

Usage Example

Once configured, you can use the Weaviate embedding store like any other vector store:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithWeaviate {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-weaviate extension.
     */
    @Inject
    WeaviateEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (for example, quarkus-langchain4j-openai).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}

This allows you to ingest documents and perform similarity queries with any supported embedding model.

Configuration

You can customize the behavior of the extension using the following configuration options:

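Properties fixed at build time cannot be changed afterwards; all other configuration properties can be overridden at runtime.

Each entry below gives the property description, its environment variable, its type, and its default value (where one exists).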

Whether Dev Services is explicitly enabled or disabled. Dev Services is generally enabled by default, unless there is an existing configuration present.

When Dev Services is enabled, Quarkus attempts to automatically configure and start a Weaviate instance when running in Dev or Test mode and when Docker is running.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_ENABLED

boolean

true

The container image name to use, for container-based Dev Services providers.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_IMAGE_NAME

string

cr.weaviate.io/semitechnologies/weaviate:1.25.5

Optional fixed port the dev service will listen to.

If not defined, the port will be chosen randomly.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_PORT

int

Indicates whether the Weaviate server managed by Quarkus Dev Services is shared. When shared, Quarkus looks for running containers using label-based service discovery. If a matching container is found, it is used, and a second one is not started. Otherwise, Dev Services for Weaviate starts a new container.

The discovery uses the quarkus-dev-service-weaviate label. The value is configured using the service-name property.

Container sharing is only used in dev mode.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_SHARED

boolean

true

The value of the quarkus-dev-service-weaviate label attached to the started container. This property is used when shared is set to true. In this case, before starting a container, Dev Services for Weaviate looks for a container with the quarkus-dev-service-weaviate label set to the configured value. If found, it uses this container instead of starting a new one. Otherwise, it starts a new container with the quarkus-dev-service-weaviate label set to the specified value.

This property is used when you need multiple shared Weaviate servers.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_SERVICE_NAME

string

weaviate

Environment variables that are passed to the container.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_CONTAINER_ENV__CONTAINER_ENV_

Map<String,String>

The Weaviate API key to authenticate with.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_API_KEY

string

The scheme of the cluster URL, for example "https". You can find it under the Details of your Weaviate cluster.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_SCHEME

string

http

The host name of the Weaviate server.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_HOST

string

localhost

The HTTP port of the Weaviate server. Defaults to 8080.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_PORT

int

8080

The gRPC port of the Weaviate server. Defaults to 50051.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_GRPC_PORT

int

50051

Whether the gRPC connection is secured.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_GRPC_SECURE

boolean

false

Use gRPC instead of HTTP for batch inserts only. HTTP is still used for search.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_GRPC_USE_FOR_INSERTS

boolean

false

The object class you want to store, for example "MyGreatClass". Must start with an uppercase letter.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_OBJECT_CLASS

string

Default

The name of the field that contains the text of a TextSegment. Default is "text"

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_TEXT_FIELD_NAME

string

text

If true, WeaviateEmbeddingStore generates a hashed ID based on the provided text segment, which avoids duplicate entries in the database. If false (the default in this extension), a random ID is generated.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_AVOID_DUPS

boolean

false

Consistency level: ONE, QUORUM (default) or ALL.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_CONSISTENCY_LEVEL

one, quorum, all

quorum

Metadata keys that should be persisted. Weaviate's own default is an empty list ([]), but the EmbeddingStore requires at least one key to work, so "tags" is used as the default.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_METADATA_KEYS

list of string

tags

The name of the field where metadata entries are stored.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_METADATA_FIELD_NAME

string

_metadata
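As an illustration of how several of these options combine, the fragment below stores segments in a custom class, persists two metadata keys, enables text-derived IDs, and raises the consistency level. The class name Article and the source metadata key are hypothetical, and the property names are inferred from the environment variables above using the usual Quarkus naming convention, so check them against the extension version you use.

```properties
# Hypothetical custom class; the name must start with an uppercase letter.
quarkus.langchain4j.weaviate.object-class=Article
# Persist these metadata keys ("source" is a hypothetical example key).
quarkus.langchain4j.weaviate.metadata-keys=tags,source
# Derive object IDs from the segment text so re-ingesting identical text does not duplicate it.
quarkus.langchain4j.weaviate.avoid-dups=true
# One of: one, quorum (default), all.
quarkus.langchain4j.weaviate.consistency-level=all
```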

How It Works

  • Ingested content is stored as objects in a Weaviate class with associated vector embeddings.

  • The extension uses nearVector queries to perform KNN-based similarity search.

  • Metadata is stored as custom object properties and returned in search results.
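The three points above can be pictured with a small, self-contained model. The class InMemoryNearVectorSketch and its methods are hypothetical and only mimic the described behaviour: each object keeps its text, vector, and metadata, and nearVector ranks objects by cosine similarity (Weaviate's default distance metric) and returns the top k. Two simplifications are assumptions of the sketch: it scans every object exactly, whereas Weaviate answers nearVector queries from an approximate vector index (HNSW by default), and it derives duplicate-avoiding IDs with UUID.nameUUIDFromBytes, which may not match the hashing WeaviateEmbeddingStore actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy in-memory model of the storage and search flow. Hypothetical class, NOT the
// extension's or Weaviate's implementation: Weaviate uses an approximate index,
// while this sketch does an exact brute-force scan.
public class InMemoryNearVectorSketch {

    /** One stored object: the segment text, its embedding, and its metadata properties. */
    public record StoredObject(String id, String text, float[] vector, Map<String, String> metadata) {}

    /** A search hit together with its cosine similarity to the query vector. */
    public record Match(StoredObject object, double score) {}

    // Objects keyed by ID: storing a second object under an existing ID replaces the first.
    private final Map<String, StoredObject> objects = new LinkedHashMap<>();

    /**
     * Stores a segment. With avoidDups, the ID is a name-based UUID derived from the text,
     * so identical text always gets the same ID; otherwise a random UUID is used.
     */
    public String add(String text, float[] vector, Map<String, String> metadata, boolean avoidDups) {
        String id = avoidDups
                ? UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8)).toString()
                : UUID.randomUUID().toString();
        objects.put(id, new StoredObject(id, text, vector, metadata));
        return id;
    }

    public int size() {
        return objects.size();
    }

    /** Exact k-nearest-neighbour search: score every object, highest cosine similarity first. */
    public List<Match> nearVector(float[] query, int k) {
        List<Match> matches = new ArrayList<>();
        for (StoredObject o : objects.values()) {
            matches.add(new Match(o, cosine(query, o.vector())));
        }
        matches.sort((x, y) -> Double.compare(y.score(), x.score()));
        return new ArrayList<>(matches.subList(0, Math.min(k, matches.size())));
    }

    /** Cosine similarity of two equal-length, non-zero vectors. */
    public static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        InMemoryNearVectorSketch store = new InMemoryNearVectorSketch();
        store.add("Quarkus is a Java framework", new float[] {1f, 0f}, Map.of("tags", "java"), true);
        store.add("Quarkus is a Java framework", new float[] {1f, 0f}, Map.of("tags", "java"), true);
        store.add("Weaviate is a vector database", new float[] {0f, 1f}, Map.of("tags", "db"), true);

        // The repeated text reused its text-derived ID, so only two objects are stored.
        System.out.println(store.size()); // prints 2

        Match best = store.nearVector(new float[] {0.9f, 0.1f}, 1).get(0);
        // prints: Quarkus is a Java framework {tags=java}
        System.out.println(best.object().text() + " " + best.object().metadata());
    }
}
```

Running main stores two objects, because the repeated sentence collapses onto one ID when avoidDups is true, and the query vector close to [1, 0] returns the Quarkus sentence along with its metadata.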

Summary

To use Weaviate with Quarkus LangChain4j:

  1. Add the quarkus-langchain4j-weaviate extension

  2. Configure a local or remote Weaviate instance

  3. Ingest and retrieve documents using the WeaviateEmbeddingStore