Weaviate Embedding Store

Weaviate is a scalable vector-native database designed for semantic search and Retrieval-Augmented Generation (RAG) use cases. This guide explains how to use Weaviate as an embedding store in Quarkus LangChain4j.

Overview

Weaviate stores text segments and their corresponding embeddings and exposes powerful similarity search capabilities. With Quarkus LangChain4j, you can ingest documents and perform vector-based retrieval with minimal setup.

Dependency

To enable Weaviate support in your Quarkus application, add the following dependency:

<dependency>
  <groupId>io.quarkiverse.langchain4j</groupId>
  <artifactId>quarkus-langchain4j-weaviate</artifactId>
  <version>1.10.0</version>
</dependency>

Even better, if you use the Quarkus platform BOM (the default for generated projects), add the Quarkus LangChain4j BOM so that all dependency versions align:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>${quarkus.platform.artifact-id}</artifactId>
                <version>${quarkus.platform.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>${quarkus.platform.group-id}</groupId>
                <artifactId>quarkus-langchain4j-bom</artifactId> (1)
                <version>${quarkus.platform.version}</version> (2)
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-weaviate</artifactId>
        (3)
      </dependency>
    </dependencies>
(1) In your dependencyManagement section, add the quarkus-langchain4j-bom.
(2) Inherit the version from your platform version.
(3) Voilà, no need for version alignment anymore.

Dev Services Support

When running in development or test mode, the extension will automatically start a containerized Weaviate instance using Dev Services, unless a host is explicitly configured.

You can bypass the Dev Service and connect to an existing Weaviate instance by configuring its host and port:

quarkus.langchain4j.weaviate.host=localhost
quarkus.langchain4j.weaviate.port=8080

When using a remote Weaviate instance, Dev Services are automatically disabled.

Usage Example

Once configured, you can use the Weaviate embedding store like any other vector store:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithWeaviate {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-weaviate extension.
     */
    @Inject
    WeaviateEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (such as OpenAI).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}

This allows you to ingest documents and perform similarity queries with any supported embedding model.
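
The recursive(500, 0) call in the example above asks for segments of at most 500 characters with no overlap between consecutive segments. As a rough illustration of what such a size limit means during ingestion, the following plain-Java sketch packs words greedily into size-bounded segments. It is not the LangChain4j splitter, which works hierarchically on paragraphs, lines, sentences, and words; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a size-bounded splitter (illustration only). */
final class SimpleSplitter {

    /**
     * Greedily packs whitespace-separated words into segments of at most maxChars characters.
     * A single word longer than maxChars is kept whole as its own (oversized) segment.
     */
    static List<String> split(String text, int maxChars) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            // Close the current segment if adding " word" would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                segments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(split("aaa bbb ccc", 7)); // prints [aaa bbb, ccc]
    }
}
```

Smaller segments generally mean more embeddings to compute and store, which is one reason the ingestion step can take a long time.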

Configuration

You can customize the behavior of the extension using the following configuration options:

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property

Type

Default

Whether the default (unnamed) Weaviate embedding store should be enabled. Set to false when you only want to use named stores.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEFAULT_STORE_ENABLED

boolean

true

Whether Dev Services has been explicitly enabled or disabled. Dev Services is generally enabled by default, unless there is an existing configuration present.

When Dev Services is enabled, Quarkus attempts to automatically configure and start a database when running in dev or test mode and when Docker is running.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_ENABLED

boolean

true

The container image name to use, for container-based Dev Services providers.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_IMAGE_NAME

string

cr.weaviate.io/semitechnologies/weaviate:1.25.5

Optional fixed port the Dev Service will listen on.

If not defined, the port will be chosen randomly.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_PORT

int

Indicates if the Weaviate server managed by Quarkus Dev Services is shared. When shared, Quarkus looks for running containers using label-based service discovery. If a matching container is found, it is used, and so a second one is not started. Otherwise, Dev Services for Weaviate starts a new container.

The discovery uses the quarkus-dev-service-weaviate label. The value is configured using the service-name property.

Container sharing is only used in dev mode.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_SHARED

boolean

true

The value of the quarkus-dev-service-weaviate label attached to the started container. This property is used when shared is set to true. In this case, before starting a container, Dev Services for Weaviate looks for a container with the quarkus-dev-service-weaviate label set to the configured value. If found, it will use this container instead of starting a new one. Otherwise, it starts a new container with the quarkus-dev-service-weaviate label set to the specified value.

This property is used when you need multiple shared Weaviate servers.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_SERVICE_NAME

string

weaviate

Environment variables that are passed to the container.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_DEVSERVICES_CONTAINER_ENV__CONTAINER_ENV_

Map<String,String>

The Weaviate API key to authenticate with.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_API_KEY

string

The scheme of the cluster URL, e.g. "https". Find it under Details of your Weaviate cluster.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_SCHEME

string

http

The host of the Weaviate server.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_HOST

string

localhost

The port of the Weaviate server. Defaults to 8080

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_PORT

int

8080

The gRPC port of the Weaviate server. Defaults to 50051

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_GRPC_PORT

int

50051

The gRPC connection is secured.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_GRPC_SECURE

boolean

false

Use gRPC instead of HTTP for batch inserts only. HTTP is still used for search.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_GRPC_USE_FOR_INSERTS

boolean

false

The object class you want to store, e.g. "MyGreatClass". Must start with an uppercase letter.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_OBJECT_CLASS

string

Default

The name of the field that contains the text of a dev.langchain4j.data.segment.TextSegment. Default is "text"

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_TEXT_FIELD_NAME

string

text

If true, WeaviateEmbeddingStore generates a hashed ID based on the provided text segment, which avoids duplicate entries in the database. If false, a random ID is generated.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_AVOID_DUPS

boolean

false

Consistency level: ONE, QUORUM (default) or ALL.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_CONSISTENCY_LEVEL

one, quorum, all

quorum

Metadata keys that should be persisted. The default in Weaviate is [], but at least one key must be specified for the EmbeddingStore to work, so "tags" is used as the default.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_METADATA_KEYS

list of string

tags

The name of the field where dev.langchain4j.data.segment.Metadata entries are stored

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE_METADATA_FIELD_NAME

string

_metadata

Named store configurations

Type

Default

The object class used as the build-time discovery key for this named store. Each named store is identified by its object class within the same Weaviate server.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__OBJECT_CLASS

string

The Weaviate API key to authenticate with.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__API_KEY

string

The scheme of the cluster URL, e.g. "https". Find it under Details of your Weaviate cluster.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__SCHEME

string

http

The host of the Weaviate server.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__HOST

string

localhost

The port of the Weaviate server. Defaults to 8080

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__PORT

int

8080

The gRPC port of the Weaviate server. Defaults to 50051

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__GRPC_PORT

int

50051

The gRPC connection is secured.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__GRPC_SECURE

boolean

false

Use gRPC instead of HTTP for batch inserts only. HTTP is still used for search.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__GRPC_USE_FOR_INSERTS

boolean

false

The object class you want to store, e.g. "MyGreatClass". Must start with an uppercase letter.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__OBJECT_CLASS

string

Default

The name of the field that contains the text of a dev.langchain4j.data.segment.TextSegment. Default is "text"

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__TEXT_FIELD_NAME

string

text

If true, WeaviateEmbeddingStore generates a hashed ID based on the provided text segment, which avoids duplicate entries in the database. If false, a random ID is generated.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__AVOID_DUPS

boolean

false

Consistency level: ONE, QUORUM (default) or ALL.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__CONSISTENCY_LEVEL

one, quorum, all

quorum

Metadata keys that should be persisted. The default in Weaviate is [], but at least one key must be specified for the EmbeddingStore to work, so "tags" is used as the default.

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__METADATA_KEYS

list of string

tags

The name of the field where dev.langchain4j.data.segment.Metadata entries are stored

Environment variable: QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__METADATA_FIELD_NAME

string

_metadata
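
The duplicate-avoidance option in the reference above rests on a simple idea: when the object ID is derived from the segment text, ingesting the same text twice yields the same ID instead of a second entry. The sketch below illustrates that idea with the JDK's name-based UUIDs. The hash function actually used by WeaviateEmbeddingStore is not stated in this guide and may differ; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical illustration of content-derived versus random object IDs. */
final class SegmentIds {

    /**
     * Returns an ID for the given segment text.
     * With avoidDups the ID is a deterministic function of the text (name-based UUID),
     * otherwise a fresh random UUID is returned on every call.
     */
    static String idFor(String text, boolean avoidDups) {
        if (avoidDups) {
            return UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8)).toString();
        }
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Same text, deterministic IDs: the two values are equal.
        System.out.println(idFor("hello", true).equals(idFor("hello", true)));
        // Same text, random IDs: the two values differ.
        System.out.println(idFor("hello", false).equals(idFor("hello", false)));
    }
}
```

With content-derived IDs, re-running an ingestion over unchanged documents should not grow the store; with random IDs, every run adds new objects.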

Named Stores

You can configure multiple named Weaviate stores, each pointing to a different Weaviate instance or using a different object class. This is useful when your application needs to manage embeddings for different domains or tenants separately.

To configure a named store:

quarkus.langchain4j.weaviate.products.object-class=ProductClass
quarkus.langchain4j.weaviate.products.metadata.keys=tags

To point to a different Weaviate instance, specify its scheme, host, and port:

quarkus.langchain4j.weaviate.products.scheme=https
quarkus.langchain4j.weaviate.products.host=weaviate.example.com
quarkus.langchain4j.weaviate.products.port=8080
quarkus.langchain4j.weaviate.products.object-class=ProductClass

To inject a named store, use the @EmbeddingStoreName qualifier:

@Inject
@EmbeddingStoreName("products")
WeaviateEmbeddingStore productsStore;

@Inject
@EmbeddingStoreName("products")
WeaviateClient productsClient;

The default store and named stores can coexist. If you only need named stores, disable the default store:

quarkus.langchain4j.weaviate.default-store-enabled=false
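
The environment variable names listed in the configuration reference can be derived from the property names with the usual MicroProfile Config rule: replace every character that is not a letter, digit, or underscore with an underscore, then convert to upper case. The sketch below applies that rule; the class and method names are hypothetical, and the quoted "store-name" segment in the last call is an assumption about how the reference placeholders were produced.

```java
import java.util.Locale;

/** Hypothetical helper that maps a configuration property name to an environment variable name. */
final class EnvVarNames {

    /** Replaces every character outside [A-Za-z0-9_] with '_' and upper-cases the result. */
    static String toEnvVar(String propertyName) {
        return propertyName.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints QUARKUS_LANGCHAIN4J_WEAVIATE_DEFAULT_STORE_ENABLED
        System.out.println(toEnvVar("quarkus.langchain4j.weaviate.default-store-enabled"));
        // prints QUARKUS_LANGCHAIN4J_WEAVIATE_PRODUCTS_OBJECT_CLASS
        System.out.println(toEnvVar("quarkus.langchain4j.weaviate.products.object-class"));
        // prints QUARKUS_LANGCHAIN4J_WEAVIATE__STORE_NAME__HOST
        System.out.println(toEnvVar("quarkus.langchain4j.weaviate.\"store-name\".host"));
    }
}
```

Under this rule, the double underscores around STORE_NAME in the named store reference come from the quote characters around the store name segment.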

How It Works

  • Ingested content is stored as objects in a Weaviate class with associated vector embeddings.

  • The extension uses nearVector queries to perform KNN-based similarity search.

  • Metadata is stored as custom object properties and returned in search results.
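
To make the similarity search step more concrete, here is a minimal in-memory sketch of a nearest-neighbour lookup: given a query vector, it ranks stored objects by cosine similarity and returns the top k. This is only an illustration of the concept. Weaviate performs the search server-side using an index rather than a linear scan, and the use of cosine similarity, as well as all type and method names, are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Brute-force nearest-neighbour search over in-memory vectors (illustration only). */
final class NearVectorSketch {

    /** A stored object: its text plus its embedding vector (hypothetical type). */
    record StoredObject(String text, float[] vector) {}

    /** Cosine similarity of two non-zero vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k stored objects most similar to the query vector, most similar first. */
    static List<StoredObject> nearest(List<StoredObject> objects, float[] query, int k) {
        List<StoredObject> sorted = new ArrayList<>(objects);
        sorted.sort(Comparator
                .comparingDouble((StoredObject o) -> cosine(o.vector(), query))
                .reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        List<StoredObject> objects = List.of(
                new StoredObject("a", new float[] {1f, 0f}),
                new StoredObject("b", new float[] {0f, 1f}),
                new StoredObject("c", new float[] {0.9f, 0.1f}));
        // The query points along the first axis, so "a" ranks first and "c" second.
        for (StoredObject o : nearest(objects, new float[] {1f, 0f}, 2)) {
            System.out.println(o.text());
        }
    }
}
```

In the extension, the query vector would come from the same embedding model that was used at ingestion time, so that stored and query vectors are comparable.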

Summary

To use Weaviate with Quarkus LangChain4j:

  1. Add the quarkus-langchain4j-weaviate extension

  2. Configure a local or remote Weaviate instance

  3. Ingest and retrieve documents using the WeaviateEmbeddingStore