Weaviate Embedding Store
Weaviate is a scalable vector-native database designed for semantic search and Retrieval-Augmented Generation (RAG) use cases. This guide explains how to use Weaviate as an embedding store in Quarkus LangChain4j.
Overview
Weaviate stores text segments and their corresponding embeddings and exposes powerful similarity search capabilities. With Quarkus LangChain4j, you can ingest documents and perform vector-based retrieval with minimal setup.
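To make "similarity search" concrete, the following is a minimal, library-free sketch of ranking stored segments by cosine similarity to a query vector. It is only a conceptual illustration: the `SimilaritySketch` class, the `Entry` record, and the toy two-dimensional vectors are hypothetical, and Weaviate's own indexing and distance metric are configured on the server side and may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SimilaritySketch {

    /** A stored text segment together with its embedding vector (hypothetical type). */
    record Entry(String text, double[] vector) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the texts of the k entries most similar to the query, best match first. */
    static List<String> topK(List<Entry> entries, double[] query, int k) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble((Entry e) -> cosine(e.vector(), query)).reversed());
        List<String> result = new ArrayList<>();
        for (Entry e : sorted.subList(0, Math.min(k, sorted.size()))) {
            result.add(e.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
        List<Entry> store = List.of(
                new Entry("cats", new double[] {1.0, 0.0}),
                new Entry("dogs", new double[] {0.8, 0.6}),
                new Entry("cars", new double[] {0.0, 1.0}));
        System.out.println(topK(store, new double[] {1.0, 0.1}, 2)); // prints [cats, dogs]
    }
}
```

In a real application the embedding store performs this ranking for you; the sketch only shows what "closest vectors" means.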
Dependency
To enable Weaviate support in your Quarkus application, add the following dependency:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-weaviate</artifactId>
    <version>1.0.2</version>
</dependency>
Dev Services Support
When running in development or test mode, the extension automatically starts a containerized Weaviate instance using Dev Services, unless a host is explicitly configured.
You can disable the Dev Service or connect to an existing Weaviate instance by configuring:
quarkus.langchain4j.weaviate.host=localhost
quarkus.langchain4j.weaviate.port=8080
When using a remote Weaviate instance, Dev Services are automatically disabled.
Usage Example
Once configured, you can use the Weaviate embedding store like any other vector store:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.weaviate.WeaviateEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithWeaviate {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-weaviate extension.
     */
    @Inject
    WeaviateEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the model provider extension (such as OpenAI).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        // Split each document into segments of at most 500 characters, with no overlap.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
This allows you to ingest documents and perform similarity queries with any supported embedding model.
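The two arguments of `recursive(500, 0)` in the example are the maximum segment size and the maximum overlap between consecutive segments, both in characters. The sketch below illustrates what these two numbers mean using plain fixed-size windows. It is a simplification under stated assumptions: the `ChunkingSketch` class is hypothetical, and the actual LangChain4j recursive splitter tries to cut at natural boundaries (paragraphs, sentences, words) before falling back to raw character positions.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {

    /**
     * Cuts text into windows of at most maxSize characters. Consecutive
     * windows share `overlap` characters. Unlike a real document splitter,
     * this ignores paragraph and sentence boundaries.
     */
    static List<String> split(String text, int maxSize, int overlap) {
        if (maxSize <= 0 || overlap < 0 || overlap >= maxSize) {
            throw new IllegalArgumentException("need 0 <= overlap < maxSize");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSize - overlap; // how far each new window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String text = "x".repeat(1200);
        // Same numbers as recursive(500, 0): 500-character segments, no overlap.
        List<String> segments = split(text, 500, 0);
        System.out.println(segments.size()); // prints 3 (500 + 500 + 200 characters)
    }
}
```

Smaller segments tend to give more precise matches but lose surrounding context, while a non-zero overlap repeats some text across segments so that sentences cut at a boundary remain retrievable.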
Configuration
You can customize the behavior of the extension using the following configuration options:
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Configuration property | Type | Default |
---|---|---|
If DevServices has been explicitly enabled or disabled. DevServices is generally enabled by default, unless there is an existing configuration present. When DevServices is enabled, Quarkus will attempt to automatically configure and start a database when running in Dev or Test mode and when Docker is running. Environment variable: | boolean | |
The container image name to use, for container-based DevServices providers. Environment variable: | string | |
Optional fixed port the dev service will listen to. If not defined, the port will be chosen randomly. Environment variable: | int | |
Indicates if the Weaviate server managed by Quarkus Dev Services is shared. When shared, Quarkus looks for running containers using label-based service discovery. If a matching container is found, it is used, and a second one is not started. Otherwise, Dev Services for Weaviate starts a new container. Container sharing is only used in dev mode. Environment variable: | boolean | |
This property is used when you need multiple shared Weaviate servers. Environment variable: | string | |
Environment variables that are passed to the container. Environment variable: | Map<String,String> | |
The Weaviate API key to authenticate with. Environment variable: | string | |
The scheme of the cluster URL, e.g. "https". Find it under Details of your Weaviate cluster. Environment variable: | string | |
The URL of the Weaviate server. Environment variable: | string | |
The HTTP port of the Weaviate server. Defaults to 8080. Environment variable: | int | |
The gRPC port of the Weaviate server. Defaults to 50051. Environment variable: | int | |
Whether the gRPC connection is secured. Environment variable: | boolean | |
Use gRPC instead of HTTP for batch inserts only. HTTP will still be used for search. Environment variable: | boolean | |
The object class you want to store, e.g. "MyGreatClass". Must start with an uppercase letter. Environment variable: | string | |
The name of the field that contains the text of a Environment variable: | string | |
If true (default), then Environment variable: | boolean | |
Consistency level: ONE, QUORUM (default) or ALL. Environment variable: | | |
Metadata keys that should be persisted. The default in Weaviate is [], however at least one key must be specified for the EmbeddingStore to work. Thus, "tags" is used as the default. Environment variable: | list of string | |
The name of the field where Environment variable: | string | |
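Two of the options above carry constraints that are easy to get wrong: the object class must start with an uppercase letter, and the consistency level must be one of ONE, QUORUM or ALL. The following sketch shows how such values could be checked in application code before they are used. The `WeaviateSettingsCheck` class and its methods are hypothetical helpers for illustration, not part of the extension, and Weaviate itself may enforce stricter naming rules than the single check shown here.

```java
import java.util.Set;

public class WeaviateSettingsCheck {

    // The consistency levels listed in the configuration table above.
    private static final Set<String> CONSISTENCY_LEVELS = Set.of("ONE", "QUORUM", "ALL");

    /** True if the class name is non-empty and starts with an uppercase letter. */
    static boolean isValidObjectClass(String name) {
        return name != null && !name.isEmpty() && Character.isUpperCase(name.charAt(0));
    }

    /** True if the level is exactly one of ONE, QUORUM or ALL. */
    static boolean isValidConsistencyLevel(String level) {
        return level != null && CONSISTENCY_LEVELS.contains(level);
    }

    public static void main(String[] args) {
        System.out.println(isValidObjectClass("MyGreatClass"));  // prints true
        System.out.println(isValidObjectClass("myGreatClass"));  // prints false
        System.out.println(isValidConsistencyLevel("QUORUM"));   // prints true
        System.out.println(isValidConsistencyLevel("MAJORITY")); // prints false
    }
}
```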