Oracle Database Document Store
The Oracle extension allows you to use Oracle Database as a vector database for Retrieval-Augmented Generation (RAG) with Quarkus LangChain4j.
It leverages Oracle AI Vector Search, available in Oracle Database 23ai, to store and search vector embeddings using the native VECTOR data type.
Prerequisites
To use Oracle Database as a document store:
- An Oracle Database 23ai (or later) instance is required, since AI Vector Search is only available from that version.
- A Quarkus datasource must be configured.
Oracle AI Vector Search is built into the database engine and stores embeddings in a native VECTOR column.
In dev mode and test mode, the quarkus-langchain4j-oracle extension automatically starts an Oracle Database Free container (gvenzl/oracle-free:23-slim) via Dev Services. Oracle Express Edition (XE) is not supported, as it does not include AI Vector Search.
Dependency
To enable Oracle integration in your Quarkus project, add the following Maven dependency:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-oracle</artifactId>
    <version>1.10.0</version>
</dependency>
If you use the Quarkus platform BOM (the default for generated projects), you can instead import the Quarkus LangChain4j BOM so that all dependency versions stay aligned:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>${quarkus.platform.group-id}</groupId>
            <artifactId>${quarkus.platform.artifact-id}</artifactId>
            <version>${quarkus.platform.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>${quarkus.platform.group-id}</groupId>
            <artifactId>quarkus-langchain4j-bom</artifactId> <!-- (1) -->
            <version>${quarkus.platform.version}</version> <!-- (2) -->
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-oracle</artifactId> <!-- (3) -->
    </dependency>
</dependencies>
| 1 | In your dependencyManagement section, add the quarkus-langchain4j-bom |
| 2 | Inherit the version from your platform version |
| 3 | Voilà, no need for version alignment anymore |
This extension requires a configured Quarkus datasource. For configuration details, refer to the Quarkus DataSource Guide.
Embedding Table
The extension manages an embedding table whose creation is controlled by the create-option property:
quarkus.langchain4j.oracle.table=embeddings
quarkus.langchain4j.oracle.create-option=CREATE_IF_NOT_EXISTS
create-option accepts:
- CREATE_NONE: the table must already exist.
- CREATE_IF_NOT_EXISTS: create the table only if it is missing (default).
- CREATE_OR_REPLACE: drop and recreate the table.
The column names can be customized if you need to map to an existing schema:
quarkus.langchain4j.oracle.id-column=id
quarkus.langchain4j.oracle.embedding-column=embedding
quarkus.langchain4j.oracle.text-column=text
quarkus.langchain4j.oracle.metadata-column=metadata
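To make the three create-option values and the custom column names concrete, here is a minimal sketch of the DDL each option could translate to. The class EmbeddingTableDdlSketch and its method are hypothetical and are not part of the extension; the id and text column types are assumptions, and the statements the extension actually issues may differ. Only the VECTOR(*, FLOAT32) embedding column and the JSON metadata column are taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the extension: renders DDL for each create-option. */
final class EmbeddingTableDdlSketch {

    enum CreateOption { CREATE_NONE, CREATE_IF_NOT_EXISTS, CREATE_OR_REPLACE }

    /** Returns the statements to run, in order; an empty list means the table is left untouched. */
    static List<String> ddlFor(CreateOption option, String table, String idColumn,
                               String embeddingColumn, String textColumn, String metadataColumn) {
        List<String> statements = new ArrayList<>();
        if (option == CreateOption.CREATE_NONE) {
            return statements; // the table must already exist
        }
        if (option == CreateOption.CREATE_OR_REPLACE) {
            statements.add("DROP TABLE IF EXISTS " + table);
        }
        // Assumed id and text column types; VECTOR(*, FLOAT32) and JSON follow this page.
        statements.add("CREATE TABLE IF NOT EXISTS " + table + " ("
                + idColumn + " VARCHAR2(36) PRIMARY KEY, "
                + embeddingColumn + " VECTOR(*, FLOAT32), "
                + textColumn + " CLOB, "
                + metadataColumn + " JSON)");
        return statements;
    }
}
```

With CREATE_NONE the sketch returns no statements at all, which is why the table, with matching column names, has to exist before the application starts.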
Vector Index
By default, searches run as an exact (brute-force) nearest neighbor scan, which is appropriate for small tables. For larger datasets, you can create an IVF (Inverted File) index to enable approximate nearest neighbor search:
quarkus.langchain4j.oracle.vector-index.create-option=CREATE_IF_NOT_EXISTS
quarkus.langchain4j.oracle.vector-index.target-accuracy=95
The remaining IVF parameters (degree-of-parallelism, neighbor-partitions, sample-per-partition, min-vectors-per-partition) are optional and fall back to the database defaults when not set.
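The trade-off between exact and approximate search can be illustrated with a toy inverted-file index in plain Java. This is a sketch of the general IVF idea only and says nothing about how Oracle implements it: the class IvfSketch is hypothetical, the centroids are supplied by hand instead of being learned, and the probes argument is an invented knob standing in for the accuracy/latency trade-off.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Toy inverted-file (IVF) index; illustrative only, unrelated to Oracle's implementation. */
final class IvfSketch {
    private final double[][] centroids;
    private final List<List<double[]>> partitions = new ArrayList<>();

    IvfSketch(double[][] centroids, List<double[]> vectors) {
        this.centroids = centroids;
        for (int i = 0; i < centroids.length; i++) {
            partitions.add(new ArrayList<>());
        }
        // Each vector is filed under its closest centroid.
        for (double[] v : vectors) {
            partitions.get(nearestCentroids(v, 1).get(0)).add(v);
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Indexes of the count centroids closest to the query, nearest first. */
    List<Integer> nearestCentroids(double[] query, int count) {
        return IntStream.range(0, centroids.length).boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> squaredDistance(query, centroids[i])))
                .limit(count)
                .collect(Collectors.toList());
    }

    /** Approximate search: scans only the probes closest partitions. */
    double[] approximateNearest(double[] query, int probes) {
        double[] best = null;
        for (int p : nearestCentroids(query, probes)) {
            for (double[] v : partitions.get(p)) {
                if (best == null || squaredDistance(query, v) < squaredDistance(query, best)) {
                    best = v;
                }
            }
        }
        return best;
    }

    /** Exact search: scans every partition, that is, every stored vector. */
    double[] exactNearest(double[] query) {
        return approximateNearest(query, centroids.length);
    }
}
```

With centroids (0,0) and (10,0), stored vectors (4,0) and (6,3), and the query (4.9,3), probing a single partition returns (4,0) even though (6,3) is closer. Scanning more partitions recovers the true neighbor at the cost of more distance computations, which is the same kind of trade-off that a target accuracy setting expresses.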
|
The vector index is only created when |
To force exact search even when an index exists:
quarkus.langchain4j.oracle.exact-search=true
Metadata Indexes
Independently of the vector index, you can create JSON indexes on metadata keys to speed up filtering during search:
quarkus.langchain4j.oracle.metadata-indexes[0].create-option=CREATE_IF_NOT_EXISTS
quarkus.langchain4j.oracle.metadata-indexes[0].keys[0].key=category
quarkus.langchain4j.oracle.metadata-indexes[0].keys[0].type=STRING
quarkus.langchain4j.oracle.metadata-indexes[0].keys[0].order=ASC
This is useful for small tables where exact search is sufficient but metadata filtering still needs to be fast.
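As a rough illustration of what such an index looks like at the SQL level, the sketch below renders a function-based index over JSON_VALUE expressions, one per configured key. The class MetadataIndexDdlSketch, the index name, and the mapping of the STRING key type to VARCHAR2(100) are assumptions made for this example; the statement generated by the extension may be shaped differently.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not the extension's code: renders an index over JSON metadata keys. */
final class MetadataIndexDdlSketch {

    /** One indexed key: JSON key name, SQL type returned by JSON_VALUE, and sort order. */
    static final class Key {
        final String name;
        final String sqlType;
        final String order;

        Key(String name, String sqlType, String order) {
            this.name = name;
            this.sqlType = sqlType;
            this.order = order;
        }
    }

    static String createIndex(String indexName, String table, String metadataColumn, List<Key> keys) {
        // Each key becomes one JSON_VALUE expression in the index column list.
        String expressions = keys.stream()
                .map(k -> "JSON_VALUE(" + metadataColumn + ", '$." + k.name
                        + "' RETURNING " + k.sqlType + ") " + k.order)
                .collect(Collectors.joining(", "));
        return "CREATE INDEX IF NOT EXISTS " + indexName + " ON " + table + " (" + expressions + ")";
    }
}
```

For the category key configured above, the sketch yields an index on JSON_VALUE(metadata, '$.category' RETURNING VARCHAR2(100)) ASC, so a filter on category can be answered from the index instead of parsing the JSON of every row.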
Usage Example
Once the extension is installed and configured, you can ingest documents into Oracle using the following code:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.oracle.QuarkusOracleEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithOracle {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-oracle extension.
     */
    @Inject
    QuarkusOracleEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (such as openai).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        // Split each document into segments of at most 500 characters with no overlap,
        // embed every segment, and store the result in Oracle.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
This example shows how to embed and persist documents using the Oracle store, enabling similarity search during RAG queries.
Configuration
Customize the behavior of the extension using the following configuration options:
Configuration property fixed at build time - All other configuration properties are overridable at runtime
| Configuration property | Type | Default |
|---|---|---|
| Whether the default (unnamed) Oracle embedding store should be enabled. | boolean | |
| The name of the configured Oracle datasource to use for the default store. If not set, the default datasource from the Agroal extension will be used. | string | |
| The table name for storing embeddings. | string | |
| Whether to create the embedding table if it does not already exist, replace it, or do nothing. | | |
| Custom name for the id column. | string | |
| Custom name for the embedding column. | string | |
| Custom name for the text column. | string | |
| Custom name for the metadata column. | string | |
| Whether to use exact search (brute force) instead of approximate nearest neighbor search. | boolean | |
| Configuration property | Type | Default |
|---|---|---|
| Whether this named Oracle embedding store should be enabled. | boolean | |
| The name of the configured Oracle datasource to use for this named store. If not set, the default datasource from the Agroal extension will be used. | string | |
| The table name for storing embeddings. | string | |
| Whether to create the embedding table if it does not already exist, replace it, or do nothing. | | |
| Custom name for the id column. | string | |
| Custom name for the embedding column. | string | |
| Custom name for the text column. | string | |
| Custom name for the metadata column. | string | |
| Whether to use exact search (brute force) instead of approximate nearest neighbor search. | boolean | |
| Configuration for the IVF vector index used for approximate nearest neighbor search | Type | Default |
|---|---|---|
| Whether to create the IVF vector index. | | |
| The target accuracy percentage (0-100) for the IVF vector index. Higher values improve recall at the cost of search latency. | int | |
| The degree of parallelism for IVF vector index creation. Higher values speed up index creation on multi-core systems. | int | |
| The number of neighbor partitions in the IVF index. This controls how the vector space is divided during index creation. | int | |
| The number of samples per partition used when building the IVF index. | int | |
| The minimum number of vectors per partition in the IVF index. | int | |
| Configuration property | Type | Default |
|---|---|---|
| Whether this is a unique index. | boolean | |
| Whether to create a bitmap index instead of a B-tree index. Bitmap indexes are more efficient for low-cardinality columns. | boolean | |
| Whether to create the metadata index. | | |
| The JSON metadata key name to index. | string | required |
| The SQL type of the indexed metadata key. | string | |
| The sort order for this key in the index. | string | |
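Every property on this page can also be supplied as an environment variable. Quarkus follows the MicroProfile Config naming rule: each character that is neither alphanumeric nor an underscore is replaced by an underscore, and the result is upper-cased. The helper below is a hypothetical illustration of that rule and is not part of Quarkus or of this extension.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the MicroProfile Config environment variable naming rule. */
final class EnvVarNameSketch {

    /** Replaces every character that is not a letter, a digit, or '_' with '_', then upper-cases. */
    static String toEnvVarName(String property) {
        return property.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }
}
```

For example, quarkus.langchain4j.oracle.table maps to QUARKUS_LANGCHAIN4J_ORACLE_TABLE. Indexed properties produce doubled underscores, so quarkus.langchain4j.oracle.metadata-indexes[0].keys[0].key maps to QUARKUS_LANGCHAIN4J_ORACLE_METADATA_INDEXES_0__KEYS_0__KEY.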
How It Works
The Oracle extension maps each ingested document to a row in an Oracle table. Each row contains:
- The original text content
- Optional metadata, stored as JSON
- The vector embedding, stored in a native VECTOR(*, FLOAT32) column
During retrieval, a similarity search is performed using the native VECTOR_DISTANCE function, optionally accelerated by the IVF index and filtered by metadata when a filter is provided.
The extension manages schema and index creation automatically according to the configured create-option values.
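The retrieval step can be pictured as a top-k query ordered by VECTOR_DISTANCE. The sketch below builds such a query string and also computes a cosine distance in plain Java to show what the ordering is based on. The class SimilarityQuerySketch is hypothetical; the choice of the COSINE metric, the hard-coded id, text, and metadata column names, and the exact shape of the SQL are assumptions and may not match what the extension generates.

```java
/** Hypothetical sketch of a similarity query; not the SQL generated by the extension. */
final class SimilarityQuerySketch {

    /** Cosine distance: 1 minus the cosine similarity, so 0 means "same direction". */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Builds a top-k query. The two bind placeholders (?) stand for the query vector and k.
     * EXACT forces a full scan, while APPROX lets the database use a vector index if one exists.
     */
    static String topK(String table, String embeddingColumn, boolean exact) {
        return "SELECT id, text, metadata, "
                + "VECTOR_DISTANCE(" + embeddingColumn + ", ?, COSINE) AS distance "
                + "FROM " + table + " ORDER BY distance "
                + "FETCH " + (exact ? "EXACT" : "APPROX") + " FIRST ? ROWS ONLY";
    }
}
```

A metadata filter would add a WHERE clause on the JSON column ahead of the ORDER BY, which is where the metadata indexes come into play.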
Named Stores
You can configure multiple named Oracle stores, each backed by a different datasource. This is useful when your application needs to manage embeddings for different domains or tenants in separate databases.
To configure a named store, set its datasource and enable it at build time:
quarkus.langchain4j.oracle.products.datasource=products-ds
quarkus.langchain4j.oracle.products.table=product_embeddings
To inject a named store, use the @EmbeddingStoreName qualifier:
@Inject
@EmbeddingStoreName("products")
EmbeddingStore<TextSegment> productsStore;
The default store and named stores can coexist. If you only need named stores, disable the default store:
quarkus.langchain4j.oracle.default-store-enabled=false
Summary
To use Oracle Database as a document store with Quarkus LangChain4j:
- Use an Oracle Database 23ai instance, which provides AI Vector Search.
- Add the extension dependency.
- Configure a datasource.
- Optionally tune the embedding table, vector index, and metadata indexes.
- Use QuarkusOracleEmbeddingStore (or inject EmbeddingStore<TextSegment>) to ingest and retrieve embedded documents.