Neo4j Embedding Store
Neo4j is a graph database that also supports vector search in recent 5.x releases. With Quarkus LangChain4j, you can use Neo4j as a vector-capable document store for implementing Retrieval-Augmented Generation (RAG) pipelines.
Neo4j 5.11 or later is required for native vector similarity search.
Dependency
To enable Neo4j vector store support, add the following dependency:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-neo4j</artifactId>
    <version>1.0.2</version>
</dependency>
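This extension depends on the quarkus-neo4j extension for Neo4j driver configuration, Dev Services support, and reactive client integration.
All standard Neo4j driver settings, such as quarkus.neo4j.uri and quarkus.neo4j.authentication.*, are configured through that extension.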
Prerequisites
Ensure your Neo4j instance:
- Is running version 5.11 or newer
- Has the gds (Graph Data Science) plugin installed, if you need it for advanced vector operations
- Has its schema configured for vector indexing (see below)
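If you want the application to fail fast on an older server, a check along these lines can run against the version string the server reports (for example, from CALL dbms.components()). This is a minimal sketch: the class and method names are hypothetical and not part of the extension, and it compares only the major and minor version components.

```java
// Hypothetical helper (not part of the extension): checks that a Neo4j
// version string such as "5.11.0" is at least 5.11.
final class Neo4jVersionCheck {

    static final int MIN_MAJOR = 5;
    static final int MIN_MINOR = 11;

    /** Returns true if the "major.minor[.patch]" version is 5.11 or newer. */
    static boolean supportsVectorIndex(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        // Drop any non-numeric suffix (e.g. "-aura") from the minor component.
        int minor = Integer.parseInt(parts[1].replaceAll("\\D.*$", ""));
        return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"4.4.30", "5.10.0", "5.11.0", "5.26.1"}) {
            System.out.println(v + " -> " + supportsVectorIndex(v));
        }
    }
}
```

Calendar-style versions such as 2025.01.0 pass the check because their major component is greater than 5.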
Embedding Dimension
You must define the dimensionality of your embedding vectors:
quarkus.langchain4j.neo4j.dimension=384
Typical dimensions:
- AllMiniLmL6V2QuantizedEmbeddingModel → 384
- OpenAI text-embedding-ada-002 → 1536
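If the embedding dimension is missing or mismatched, ingestion and retrieval will fail or produce inaccurate results. If you switch to a different embedding model, make sure the configured dimension, and the vector index, which is created with a fixed dimension, are updated to match the new model.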
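One way to catch a mismatch early is to validate each vector's length against the configured dimension before it reaches the store. The guard below is a hypothetical sketch, not part of the extension; with LangChain4j you could pass it the float array returned by Embedding.vector().

```java
// Hypothetical guard (not part of the extension): rejects vectors whose length
// differs from the configured quarkus.langchain4j.neo4j.dimension value.
final class DimensionGuard {

    private final int expectedDimension;

    DimensionGuard(int expectedDimension) {
        if (expectedDimension <= 0) {
            throw new IllegalArgumentException("Dimension must be positive");
        }
        this.expectedDimension = expectedDimension;
    }

    /** Throws if the vector does not have exactly the expected number of components. */
    void check(float[] vector) {
        if (vector.length != expectedDimension) {
            throw new IllegalStateException("Embedding has " + vector.length
                    + " dimensions, but the store is configured for " + expectedDimension);
        }
    }

    public static void main(String[] args) {
        DimensionGuard guard = new DimensionGuard(384);
        guard.check(new float[384]);      // same size as AllMiniLmL6V2QuantizedEmbeddingModel output
        try {
            guard.check(new float[1536]); // same size as text-embedding-ada-002 output
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```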
Usage Example
Once the extension is installed and configured, you can use Neo4j to ingest documents and retrieve them by vector similarity. The following example ingests a list of documents:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;

@ApplicationScoped
public class IngestorExampleWithNeo4J {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-neo4j extension.
     */
    @Inject
    Neo4jEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM extension (for example, openai).
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                // Split documents into segments of at most 500 characters, with no overlap
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning: this can take a long time for large document sets
        ingestor.ingest(documents);
    }
}
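The recursive(500, 0) splitter in the example caps each segment at 500 characters with no overlap between consecutive segments. LangChain4j's recursive splitter prefers natural boundaries such as paragraphs and sentences; the sketch below swaps in a much simpler fixed-window chunker purely to show how the size and overlap parameters interact. The class is hypothetical and is not what the extension runs.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a document splitter (NOT LangChain4j's recursive splitter):
// cuts text into windows of at most maxChars characters, where consecutive windows
// share overlapChars characters.
final class FixedWindowChunker {

    static List<String> chunk(String text, int maxChars, int overlapChars) {
        if (maxChars <= 0 || overlapChars < 0 || overlapChars >= maxChars) {
            throw new IllegalArgumentException("Require 0 <= overlapChars < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlapChars; // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "x".repeat(1200);
        // With (500, 0) the windows cover [0,500), [500,1000) and [1000,1200)
        System.out.println(chunk(text, 500, 0).size());
    }
}
```

A non-zero overlap repeats some context across segment boundaries, at the cost of storing and embedding more text.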
Configuration
You can configure the Neo4j connection and vector store behavior using these properties:
Unless a property is fixed at build time, it can be overridden at runtime.

Configuration property | Type | Default
---|---|---
Dimension of the embeddings that will be stored in the Neo4j store. | int | required
Label for the created nodes. | string |
Name of the property to store the embedding vectors. | string |
Name of the property to store embedding IDs. | string |
Prefix to be added to the metadata keys. By default, no prefix is used. | string |
Name of the property to store the embedding text. | string |
Name of the index to be created for vector search. | string |
Name of the database to connect to. | string |
The query to use when retrieving embeddings. This query has to return the following columns: … | string |
Common settings include:
- quarkus.langchain4j.neo4j.dimension: required; the dimension of your embeddings
- quarkus.neo4j.uri, quarkus.neo4j.authentication.*: standard Neo4j driver settings
How It Works
Internally, the extension maps each document into a Neo4j node with:
- A text property (raw content)
- A vector property (embedding)
- Optional metadata stored as node properties
Documents are stored under a custom label (:Document) and indexed using Neo4j’s vector search capabilities.
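Conceptually, a vector query ranks nodes by their similarity to the query vector. The query below illustrates this with a brute-force cosine ranking that relies on the GDS similarity function; index-backed search in Neo4j 5.11+ goes through the db.index.vector.queryNodes procedure instead: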
MATCH (d:Document)
RETURN d.text, gds.similarity.cosine(d.vector, $queryVector) AS score
ORDER BY score DESC
LIMIT 5
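For intuition, the ranking that query expresses can be reproduced in memory. The sketch below computes cosine similarity (the dot product divided by the product of the vector norms) and keeps the k best matches. It is a brute-force illustration with hypothetical class and method names, not how the extension or Neo4j's vector index executes queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Brute-force, in-memory version of "score every node, ORDER BY score DESC, LIMIT k".
final class CosineTopK {

    record Scored(String text, double score) {}

    // cosine(a, b) = (a . b) / (|a| * |b|)
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // similarity is undefined for a zero vector; treat it as no match
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every stored text against the query and returns the k highest scores.
    static List<Scored> topK(List<String> texts, List<float[]> vectors, float[] query, int k) {
        List<Scored> scored = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            scored.add(new Scored(texts.get(i), cosine(vectors.get(i), query)));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return new ArrayList<>(scored.subList(0, Math.min(k, scored.size())));
    }
}
```

Brute-force scoring touches every stored vector on each query, which is the cost a vector index is meant to avoid on larger datasets.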
Schema Setup
The vector field is stored as a list of floats and must be indexed for similarity search:
CREATE VECTOR INDEX langchain_vector_index FOR (d:Document)
ON (d.vector) OPTIONS {indexConfig: {
`vector.dimensions`: 384,
`vector.similarity_function`: 'cosine'
}};
The extension will attempt to create this index automatically if it does not exist.
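Because the index's vector.dimensions must equal the configured quarkus.langchain4j.neo4j.dimension, it can help to generate the statement from the same values instead of hard-coding them. This is a hypothetical helper, not the extension's own index-creation code. It adds IF NOT EXISTS so the statement can be re-run safely, restricts the similarity function to 'cosine' or 'euclidean', and assumes a Neo4j version whose Cypher supports CREATE VECTOR INDEX.

```java
// Hypothetical helper (not part of the extension): renders a CREATE VECTOR INDEX
// statement from configured values so the index dimension always matches the
// dimension used at ingestion time.
final class VectorIndexDdl {

    static String createIndex(String indexName, String label, String property,
                              int dimension, String similarity) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("Dimension must be positive");
        }
        if (!similarity.equals("cosine") && !similarity.equals("euclidean")) {
            throw new IllegalArgumentException("Unsupported similarity: " + similarity);
        }
        return "CREATE VECTOR INDEX " + quote(indexName) + " IF NOT EXISTS"
                + " FOR (d:" + quote(label) + ") ON (d." + quote(property) + ")"
                + " OPTIONS {indexConfig: {"
                + " `vector.dimensions`: " + dimension + ","
                + " `vector.similarity_function`: '" + similarity + "'"
                + " }}";
    }

    // Backtick-quotes a Cypher identifier, doubling any embedded backticks.
    private static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        System.out.println(createIndex("langchain_vector_index", "Document", "vector", 384, "cosine"));
    }
}
```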
Metadata Filtering
Neo4j supports flexible filtering based on node properties. You can attach metadata as key-value pairs to each node and query with Cypher expressions.
Example: Restrict to documents from a given author and year:
MATCH (d:Document)
WHERE d.author = 'Alice' AND d.year = 2023
RETURN ...
Metadata filtering is fully supported using standard Cypher conditions alongside vector similarity.
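When filter values come from user input, passing them as query parameters is safer than concatenating them into Cypher. Property names cannot be parameterized in Cypher, so the sketch backtick-quotes them instead. The helper is hypothetical and not part of the extension; if you configured a metadata prefix, you would prepend it to the keys yourself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the extension): turns equality filters on
// metadata into a Cypher WHERE fragment plus a parameter map.
final class MetadataWhere {

    record Fragment(String cypher, Map<String, Object> params) {}

    // 'variable' is the node variable used in the query (e.g. "d"); it is not quoted.
    static Fragment equalsAll(String variable, Map<String, Object> filters) {
        StringJoiner where = new StringJoiner(" AND ");
        Map<String, Object> params = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, Object> e : filters.entrySet()) {
            String param = "p" + i++;
            // Backtick-quote the key (doubling embedded backticks); the value becomes $pN.
            where.add(variable + ".`" + e.getKey().replace("`", "``") + "` = $" + param);
            params.put(param, e.getValue());
        }
        // An empty filter matches everything.
        return new Fragment(filters.isEmpty() ? "true" : where.toString(), params);
    }

    public static void main(String[] args) {
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("author", "Alice");
        filters.put("year", 2023);
        Fragment f = equalsAll("d", filters);
        System.out.println("MATCH (d:Document) WHERE " + f.cypher() + " RETURN d.text");
        System.out.println(f.params());
    }
}
```

The resulting fragment and parameter map can then be sent together through whichever Neo4j driver session the application already uses.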
Summary
To use Neo4j as a vector store for RAG with Quarkus LangChain4j:
- Ensure Neo4j 5.11+ with vector indexing is available
- Add the quarkus-langchain4j-neo4j dependency
- Set the vector dimension and Neo4j connection parameters
- Use the Neo4jEmbeddingStore to ingest and retrieve documents
- Leverage Cypher filters and indexing for fine-grained control