Neo4j Embedding Store

Neo4j is a graph database that also supports vector search through vector indexes, available starting with version 5.11. With Quarkus LangChain4j, you can use Neo4j as an embedding store for Retrieval-Augmented Generation (RAG) pipelines.

Neo4j 5.11 or later is required for native vector similarity search, which is served by vector indexes and queried through the db.index.vector.queryNodes procedure. Ensure your Neo4j deployment supports this feature.

Dependency

To enable Neo4j vector store support, add the following dependency:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-neo4j</artifactId>
    <version>1.0.2</version>
</dependency>

This extension depends on the quarkus-neo4j extension for Neo4j driver configuration, Dev Services support, and reactive client integration.

All standard quarkus-neo4j configuration properties are available. For full documentation, refer to: https://docs.quarkiverse.io/quarkus-neo4j/dev/index.html

Prerequisites

Ensure your Neo4j instance:

  • Is running version 5.11 or newer

  • Optionally has the Graph Data Science (GDS) plugin installed. GDS is not needed for vector index search itself; install it only if you use its algorithms or similarity functions elsewhere

  • Has a vector index on the embedding property, or allows the extension to create one (see Schema Setup below)

Embedding Dimension

You must define the dimensionality of your embedding vectors:

quarkus.langchain4j.neo4j.dimension=384

Typical dimensions:

  • AllMiniLmL6V2QuantizedEmbeddingModel → 384

  • OpenAI text-embedding-ada-002 → 1536

If the embedding dimension is missing or mismatched, ingestion and retrieval will fail or produce inaccurate results.

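If you switch to a different embedding model, update the dimension value accordingly. An existing vector index keeps the dimension it was created with, so it must be dropped and recreated, and existing documents re-ingested with the new model.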
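A mismatch is easiest to catch before anything is written to Neo4j. The following is a minimal sketch, not part of the extension: `DimensionCheck` is a hypothetical helper that compares a vector's length with the configured dimension. In an application, the vector would come from something like `embeddingModel.embed(text).content().vector()`, and the expected value from `quarkus.langchain4j.neo4j.dimension`.

```java
/**
 * Hypothetical helper (not part of quarkus-langchain4j): fails fast when an
 * embedding's length differs from quarkus.langchain4j.neo4j.dimension.
 */
final class DimensionCheck {

    private DimensionCheck() {
    }

    /** Returns the vector unchanged if its length matches the configured dimension, otherwise throws. */
    static float[] requireDimension(float[] vector, int configuredDimension) {
        if (vector.length != configuredDimension) {
            throw new IllegalArgumentException("Embedding has " + vector.length
                    + " dimensions but quarkus.langchain4j.neo4j.dimension is " + configuredDimension);
        }
        return vector;
    }

    public static void main(String[] args) {
        int configured = 384; // e.g. the value used for AllMiniLmL6V2QuantizedEmbeddingModel
        requireDimension(new float[384], configured); // matches, returns normally
        try {
            requireDimension(new float[1536], configured); // text-embedding-ada-002 size
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing with an explicit message is a design choice: a silent mismatch otherwise surfaces later as an indexing error or as poor search results.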

Usage Example

Once the extension is installed and configured, you can inject the Neo4jEmbeddingStore bean and use it to ingest documents for later retrieval by vector similarity:

package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;

@ApplicationScoped
public class IngestorExampleWithNeo4J {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-neo4j extension.
     */
    @Inject
    Neo4jEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by a model provider extension, such as quarkus-langchain4j-openai.
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
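Batching does not make ingestion faster, but it lets you log progress and retry a failed slice instead of starting over. A minimal sketch, assuming it is acceptable to call `ingestor.ingest(...)` several times with smaller lists; `Batches.partition` is a hypothetical helper, not an API of the extension:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits a list into consecutive batches of at most
 * batchSize elements so a long ingestion can run (and be logged) batch by batch.
 */
final class Batches {

    private Batches() {
    }

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the subList view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d", "e");
        for (List<String> batch : partition(docs, 2)) {
            // In IngestorExampleWithNeo4J this line would be: ingestor.ingest(batch);
            System.out.println("ingesting " + batch);
        }
    }
}
```

Inside `IngestorExampleWithNeo4J.ingest`, the loop would call `ingestor.ingest(batch)` for each `List<Document>` batch instead of printing.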

Configuration

The following properties configure the vector store behavior. The Neo4j connection itself is configured through the standard quarkus-neo4j properties.

Some configuration properties are fixed at build time; all other configuration properties can be overridden at runtime.

  • quarkus.langchain4j.neo4j.dimension
    Dimension of the embeddings that will be stored in the Neo4j store.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_DIMENSION
    Type: int
    Default: none (required)

  • quarkus.langchain4j.neo4j.label
    Label for the created nodes.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_LABEL
    Type: string
    Default: Document

  • quarkus.langchain4j.neo4j.embedding-property
    Name of the property to store the embedding vectors.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_EMBEDDING_PROPERTY
    Type: string
    Default: embedding

  • quarkus.langchain4j.neo4j.id-property
    Name of the property to store embedding IDs.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_ID_PROPERTY
    Type: string
    Default: id

  • quarkus.langchain4j.neo4j.metadata-prefix
    Prefix to be added to the metadata keys. By default, no prefix is used.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_METADATA_PREFIX
    Type: string
    Default: none

  • quarkus.langchain4j.neo4j.text-property
    Name of the property to store the embedding text.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_TEXT_PROPERTY
    Type: string
    Default: text

  • quarkus.langchain4j.neo4j.index-name
    Name of the index to be created for vector search.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_INDEX_NAME
    Type: string
    Default: vector

  • quarkus.langchain4j.neo4j.database-name
    Name of the database to connect to.
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_DATABASE_NAME
    Type: string
    Default: neo4j

  • quarkus.langchain4j.neo4j.retrieval-query
    The query to use when retrieving embeddings. This query has to return the following columns:
      • metadata
      • score
      • column of the same name as the 'id-property' value
      • column of the same name as the 'text-property' value
      • column of the same name as the 'embedding-property' value
    Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_RETRIEVAL_QUERY
    Type: string
    Default:
    RETURN properties(node) AS metadata, node.${quarkus.langchain4j.neo4j.id-property} AS ${quarkus.langchain4j.neo4j.id-property}, node.${quarkus.langchain4j.neo4j.text-property} AS ${quarkus.langchain4j.neo4j.text-property}, node.${quarkus.langchain4j.neo4j.embedding-property} AS ${quarkus.langchain4j.neo4j.embedding-property}, score
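To see what the default retrieval query reads like once its placeholders resolve, the sketch below expands it by hand with the default property names. `RetrievalQuerySketch` is illustrative only; Quarkus performs the real substitution through its configuration system. The query uses `node` and `score` without defining them; they appear to be bound by the vector index lookup the store performs before this RETURN clause, so a custom retrieval query should keep using those names and return the same columns.

```java
/**
 * Illustration only: expands the default retrieval-query value by hand for given
 * id, text, and embedding property names. Quarkus performs the real placeholder
 * substitution through its configuration system.
 */
final class RetrievalQuerySketch {

    private RetrievalQuerySketch() {
    }

    static String defaultRetrievalQuery(String idProperty, String textProperty, String embeddingProperty) {
        return "RETURN properties(node) AS metadata, "
                + "node." + idProperty + " AS " + idProperty + ", "
                + "node." + textProperty + " AS " + textProperty + ", "
                + "node." + embeddingProperty + " AS " + embeddingProperty + ", "
                + "score";
    }

    public static void main(String[] args) {
        // Defaults: id-property=id, text-property=text, embedding-property=embedding
        System.out.println(defaultRetrievalQuery("id", "text", "embedding"));
        // prints: RETURN properties(node) AS metadata, node.id AS id, node.text AS text, node.embedding AS embedding, score
    }
}
```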

Common settings include:

  • quarkus.langchain4j.neo4j.dimension – Required; dimension of your embeddings

  • quarkus.neo4j.uri, quarkus.neo4j.authentication.* – Standard Neo4j driver settings

How It Works

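Internally, the extension maps each ingested text segment to a Neo4j node with:

  • An id property (id by default) holding the embedding ID

  • A text property (text by default) holding the raw segment content

  • An embedding property (embedding by default) holding the vector

  • Optional metadata stored as node properties, with the configured metadata prefix applied to the keys

Nodes carry a configurable label (:Document by default) and are covered by a Neo4j vector index (named vector by default).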

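Vector queries go through the vector index with the db.index.vector.queryNodes procedure rather than scanning every node. With the default names, a top-5 lookup looks roughly like this:

CALL db.index.vector.queryNodes('vector', 5, $queryVector)
YIELD node, score
RETURN node.text AS text, score
ORDER BY score DESC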
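The score column comes from the similarity function the index was created with. For illustration only, here is brute-force cosine similarity in plain Java; the vector index itself uses an approximate nearest-neighbor structure instead of comparing against every node. The `normalizedScore` method encodes an assumption to check against the Neo4j documentation for your version: that cosine scores are reported in [0, 1] as (1 + cosine) / 2, which matters when choosing a minimum score.

```java
/**
 * Illustration only: brute-force cosine similarity between two embeddings.
 * A Neo4j vector index does not scan like this; it uses an approximate
 * nearest-neighbor structure.
 */
final class CosineSketch {

    private CosineSketch() {
    }

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for zero vectors");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Assumption: maps cosine in [-1, 1] to a score in [0, 1] as (1 + cosine) / 2. */
    static double normalizedScore(float[] a, float[] b) {
        return (1 + cosine(a, b)) / 2;
    }

    public static void main(String[] args) {
        float[] query = {0.6f, 0.8f};
        float[] doc = {0.8f, 0.6f};
        System.out.println("cosine = " + cosine(query, doc) + ", score = " + normalizedScore(query, doc));
    }
}
```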

Schema Setup

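The embedding property is stored as a list of floats and must be covered by a vector index for similarity search. With the default configuration (index name vector, label Document, embedding property embedding) and 384-dimensional embeddings, the index is:

CREATE VECTOR INDEX vector IF NOT EXISTS
FOR (d:Document) ON (d.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 384,
  `vector.similarity_function`: 'cosine'
}};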

If the index does not exist, the extension attempts to create it automatically from the configured index name, label, embedding property, and dimension.
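The sketch below shows how the configuration properties (index-name, label, embedding-property, dimension) map onto an index statement of that shape. `VectorIndexStatement` is a hypothetical helper, not the extension's code. It backtick-quotes each name so labels or properties with special characters still form valid Cypher identifiers, and it uses IF NOT EXISTS so running it twice is harmless.

```java
/**
 * Hypothetical helper (not the extension's code): renders a CREATE VECTOR INDEX
 * statement from the index-name, label, embedding-property, and dimension settings.
 */
final class VectorIndexStatement {

    private VectorIndexStatement() {
    }

    /** Backtick-quotes a Cypher identifier, doubling any backtick it contains. */
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    static String create(String indexName, String label, String embeddingProperty, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        return "CREATE VECTOR INDEX " + quote(indexName) + " IF NOT EXISTS\n"
                + "FOR (d:" + quote(label) + ") ON (d." + quote(embeddingProperty) + ")\n"
                + "OPTIONS {indexConfig: {\n"
                + "  `vector.dimensions`: " + dimension + ",\n"
                + "  `vector.similarity_function`: 'cosine'\n"
                + "}}";
    }

    public static void main(String[] args) {
        // Default configuration values with 384-dimensional embeddings
        System.out.println(create("vector", "Document", "embedding", 384));
    }
}
```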

Metadata Filtering

Metadata is stored as ordinary node properties (with the configured metadata prefix applied to the keys, if any), so you can filter on it with standard Cypher expressions.

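Example: restrict a vector search to documents from a given author and year:

CALL db.index.vector.queryNodes('vector', 10, $queryVector)
YIELD node, score
WHERE node.author = 'Alice' AND node.year = 2023
RETURN node.text AS text, score

Standard Cypher conditions can be combined with vector similarity in this way. Because the WHERE clause filters the nearest neighbors the index has already returned, the query can yield fewer rows than requested; ask the index for more candidates when the filter is selective.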
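When filter values come from user input, pass them as query parameters instead of concatenating literals into the Cypher text. A minimal sketch, assuming equality-only filters: `MetadataFilterSketch` is a hypothetical helper that builds a WHERE clause on the yielded `node` for a query like the one above, applies the configured metadata prefix to each key, and collects the values into a parameter map ($p0, $p1, ...) to send along with the query's other parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper: builds a Cypher WHERE clause of equality conditions on
 * metadata properties of the yielded node. Values are passed as parameters
 * ($p0, $p1, ...), never inlined, and each key gets the configured metadata prefix.
 */
final class MetadataFilterSketch {

    /** WHERE clause text plus the parameter values it refers to. */
    record CypherFilter(String whereClause, Map<String, Object> parameters) {
    }

    private MetadataFilterSketch() {
    }

    static CypherFilter equalsAll(Map<String, Object> conditions, String metadataPrefix) {
        StringJoiner where = new StringJoiner(" AND ", "WHERE ", "");
        where.setEmptyValue(""); // no conditions: no WHERE clause at all
        Map<String, Object> params = new LinkedHashMap<>();
        int index = 0;
        for (Map.Entry<String, Object> condition : conditions.entrySet()) {
            String param = "p" + index++;
            String property = metadataPrefix + condition.getKey();
            // Backtick-quote the property name, doubling embedded backticks
            where.add("node.`" + property.replace("`", "``") + "` = $" + param);
            params.put(param, condition.getValue());
        }
        return new CypherFilter(where.toString(), params);
    }

    public static void main(String[] args) {
        Map<String, Object> conditions = new LinkedHashMap<>();
        conditions.put("author", "Alice");
        conditions.put("year", 2023);
        CypherFilter filter = equalsAll(conditions, "");
        System.out.println(filter.whereClause()); // WHERE node.`author` = $p0 AND node.`year` = $p1
        System.out.println(filter.parameters());  // {p0=Alice, p1=2023}
    }
}
```

With a metadata prefix such as meta_ (a hypothetical value), the first condition becomes node.`meta_author` = $p0.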

Summary

To use Neo4j as a vector store for RAG with Quarkus LangChain4j:

  1. Ensure Neo4j 5.11+ with vector indexing is available

  2. Add the quarkus-langchain4j-neo4j dependency

  3. Set the vector dimension and Neo4j connection parameters

  4. Use the Neo4jEmbeddingStore to ingest and retrieve documents

  5. Leverage Cypher filters and indexing for fine-grained control