Neo4j Store for Retrieval Augmented Generation (RAG)

When implementing Retrieval Augmented Generation (RAG), you need a reliable store for your documents and their embeddings. This guide shows how to use a Neo4j database as that document store.

Neo4j 5.x or later is required, because the store relies on Neo4j vector search.

Leveraging the Neo4j embedding store

To use the Neo4j embedding store, add the following dependency:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-neo4j</artifactId>
</dependency>

The quarkus-langchain4j-neo4j extension depends on another Quarkiverse extension, quarkus-neo4j, which provides the Neo4j client and Dev Services support. All configuration from the quarkus-neo4j extension therefore also applies when Neo4j is used as the document store. See the quarkus-neo4j documentation for details.

To get started, you only need to set one configuration property: quarkus.langchain4j.neo4j.dimension. It specifies the dimension of the embeddings you are going to store, which depends on the embedding model you use.
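Because the dimension is fixed in configuration while the vectors come from the embedding model, switching models without updating the property is an easy mistake. The sketch below is a minimal, self-contained illustration of checking vector lengths against the configured value before storing them. `DimensionCheck` and `requireDimension` are hypothetical names, not part of the extension, and 384 is only an example value; use the dimension your embedding model actually produces.

```java
import java.util.List;

class DimensionCheck {

    // Hypothetical guard: every vector produced by the embedding model must have
    // exactly as many components as quarkus.langchain4j.neo4j.dimension declares.
    static void requireDimension(float[] vector, int configuredDimension) {
        if (vector.length != configuredDimension) {
            throw new IllegalArgumentException(
                    "Embedding has " + vector.length + " dimensions, but "
                            + configuredDimension + " are configured");
        }
    }

    public static void main(String[] args) {
        int configuredDimension = 384; // example value only

        // One vector of the expected size and one from a different (hypothetical) model.
        List<float[]> vectors = List.of(new float[384], new float[768]);

        for (float[] vector : vectors) {
            try {
                requireDimension(vector, configuredDimension);
                System.out.println("ok: " + vector.length);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```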

Configuration Settings

The following configuration options customize the behavior of the extension:

Properties fixed at build time cannot be changed at runtime; all other configuration properties can be overridden at runtime.
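Each property can also be supplied as an environment variable. The environment-variable names follow the MicroProfile Config convention: every character that is not an ASCII letter, digit, or underscore becomes an underscore, and the result is upper-cased. The self-contained Java sketch below applies that rule to property names. `EnvVarName` and `toEnvVar` are illustrative names, not part of Quarkus, and the sketch ignores Quarkus's additional handling of quoted property segments.

```java
import java.util.Locale;

class EnvVarName {

    // MicroProfile Config mapping: replace each character that is neither
    // alphanumeric nor '_' with '_', then convert the whole name to upper case.
    static String toEnvVar(String propertyName) {
        StringBuilder sb = new StringBuilder(propertyName.length());
        for (char c : propertyName.toCharArray()) {
            boolean keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            sb.append(keep ? c : '_');
        }
        return sb.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("quarkus.langchain4j.neo4j.dimension"));
        System.out.println(toEnvVar("quarkus.langchain4j.neo4j.embedding-property"));
    }
}
```

For example, exporting QUARKUS_LANGCHAIN4J_NEO4J_DIMENSION in the environment sets quarkus.langchain4j.neo4j.dimension.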

quarkus.langchain4j.neo4j.dimension
Dimension of the embeddings that will be stored in the Neo4j store.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_DIMENSION
Type: int
Default: required

quarkus.langchain4j.neo4j.label
Label for the created nodes.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_LABEL
Type: string
Default: Document

quarkus.langchain4j.neo4j.embedding-property
Name of the property used to store the embedding vectors.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_EMBEDDING_PROPERTY
Type: string
Default: embedding

quarkus.langchain4j.neo4j.id-property
Name of the property used to store the embedding IDs.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_ID_PROPERTY
Type: string
Default: id

quarkus.langchain4j.neo4j.metadata-prefix
Prefix added to the metadata keys.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_METADATA_PREFIX
Type: string
Default: none (no prefix is used)

quarkus.langchain4j.neo4j.text-property
Name of the property used to store the embedded text.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_TEXT_PROPERTY
Type: string
Default: text

quarkus.langchain4j.neo4j.index-name
Name of the index created for vector search.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_INDEX_NAME
Type: string
Default: vector

quarkus.langchain4j.neo4j.database-name
Name of the database to connect to.
Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_DATABASE_NAME
Type: string
Default: neo4j

quarkus.langchain4j.neo4j.retrieval-query
The query used when retrieving embeddings. It must return the following columns:

  • metadata

  • score

  • a column with the same name as the id-property value

  • a column with the same name as the text-property value

  • a column with the same name as the embedding-property value

Environment variable: QUARKUS_LANGCHAIN4J_NEO4J_RETRIEVAL_QUERY
Type: string
Default: RETURN properties(node) AS metadata, node.${quarkus.langchain4j.neo4j.id-property} AS ${quarkus.langchain4j.neo4j.id-property}, node.${quarkus.langchain4j.neo4j.text-property} AS ${quarkus.langchain4j.neo4j.text-property}, node.${quarkus.langchain4j.neo4j.embedding-property} AS ${quarkus.langchain4j.neo4j.embedding-property}, score
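The default retrieval query is built from ${...} property expressions, so its id, text, and embedding columns follow whatever property names are configured. The sketch below resolves those expressions by plain string substitution to show what the default query looks like with the default values and with a custom id property. It only mimics Quarkus's expression expansion for illustration; `RetrievalQueryPreview` and `expand` are hypothetical names, and "uuid" is an example value. The `node` and `score` variables appear to be bound by the vector search the store runs before this RETURN clause, so the fragment is not a complete Cypher query on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RetrievalQueryPreview {

    // Copy of the documented default value of quarkus.langchain4j.neo4j.retrieval-query.
    static final String DEFAULT_QUERY = "RETURN properties(node) AS metadata, "
            + "node.${quarkus.langchain4j.neo4j.id-property} AS ${quarkus.langchain4j.neo4j.id-property}, "
            + "node.${quarkus.langchain4j.neo4j.text-property} AS ${quarkus.langchain4j.neo4j.text-property}, "
            + "node.${quarkus.langchain4j.neo4j.embedding-property} AS ${quarkus.langchain4j.neo4j.embedding-property}, "
            + "score";

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${property} expression with its configured value.
    static String expand(String template, Map<String, String> config) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = config.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static Map<String, String> defaults() {
        Map<String, String> config = new HashMap<>();
        config.put("quarkus.langchain4j.neo4j.id-property", "id");
        config.put("quarkus.langchain4j.neo4j.text-property", "text");
        config.put("quarkus.langchain4j.neo4j.embedding-property", "embedding");
        return config;
    }

    public static void main(String[] args) {
        // Default property names.
        System.out.println(expand(DEFAULT_QUERY, defaults()));

        // Custom id property (example value): the returned column is renamed too.
        Map<String, String> custom = defaults();
        custom.put("quarkus.langchain4j.neo4j.id-property", "uuid");
        System.out.println(expand(DEFAULT_QUERY, custom));
    }
}
```

With the defaults, the query resolves to RETURN properties(node) AS metadata, node.id AS id, node.text AS text, node.embedding AS embedding, score, which satisfies the required-column list. When writing a custom retrieval query, keep the same column aliases so that it still returns every required column.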