Pinecone Store for Retrieval Augmented Generation (RAG)

A Retrieval Augmented Generation (RAG) implementation needs a reliable document store. This guide shows how to use a Pinecone database as that store.

Leveraging the Pinecone Document Store

To use the Pinecone document store, add the following dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-pinecone</artifactId>
</dependency>

Configuration Settings

The extension is configured through the following properties:

Configuration property fixed at build time - All other configuration properties are overridable at runtime

The API key to Pinecone.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_API_KEY
Type: string
Default: none (required)

Environment name, e.g. gcp-starter or northamerica-northeast1-gcp.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_ENVIRONMENT
Type: string
Default: none (required)

ID of the project.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_PROJECT_ID
Type: string
Default: none (required)

Name of the index within the project. If the index doesn’t exist, it will be created.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_INDEX_NAME
Type: string
Default: none (required)

Dimension of the embeddings in the index. Required only if the index doesn’t exist yet and needs to be created.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_DIMENSION
Type: int
Default: none

The type of the pod to use. Only used if the index doesn’t exist yet and needs to be created. Format: one of s1, p1, or p2, followed by a period and one of x1, x2, x4, or x8 (for example, s1.x1).
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_POD_TYPE
Type: string
Default: s1.x1

The timeout duration for the index to become ready. Only relevant if the index doesn’t exist yet and needs to be created. If not specified, 1 minute will be used.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_INDEX_READINESS_TIMEOUT
Type: Duration
Default: none

The namespace.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_NAMESPACE
Type: string
Default: none

The name of the field that contains the text segment.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_TEXT_FIELD_NAME
Type: string
Default: text

The timeout duration for the Pinecone client. If not specified, 5 seconds will be used.
Environment variable: QUARKUS_LANGCHAIN4J_PINECONE_TIMEOUT
Type: Duration
Default: none
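
The environment variable names listed above appear to follow the usual MicroProfile Config convention used by Quarkus: every character in the property key that is not a letter, digit, or underscore is replaced with an underscore, and the result is upper-cased. The sketch below illustrates that mapping. Note that the property keys it uses (such as quarkus.langchain4j.pinecone.api-key) are an assumption inferred from the environment variable names, and the class and method names are hypothetical helpers, not part of the extension.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the extension: illustrates how a
// property key is conventionally turned into an environment variable name.
final class EnvVarNames {

    // Replace every character that is not a letter, digit, or underscore
    // with an underscore, then upper-case the result.
    static String toEnvVar(String propertyKey) {
        return propertyKey.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // ASSUMPTION: these keys are inferred from the environment variable
        // names in the configuration reference; verify them before use.
        List<String> keys = List.of(
                "quarkus.langchain4j.pinecone.api-key",
                "quarkus.langchain4j.pinecone.environment",
                "quarkus.langchain4j.pinecone.project-id",
                "quarkus.langchain4j.pinecone.index-name",
                "quarkus.langchain4j.pinecone.index-readiness-timeout");

        for (String key : keys) {
            System.out.println(key + " -> " + toEnvVar(key));
        }
    }
}
```

Under that assumption, a value can be supplied either as a property in the application configuration or through the matching environment variable. The environment variable is often the more practical choice for the API key, since it keeps the secret out of the configuration file.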

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
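
The rules above can be sketched in plain Java as follows. This is an illustrative reimplementation of the described behavior, not the converter Quarkus itself uses. The class name SimplifiedDuration is hypothetical, and details such as whitespace trimming and lower-case-only unit suffixes are assumptions of this sketch.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper that mirrors the documented rules for the
// simplified duration format. Not the actual Quarkus converter.
final class SimplifiedDuration {

    static Duration parse(String raw) {
        String value = raw.trim();

        // Only a number: the value is a number of seconds.
        if (value.matches("\\d+")) {
            return Duration.ofSeconds(Long.parseLong(value));
        }
        // A number followed by "ms": the value is a number of milliseconds.
        if (value.matches("\\d+ms")) {
            String digits = value.substring(0, value.length() - 2);
            return Duration.ofMillis(Long.parseLong(digits));
        }
        // A number followed by h, m, or s: prefix with "PT".
        if (value.matches("\\d+[hms]")) {
            return Duration.parse("PT" + value.toUpperCase(Locale.ROOT));
        }
        // A number followed by d: prefix with "P".
        if (value.matches("\\d+d")) {
            return Duration.parse("P" + value.toUpperCase(Locale.ROOT));
        }
        // Anything else is treated as the standard java.time.Duration format.
        return Duration.parse(value);
    }

    public static void main(String[] args) {
        String[] samples = {"30", "250ms", "5m", "2d", "PT1M30S"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + parse(sample));
        }
    }
}
```

With these rules, a timeout written as 5 is read as five seconds, and 5m as five minutes. Input that matches none of the simplified forms and is not valid for Duration#parse() causes a DateTimeParseException in this sketch.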