Pinecone Vector Store
Pinecone is a fully managed, scalable vector database optimized for similarity search. With Quarkus LangChain4j, you can use Pinecone as a vector store to implement Retrieval-Augmented Generation (RAG) pipelines.
This guide explains how to configure and use Pinecone as a document store for embedded vectors.
Prerequisites
To use Pinecone, you need:
- A Pinecone account and an active API key
- A Pinecone index with a configured dimension matching your embedding model
- A Pinecone index that uses the vector similarity metric suited to your use case (e.g., cosine)
For more details, visit: https://docs.pinecone.io/docs/quickstart
Dependency
Add the following dependency to your pom.xml:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-pinecone</artifactId>
    <version>1.10.0</version>
</dependency>
Even better, if you use the Quarkus platform BOM (the default for generated projects), add the Quarkus LangChain4j BOM so that all dependency versions align:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>${quarkus.platform.group-id}</groupId>
            <artifactId>${quarkus.platform.artifact-id}</artifactId>
            <version>${quarkus.platform.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>${quarkus.platform.group-id}</groupId>
            <artifactId>quarkus-langchain4j-bom</artifactId> (1)
            <version>${quarkus.platform.version}</version> (2)
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-pinecone</artifactId> (3)
    </dependency>
</dependencies>
| 1 | In your dependencyManagement section, add the quarkus-langchain4j-bom |
| 2 | Inherit the version from your platform version |
| 3 | Voilà, no need for version alignment anymore |
Configuration
Configure your Pinecone API key, environment, project ID, index name, and embedding dimension in application.properties:
quarkus.langchain4j.pinecone.api-key=your-api-key
quarkus.langchain4j.pinecone.environment=us-west1-gcp
quarkus.langchain4j.pinecone.project-id=your-project-id
quarkus.langchain4j.pinecone.index-name=my-index
quarkus.langchain4j.pinecone.dimension=1536
See below for full configuration options:
Configuration property fixed at build time - All other configuration properties are overridable at runtime
| Configuration property | Type | Default |
|---|---|---|
| Whether the default (unnamed) Pinecone embedding store should be enabled. | boolean | |
| The API key to Pinecone. | string | |
| Environment name, e.g. gcp-starter or northamerica-northeast1-gcp. | string | |
| ID of the project. | string | |
| Name of the index within the project. If the index doesn’t exist, it will be created. | string | |
| Dimension of the embeddings in the index. Required only if the index doesn’t exist yet and needs to be created. | int | |
| The type of the pod to use. Only used if the index doesn’t exist yet and needs to be created. | string | |
| The timeout duration for the index to become ready. Only relevant if the index doesn’t exist yet and needs to be created. If not specified, 1 minute will be used. | Duration | |
| The namespace. | string | |
| The name of the field that contains the text segment. | string | |
| The timeout duration for the Pinecone client. If not specified, 5 seconds will be used. | Duration | |
| Configuration property (named stores) | Type | Default |
|---|---|---|
| The index name for this named store. This property serves as the build-time key that enables named store discovery. If not set, the index name from the runtime configuration will be used. | string | |
| The API key to Pinecone. | string | |
| Environment name, e.g. gcp-starter or northamerica-northeast1-gcp. | string | |
| ID of the project. | string | |
| Name of the index within the project. If the index doesn’t exist, it will be created. | string | |
| Dimension of the embeddings in the index. Required only if the index doesn’t exist yet and needs to be created. | int | |
| The type of the pod to use. Only used if the index doesn’t exist yet and needs to be created. | string | |
| The timeout duration for the index to become ready. Only relevant if the index doesn’t exist yet and needs to be created. If not specified, 1 minute will be used. | Duration | |
| The namespace. | string | |
| The name of the field that contains the text segment. | string | |
| The timeout duration for the Pinecone client. If not specified, 5 seconds will be used. | Duration | |
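About the Duration format
To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.
You can also use a simplified format, starting with a number:
- If the value is only a number, it represents time in seconds.
- If the value is a number followed by ms, it represents time in milliseconds.
In other cases, the simplified format is translated to the java.time.Duration format for parsing:
- If the value is a number followed by h, m, or s, it is prefixed with PT.
- If the value is a number followed by d, it is prefixed with P.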
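The simplified duration rules used for Quarkus configuration values can be sketched in plain Java. This is an illustrative helper only: the class name `SimplifiedDuration` and its `parse` method are hypothetical and are not part of Quarkus, which applies its own converter internally.

```java
import java.time.Duration;

// Hypothetical helper (not a Quarkus class): mirrors the simplified duration rules.
class SimplifiedDuration {

    static Duration parse(String raw) {
        String value = raw.trim();
        if (value.matches("\\d+")) {
            // Only a number: interpreted as seconds.
            return Duration.ofSeconds(Long.parseLong(value));
        }
        if (value.matches("\\d+ms")) {
            // Number followed by "ms": interpreted as milliseconds.
            return Duration.ofMillis(Long.parseLong(value.substring(0, value.length() - 2)));
        }
        if (value.matches("\\d+[hms]")) {
            // Number followed by h, m, or s: prefixed with "PT".
            return Duration.parse("PT" + value);
        }
        if (value.matches("\\d+d")) {
            // Number followed by d: prefixed with "P".
            return Duration.parse("P" + value);
        }
        // Anything else is assumed to already be in the standard java.time.Duration format.
        return Duration.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(parse("5"));     // PT5S
        System.out.println(parse("250ms")); // PT0.25S
        System.out.println(parse("1m"));    // PT1M
        System.out.println(parse("PT30S")); // PT30S
    }
}
```

With these rules, a client timeout written as `5`, `5s`, or `PT5S` resolves to the same five-second duration.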
Embedding Dimension
Make sure the configured dimension matches the embedding model you’re using:
- OpenAI text-embedding-ada-002 → 1536
- AllMiniLmL6V2QuantizedEmbeddingModel → 384
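If the dimension does not match the index configuration, insertion and querying will fail.
An existing Pinecone index must already have the correct vector dimension; the extension does not reconfigure existing indexes. According to the configuration reference above, an index that does not exist yet is created, and the configured dimension is required in that case.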
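A dimension mismatch is easier to diagnose if it is caught before a vector is sent to the store. The following is a minimal sketch of such a guard; `DimensionCheck` and `requireDimension` are hypothetical names introduced for illustration and are not provided by the extension.

```java
// Hypothetical guard (not part of quarkus-langchain4j): fail fast on a dimension mismatch.
class DimensionCheck {

    /**
     * Throws if the vector length differs from the dimension the index was created with,
     * e.g. 1536 for text-embedding-ada-002 or 384 for AllMiniLmL6V2QuantizedEmbeddingModel.
     */
    static void requireDimension(float[] vector, int expectedDimension) {
        if (vector.length != expectedDimension) {
            throw new IllegalArgumentException(
                    "Embedding has " + vector.length + " dimensions, but the index expects "
                            + expectedDimension);
        }
    }

    public static void main(String[] args) {
        requireDimension(new float[384], 384); // passes silently
        try {
            requireDimension(new float[384], 1536);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling such a check once at startup with a test embedding is one way to surface a misconfigured model or index early.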
Usage Example
Once installed and configured, you can use the Pinecone vector store to ingest and retrieve embedded documents:
package io.quarkiverse.langchain4j.samples;

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.pinecone.PineconeEmbeddingStore;

@ApplicationScoped
public class IngestorExampleWithPinecone {

    /**
     * The embedding store (the database).
     * The bean is provided by the quarkus-langchain4j-pinecone extension.
     */
    @Inject
    PineconeEmbeddingStore store;

    /**
     * The embedding model (how the vector of a document is computed).
     * The bean is provided by the LLM (like openai) extension.
     */
    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(List<Document> documents) {
        // Split each document into segments, embed them, and store them in Pinecone.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .documentSplitter(recursive(500, 0))
                .build();
        // Warning - this can take a long time...
        ingestor.ingest(documents);
    }
}
This example demonstrates how to store text segments along with embeddings and metadata in Pinecone.
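In the example above, `recursive(500, 0)` is understood here to request segments of at most 500 characters with an overlap of 0 characters. To illustrate what the two numbers control, here is a deliberately naive fixed-size splitter in plain Java. It is not the LangChain4j algorithm, which prefers paragraph, sentence, and word boundaries; `FixedSizeSplitter` is a hypothetical class for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter: cuts purely by character count.
class FixedSizeSplitter {

    static List<String> split(String text, int maxSize, int overlap) {
        if (maxSize <= 0 || overlap < 0 || overlap >= maxSize) {
            throw new IllegalArgumentException("require maxSize > 0 and 0 <= overlap < maxSize");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last segment reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // 1200 characters, size 500, no overlap: segments of 500, 500, and 200 characters.
        List<String> segments = split("x".repeat(1200), 500, 0);
        segments.forEach(s -> System.out.println(s.length()));
    }
}
```

A larger overlap repeats the tail of one segment at the head of the next, which can help retrieval when relevant sentences straddle a segment boundary, at the cost of storing more vectors.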
How It Works
The Pinecone store integration works by:
- Converting input text into embedding vectors using your configured EmbeddingModel
- Storing each vector with associated metadata and a unique ID
- Executing similarity queries using Pinecone’s top-k vector search
- Returning the most relevant matches for inclusion in a RAG prompt
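The top-k similarity step can be pictured with a small in-memory sketch. It assumes the cosine metric and a brute-force scan; Pinecone performs this search server-side with its own indexing, so this only illustrates the idea. `TopKSketch` and its methods are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of top-k cosine similarity search.
class TopKSketch {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // treat a zero vector as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the IDs of the k stored vectors most similar to the query, best first. */
    static List<String> topK(Map<String, float[]> index, float[] query, int k) {
        return index.entrySet().stream()
                .sorted(Comparator
                        .comparingDouble((Map.Entry<String, float[]> e) -> cosine(e.getValue(), query))
                        .reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, float[]> index = Map.of(
                "doc-a", new float[] {1f, 0f},
                "doc-b", new float[] {0f, 1f},
                "doc-c", new float[] {0.9f, 0.1f});
        System.out.println(topK(index, new float[] {1f, 0f}, 2)); // [doc-a, doc-c]
    }
}
```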
Internally, the extension uses Pinecone’s REST API (via the MicroProfile REST Client) to:
- Upsert vectors (/vectors/upsert)
- Query vectors (/query)
- Fetch metadata for matched entries
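To give a rough idea of what a call to the `/query` endpoint looks like, the sketch below builds (but does not send) such a request with the JDK HTTP client. The field names `namespace`, `topK`, `includeMetadata`, and `vector` and the `Api-Key` header follow Pinecone’s public REST API as we understand it; the payload the extension actually sends may differ. The class `PineconeQuerySketch`, the base URL, and the API key value are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: builds a Pinecone-style query request without sending it.
class PineconeQuerySketch {

    /** Builds the JSON body by hand; assumes the namespace needs no JSON escaping. */
    static String queryBody(float[] vector, int topK, String namespace) {
        String values = IntStream.range(0, vector.length)
                .mapToObj(i -> Float.toString(vector[i]))
                .collect(Collectors.joining(","));
        return "{\"namespace\":\"" + namespace + "\","
                + "\"topK\":" + topK + ","
                + "\"includeMetadata\":true,"
                + "\"vector\":[" + values + "]}";
    }

    /** indexUrl is the base URL of the index (placeholder; depends on your Pinecone setup). */
    static HttpRequest queryRequest(String indexUrl, String apiKey, String body) {
        return HttpRequest.newBuilder(URI.create(indexUrl + "/query"))
                .header("Api-Key", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        String body = queryBody(new float[] {0.1f, 0.2f}, 3, "product-ns");
        HttpRequest request = queryRequest("https://example.invalid", "your-api-key", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

In an application you do not build these requests yourself; the extension’s REST client handles them when you call the embedding store.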
Named Stores
You can configure multiple named Pinecone stores, each with its own API key, environment, project, and index. This is useful when your application needs to manage embeddings for different domains or tenants in separate Pinecone indexes.
To configure a named store:
quarkus.langchain4j.pinecone.products.index-name=products-index
quarkus.langchain4j.pinecone.products.api-key=${PINECONE_API_KEY}
quarkus.langchain4j.pinecone.products.environment=gcp-starter
quarkus.langchain4j.pinecone.products.project-id=abc123
quarkus.langchain4j.pinecone.products.dimension=1536
quarkus.langchain4j.pinecone.products.namespace=product-ns
The index-name property serves as the build-time key that enables named store discovery.
To inject a named store, use the @EmbeddingStoreName qualifier:
@Inject
@EmbeddingStoreName("products")
EmbeddingStore<TextSegment> productsStore;
The default store and named stores can coexist. If you only need named stores, disable the default store:
quarkus.langchain4j.pinecone.default-store-enabled=false
Summary
To use Pinecone as a vector store for RAG with Quarkus LangChain4j:
- Create a Pinecone index with the correct vector dimension
- Add the quarkus-langchain4j-pinecone dependency
- Configure API credentials, environment, and index parameters
- Use the PineconeEmbeddingStore to ingest and retrieve content