In-Process Embedding Models

In Retrieval-Augmented Generation (RAG) and other semantic search scenarios, embedding models are used to convert documents into vector representations. These vectors are stored in a vector database and used for similarity search.

Many providers (e.g., OpenAI, Hugging Face) expose embedding generation as a remote API. Every request therefore incurs a network round trip, adding latency and tying the application to an external service.

The in-process embedding model support in Quarkus LangChain4j runs the embedding model inside the application itself, using the ONNX Runtime. This yields lower latency and removes the dependency on an external service, but it requires enough memory to hold the model in the application process.

You can consult the MTEB (Massive Text Embedding Benchmark) leaderboard to select the most appropriate embedding model for your use case.

Supported Models

The following list describes the supported in-process models. For each model, it gives the Maven dependency to add to your project, the vector dimension, and the type to inject. These dependencies are not included by default.

all-MiniLM-L6-v2 (quantized)
Dependency: dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2-q:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.allminilml6v2q.AllMiniLmL6V2QuantizedEmbeddingModel

all-MiniLM-L6-v2
Dependency: dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel

bge-small-en-v1.5 (quantized)
Dependency: dev.langchain4j:langchain4j-embeddings-bge-small-en-v15-q:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel

bge-small-en-v1.5
Dependency: dev.langchain4j:langchain4j-embeddings-bge-small-en-v15:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.bgesmallenv15.BgeSmallEnV15EmbeddingModel

bge-small-en (quantized)
Dependency: dev.langchain4j:langchain4j-embeddings-bge-small-en-q:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.bgesmallenq.BgeSmallEnQuantizedEmbeddingModel

bge-small-en
Dependency: dev.langchain4j:langchain4j-embeddings-bge-small-en:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.bgesmallen.BgeSmallEnEmbeddingModel

bge-small-zh (quantized)
Dependency: dev.langchain4j:langchain4j-embeddings-bge-small-zh-q:1.1.0
Vector dimension: 512
Injected type: dev.langchain4j.model.embedding.onnx.bgesmallzhq.BgeSmallZhQuantizedEmbeddingModel

bge-small-zh
Dependency: dev.langchain4j:langchain4j-embeddings-bge-small-zh:1.1.0
Vector dimension: 512
Injected type: dev.langchain4j.model.embedding.onnx.bgesmallzh.BgeSmallZhEmbeddingModel

e5-small-v2 (quantized)
Dependency: dev.langchain4j:langchain4j-embeddings-e5-small-v2-q:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.e5smallv2q.E5SmallV2QuantizedEmbeddingModel

e5-small-v2
Dependency: dev.langchain4j:langchain4j-embeddings-e5-small-v2:1.1.0
Vector dimension: 384
Injected type: dev.langchain4j.model.embedding.onnx.e5smallv2.E5SmallV2EmbeddingModel

Using an In-Process Embedding Model

To use an in-process embedding model, add the corresponding dependency listed above to your pom.xml. For example, for the quantized e5-small-v2 model:
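<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-e5-small-v2-q</artifactId>
    <version>1.1.0</version>
</dependency>

Then inject the matching model type: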

@Inject
E5SmallV2QuantizedEmbeddingModel model;

Each model provides a concrete class to inject. Make sure the injected type matches the dependency you’ve added.
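For illustration, here is a minimal sketch of a bean that embeds a piece of text with the injected model. The SentenceEmbedder bean and its method are hypothetical; embed(...), content(), and vector() are the standard LangChain4j embedding API:

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.onnx.e5smallv2q.E5SmallV2QuantizedEmbeddingModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class SentenceEmbedder {

    @Inject
    E5SmallV2QuantizedEmbeddingModel model;

    public float[] embed(String text) {
        // embed(...) returns a Response<Embedding>; content() unwraps the embedding
        Embedding embedding = model.embed(text).content();
        // e5-small-v2 produces 384-dimensional vectors (see the list above)
        return embedding.vector();
    }
}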

Alternatively, if you don’t use any other embedding model provider in your application, you can inject the generic dev.langchain4j.model.embedding.EmbeddingModel interface:

@Inject
EmbeddingModel model;

This keeps your code decoupled from a specific model class: switching to another model only requires swapping the dependency, with no code changes.
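As an illustration of what this enables, here is a sketch of a bean that scores two texts by the cosine similarity of their embeddings, the building block of the similarity search described at the start of this section. The SimilarityService bean and its cosine computation are hypothetical helpers; the embedding calls are the standard LangChain4j API:

import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class SimilarityService {

    @Inject
    EmbeddingModel model; // resolves to the single in-process model on the classpath

    public double similarity(String a, String b) {
        float[] va = model.embed(a).content().vector();
        float[] vb = model.embed(b).content().vector();
        // Cosine similarity: dot product of the vectors divided by the product of their norms
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < va.length; i++) {
            dot += va[i] * vb[i];
            normA += va[i] * va[i];
            normB += vb[i] * vb[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}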

In-process embedding models work well for local development, edge environments, and production use cases where low latency and data privacy are critical.