In-Process Embedding Models
In Retrieval-Augmented Generation (RAG) and other semantic search scenarios, embedding models are used to convert documents into vector representations. These vectors are stored in a vector database and used for similarity search.
By default, many providers (e.g., OpenAI, Hugging Face) expose remote APIs for embedding generation. However, this approach introduces network overhead and potential latency.
The in-process embedding model support in Quarkus LangChain4j runs the embedding model inside the application itself using the ONNX runtime. This reduces latency and removes the dependency on an external service, but it requires enough memory to load the model.
You can consult the MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard) to select the most appropriate embedding model for your use case.
Supported Models
The following table lists the supported in-process models and the corresponding Maven dependency to include in your project. These dependencies are not included by default.
Model Name | Dependency | Vector Dimension | Injected Type |
---|---|---|---|
all-MiniLM-L6-v2 (quantized) | dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2-q:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.allminilml6v2q.AllMiniLmL6V2QuantizedEmbeddingModel |
all-MiniLM-L6-v2 | dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel |
bge-small-en-v1.5 (quantized) | dev.langchain4j:langchain4j-embeddings-bge-small-en-v15-q:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel |
bge-small-en-v1.5 | dev.langchain4j:langchain4j-embeddings-bge-small-en-v15:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.bgesmallenv15.BgeSmallEnV15EmbeddingModel |
bge-small-en (quantized) | dev.langchain4j:langchain4j-embeddings-bge-small-en-q:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.bgesmallenq.BgeSmallEnQuantizedEmbeddingModel |
bge-small-en | dev.langchain4j:langchain4j-embeddings-bge-small-en:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.bgesmallen.BgeSmallEnEmbeddingModel |
bge-small-zh (quantized) | dev.langchain4j:langchain4j-embeddings-bge-small-zh-q:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.bgesmallzhq.BgeSmallZhQuantizedEmbeddingModel |
bge-small-zh | dev.langchain4j:langchain4j-embeddings-bge-small-zh:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.bgesmallzh.BgeSmallZhEmbeddingModel |
e5-small-v2 (quantized) | dev.langchain4j:langchain4j-embeddings-e5-small-v2-q:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.e5smallv2q.E5SmallV2QuantizedEmbeddingModel |
e5-small-v2 | dev.langchain4j:langchain4j-embeddings-e5-small-v2:1.1.0 | 384 | dev.langchain4j.model.embedding.onnx.e5smallv2.E5SmallV2EmbeddingModel |
Using an In-Process Embedding Model
To use an in-process embedding model, first add the corresponding dependency from the table above to your pom.xml, as shown below.
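For example, the quantized E5 model from the last row of the table is declared like this (the coordinates are copied from the table; adjust the artifact to the model you chose):

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-e5-small-v2-q</artifactId>
    <version>1.1.0</version>
</dependency>
```

Then inject the matching model type: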
```java
@Inject
E5SmallV2QuantizedEmbeddingModel model;
```
Each model exposes a concrete class to inject; make sure the injected type matches the dependency you added.
Alternatively, if you don’t use any other embedding model provider in your application, you can inject the generic interface:
```java
@Inject
EmbeddingModel model;
```
This keeps your code independent of a specific model class and simplifies configuration in small applications.
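To illustrate, here is a minimal sketch (the SimilarityChecker bean and its method are hypothetical, not part of the extension) that embeds two sentences with the injected model and scores them with LangChain4j's CosineSimilarity helper:

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.CosineSimilarity;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class SimilarityChecker {

    @Inject
    EmbeddingModel model; // resolves to the single in-process model on the classpath

    public double similarity(String first, String second) {
        // embed() returns a Response<Embedding>; content() unwraps it
        Embedding a = model.embed(first).content();
        Embedding b = model.embed(second).content();
        // All models in the table above produce 384-dimensional vectors
        return CosineSimilarity.between(a, b);
    }
}
```

Semantically related sentences should yield a noticeably higher score than unrelated ones, which is exactly the property that vector-store similarity search relies on.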
In-process embedding models work well for local development, edge environments, and production use cases where low latency and data privacy are critical.