Jlama Embedding Models
Jlama provides local embedding models suitable for RAG (Retrieval-Augmented Generation), semantic search, and document classification, all without leaving the Java process.
Prerequisites
Jlama embedding models require Java 21 or later, with preview features and the incubating Vector API module enabled:
--enable-preview --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.vector
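For example, when running a packaged Quarkus application, pass these flags directly to the JVM (the jar path below assumes the default fast-jar packaging):

java --enable-preview --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.vector -jar target/quarkus-app/quarkus-run.jar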
See Jlama Chat Models for Dev Mode details and model setup.
Using Jlama Embeddings
To enable embedding model support, add the following dependency to your pom.xml:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-jlama</artifactId>
    <version>1.0.2</version>
</dependency>
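If you build with Gradle rather than Maven, the same coordinates apply:

implementation("io.quarkiverse.langchain4j:quarkus-langchain4j-jlama:1.0.2")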
Default Model
By default, the embedding model is set to: intfloat/e5-small-v2
You can override the embedding model configuration:
quarkus.langchain4j.jlama.embedding-model.model-name=intfloat/e5-small-v2
Example of using both chat and embedding models:
quarkus.langchain4j.log-requests=true
quarkus.langchain4j.log-responses=true
quarkus.langchain4j.jlama.chat-model.model-name=tjake/granite-3.0-2b-instruct-JQ4
quarkus.langchain4j.jlama.embedding-model.model-name=intfloat/e5-small-v2
Programmatic Access
To use the embedding model programmatically, inject it into any CDI bean:

@Inject
EmbeddingModel embeddingModel;
This allows direct access for use in retrievers, RAG pipelines, or semantic search.
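As a minimal sketch of that direct access, the bean below ranks two texts by cosine similarity of their embeddings. It relies only on the standard LangChain4j EmbeddingModel API; the SimilarityService class and its method names are illustrative, not part of the extension:

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class SimilarityService {

    @Inject
    EmbeddingModel embeddingModel;

    // Embeds both texts locally with Jlama and returns their cosine similarity.
    public double similarity(String first, String second) {
        Embedding a = embeddingModel.embed(first).content();
        Embedding b = embeddingModel.embed(second).content();
        return cosine(a.vector(), b.vector());
    }

    // Plain cosine similarity over the raw float vectors.
    private static double cosine(float[] x, float[] y) {
        double dot = 0, normX = 0, normY = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }
}

Both embed calls run in-process; no network access is involved.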
Configuration Reference
Configuration properties fixed at build time are marked as such; all other configuration properties are overridable at runtime.

Configuration property | Type | Default
---|---|---
quarkus.langchain4j.jlama.include-models-in-artifact (fixed at build time) - Determines whether the necessary Jlama models are downloaded and included in the jar at build time. Currently, this option is only valid for fast-jar deployments. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_INCLUDE_MODELS_IN_ARTIFACT | boolean | true
quarkus.langchain4j.jlama.chat-model.enabled - Whether the chat model should be enabled. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_ENABLED | boolean | true
quarkus.langchain4j.jlama.embedding-model.enabled - Whether the embedding model should be enabled. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_EMBEDDING_MODEL_ENABLED | boolean | true
quarkus.langchain4j.jlama.chat-model.model-name - Model name to use. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_MODEL_NAME | string |
quarkus.langchain4j.jlama.embedding-model.model-name - Model name to use. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_EMBEDDING_MODEL_MODEL_NAME | string | intfloat/e5-small-v2
quarkus.langchain4j.jlama.models-path - Location on the file system which serves as a cache for the models. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_MODELS_PATH | path |
quarkus.langchain4j.jlama.chat-model.temperature - What sampling temperature to use, between 0.0 and 1.0. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. It is generally recommended to alter this or top-p, but not both. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_TEMPERATURE | double |
quarkus.langchain4j.jlama.chat-model.max-tokens - The maximum number of tokens to generate in the completion. The token count of your prompt plus max-tokens cannot exceed the model's context length. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_MAX_TOKENS | int |
quarkus.langchain4j.jlama.enable-integration - Whether to enable the integration. Set to false to disable all requests. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_ENABLE_INTEGRATION | boolean | true
quarkus.langchain4j.jlama.log-requests - Whether Jlama should log requests. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_LOG_REQUESTS | boolean | false
quarkus.langchain4j.jlama.log-responses - Whether the Jlama client should log responses. Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_LOG_RESPONSES | boolean | false
The same runtime properties are also available per named model configuration, under the quarkus.langchain4j.jlama."model-name" prefix (replace "model-name" with the name of your configuration; in environment variables the quoted segment becomes __MODEL_NAME__):

Configuration property | Type | Default
---|---|---
quarkus.langchain4j.jlama."model-name".chat-model.model-name - Model name to use. | string |
quarkus.langchain4j.jlama."model-name".embedding-model.model-name - Model name to use. | string |
quarkus.langchain4j.jlama."model-name".chat-model.temperature - What sampling temperature to use, between 0.0 and 1.0. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. It is generally recommended to alter this or top-p, but not both. | double |
quarkus.langchain4j.jlama."model-name".chat-model.max-tokens - The maximum number of tokens to generate in the completion. The token count of your prompt plus max-tokens cannot exceed the model's context length. | int |
quarkus.langchain4j.jlama."model-name".enable-integration - Whether to enable the integration. Set to false to disable all requests. | boolean | true
quarkus.langchain4j.jlama."model-name".log-requests - Whether Jlama should log requests. | boolean | false
quarkus.langchain4j.jlama."model-name".log-responses - Whether the Jlama client should log responses. | boolean | false