Ollama Embedding Models
Ollama supports embedding models suitable for semantic search, document retrieval, and RAG-style workflows. These models run locally, just like chat models.
Prerequisites
Ollama Installation
To use embedding models, you must have a working Ollama setup. Refer to Ollama Chat Models for details on installation and Dev Service support.
Enabling Ollama
To enable embedding support, include the following extension:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>1.0.2</version>
</dependency>
Default Model
By default, the embedding model is set to nomic-embed-text.
You can override this using:
quarkus.langchain4j.ollama.embedding-model.model-name=bge-m3
You may also wish to configure logging during development:
quarkus.langchain4j.log-requests=true
quarkus.langchain4j.log-responses=true
Programmatic Usage
You can inject the embedding model directly:
@Inject EmbeddingModel model;
This will retrieve the embedding model configured in application.properties.
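For semantic search, the vectors produced by the embedding model are typically compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison. The class name and the sample vectors are illustrative assumptions; in an application, the float arrays would come from the injected model (for example via model.embed(text).content().vector() in LangChain4j).

```java
public class CosineSimilaritySketch {

    // Cosine similarity between two embedding vectors of equal length.
    // Returns 0.0 when either vector has zero length (norm), to avoid dividing by zero.
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real embeddings (assumption for illustration).
        float[] query = {1f, 0f};
        float[] sameDirection = {2f, 0f};
        float[] orthogonal = {0f, 3f};

        System.out.println(cosine(query, sameDirection)); // prints 1.0
        System.out.println(cosine(query, orthogonal));    // prints 0.0
    }
}
```

Scores close to 1 indicate texts the model considers semantically similar; scores near 0 indicate unrelated texts.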
Dynamic Authorization
To provide dynamic authorization headers, implement ModelAuthProvider:
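import io.quarkiverse.langchain4j.auth.ModelAuthProvider;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class MyAuthProvider implements ModelAuthProvider {

    // Called for each model request; the returned value is sent as the Authorization header.
    @Override
    public String getAuthorization(Input input) {
        return "Bearer " + fetchToken();
    }

    // Placeholder (not part of the extension): replace with your own token retrieval logic.
    private String fetchToken() {
        return "my-token";
    }
}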
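The fetchToken() call in the provider above is application-specific and is not supplied by the extension. One possible way to back it, sketched here under stated assumptions, is a small holder that caches a token for a fixed lifetime so that each model request does not trigger a new token fetch. The class name, the loader supplier, and the fixed time-to-live are all hypothetical choices for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class CachedTokenSketch {

    private final Supplier<String> loader; // hypothetical: fetches a fresh token, e.g. from an identity provider
    private final Duration ttl;            // assumed fixed lifetime of a token
    private String token;
    private Instant expiresAt = Instant.MIN;

    public CachedTokenSketch(Supplier<String> loader, Duration ttl) {
        this.loader = loader;
        this.ttl = ttl;
    }

    // Returns the cached token, reloading it only when none is cached or it has expired.
    public synchronized String get() {
        Instant now = Instant.now();
        if (token == null || !now.isBefore(expiresAt)) {
            token = loader.get();
            expiresAt = now.plus(ttl);
        }
        return token;
    }
}
```

With such a holder, fetchToken() could simply delegate to get(), and the Authorization header would be rebuilt from the cached value on each request.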
Configuration Reference
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Configuration property | Type | Default |
---|---|---|
Whether the model should be enabled. Environment variable: | boolean | |
Whether the model should be enabled. Environment variable: | boolean | |
If Dev Services for Ollama has been explicitly enabled or disabled. Dev Services are generally enabled by default, unless there is an existing configuration present. Environment variable: | boolean | |
The Ollama container image to use. Environment variable: | string | |
Model to use. Environment variable: | string | |
Model to use. According to Ollama docs, the default value is Environment variable: | string | |
Base URL where the Ollama serving is running. Environment variable: | string | |
If set, the named TLS configuration with the configured name will be applied to the REST Client. Environment variable: | string | |
Timeout for Ollama calls. Environment variable: | | |
Whether the Ollama client should log requests. Environment variable: | boolean | |
Whether the Ollama client should log responses. Environment variable: | boolean | |
Whether to enable the integration. Defaults to Environment variable: | boolean | |
The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: | double | |
Maximum number of tokens to predict when generating text. Environment variable: | int | |
Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Environment variable: | list of string | |
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. Environment variable: | double | |
Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. Environment variable: | int | |
With a static number the result is always the same. With a random number the result varies. Example: Environment variable: | int | |
The format to return a response in. Format can be Environment variable: | string | |
Whether chat model requests should be logged. Environment variable: | boolean | |
Whether chat model responses should be logged. Environment variable: | boolean | |
The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: | double | |
Maximum number of tokens to predict when generating text. Environment variable: | int | |
Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Environment variable: | list of string | |
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. Environment variable: | double | |
Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. Environment variable: | int | |
Whether embedding model requests should be logged. Environment variable: | boolean | |
Whether embedding model responses should be logged. Environment variable: | boolean | |
| Type | Default |
Model to use. Environment variable: | string | |
Model to use. According to Ollama docs, the default value is Environment variable: | string | |
Base URL where the Ollama serving is running. Environment variable: | string | |
If set, the named TLS configuration with the configured name will be applied to the REST Client. Environment variable: | string | |
Timeout for Ollama calls. Environment variable: | | |
Whether the Ollama client should log requests. Environment variable: | boolean | |
Whether the Ollama client should log responses. Environment variable: | boolean | |
Whether to enable the integration. Defaults to Environment variable: | boolean | |
The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: | double | |
Maximum number of tokens to predict when generating text. Environment variable: | int | |
Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Environment variable: | list of string | |
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. Environment variable: | double | |
Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. Environment variable: | int | |
With a static number the result is always the same. With a random number the result varies. Example: Environment variable: | int | |
The format to return a response in. Format can be Environment variable: | string | |
Whether chat model requests should be logged. Environment variable: | boolean | |
Whether chat model responses should be logged. Environment variable: | boolean | |
The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: | double | |
Maximum number of tokens to predict when generating text. Environment variable: | int | |
Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Environment variable: | list of string | |
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. Environment variable: | double | |
Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. Environment variable: | int | |
Whether embedding model requests should be logged. Environment variable: | boolean | |
Whether embedding model responses should be logged. Environment variable: | boolean | |
About the Duration format
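To write duration values, use the standard java.time.Duration format. You can also use a simplified format, starting with a number: a value that is only a number represents time in seconds, and a number followed by ms represents time in milliseconds.
In other cases, the simplified format is translated to the java.time.Duration format for parsing: a number followed by h, m, or s is prefixed with PT, and a number followed by d is prefixed with P.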
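The simplified duration rules used by Quarkus configuration (a bare number means seconds, a number followed by ms means milliseconds, h/m/s values are prefixed with PT, and d values are prefixed with P) can be illustrated with a short sketch. This is not the converter Quarkus actually uses; the class and method names are hypothetical, and only single-unit values are handled.

```java
import java.time.Duration;

public class DurationFormatSketch {

    // Translates a simplified duration string into a java.time.Duration.
    // Anything that does not match a simplified form is passed to Duration.parse as-is.
    public static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            return Duration.ofSeconds(Long.parseLong(v));      // "10" -> 10 seconds
        }
        if (v.matches("\\d+ms")) {
            long millis = Long.parseLong(v.substring(0, v.length() - 2));
            return Duration.ofMillis(millis);                   // "500ms" -> 500 milliseconds
        }
        if (v.matches("\\d+[hms]")) {
            return Duration.parse("PT" + v.toUpperCase());      // "2h" -> "PT2H"
        }
        if (v.matches("\\d+d")) {
            return Duration.parse("P" + v.toUpperCase());       // "1d" -> "P1D"
        }
        return Duration.parse(v);                               // standard ISO-8601 form, e.g. "PT30S"
    }

    public static void main(String[] args) {
        System.out.println(parse("10"));    // prints PT10S
        System.out.println(parse("500ms")); // prints PT0.5S
        System.out.println(parse("2h"));    // prints PT2H
        System.out.println(parse("1d"));    // prints PT24H
    }
}
```

Under these rules, a timeout written as 30s and one written as PT30S resolve to the same Duration.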