Ollama Chat Models
Ollama allows developers to run large language models (LLMs) locally on their machines, with support for both CPU and GPU execution. It supports many popular open models such as DeepSeek R1, Llama3, Mistral, and CodeLlama, which can be pulled from the Ollama model library.
Prerequisites
Install Ollama
Before using this extension, install Ollama locally. Visit the Ollama download page and follow the instructions for your platform.
Verify the installation with:
$ ollama --version
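You can also pull a model ahead of time so the first request does not have to wait for the download, for example:
$ ollama pull llama3.2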
Dev Service
The Dev Service bundled with this extension simplifies local setup:
- It will automatically pull any model configured in your application.
- If Ollama is not already running, it will start an Ollama container using your installed container runtime (Podman or Docker).
- If running in a container, the container is exposed via the following configuration properties:
langchain4j-ollama-dev-service.ollama.host=host
langchain4j-ollama-dev-service.ollama.port=port
langchain4j-ollama-dev-service.ollama.endpoint=http://${langchain4j-ollama-dev-service.ollama.host}:${langchain4j-ollama-dev-service.ollama.port}
Ollama models are large (e.g., Llama3 ~4.7 GB). Ensure sufficient disk space.
Model pulls can take several minutes depending on the model size and connection speed.
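If you need the endpoint exposed by the Dev Service elsewhere in your application, you can read these properties like any other configuration value. A minimal sketch using MicroProfile Config (the bean name is illustrative):
import jakarta.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.config.inject.ConfigProperty;

import java.util.Optional;

@ApplicationScoped
public class OllamaEndpointReporter {

    // Set by the Dev Service only when Ollama runs in a container, hence the Optional.
    @ConfigProperty(name = "langchain4j-ollama-dev-service.ollama.endpoint")
    Optional<String> ollamaEndpoint;

    public String endpoint() {
        // Falls back to the default local Ollama endpoint.
        return ollamaEndpoint.orElse("http://localhost:11434");
    }
}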
Using Ollama
To enable Ollama support in your Quarkus project, add the following dependency:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>1.0.2</version>
</dependency>
If no other LLM extension is present, AI Services will default to using the configured Ollama chat model.
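For example, an AI Service declared like the following will be backed by the Ollama chat model (a minimal sketch; the interface name and prompts are illustrative):
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface PoemService {

    @SystemMessage("You are a helpful poet.")
    @UserMessage("Write a short poem about {topic}.")
    String writePoem(String topic);
}
Injecting PoemService into a bean and calling writePoem("autumn") sends the prompt to the configured Ollama model.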
Chat Model Configuration
By default, the model llama3.2 is used.
You can change the chat model using the following property:
quarkus.langchain4j.ollama.chat-model.model-name=mistral
Dynamic Authorization
If your Ollama endpoint requires authorization, you can implement ModelAuthProvider:
import io.quarkiverse.langchain4j.auth.ModelAuthProvider;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class MyAuthProvider implements ModelAuthProvider {

    @Override
    public String getAuthorization(Input input) {
        // Return the value of the Authorization header to send with each Ollama request.
        return "Bearer " + getTokenFromSomewhere();
    }

    private String getTokenFromSomewhere() {
        // Placeholder: retrieve the token from configuration, a secret store, etc.
        return "...";
    }
}
Function Calling Support
Function calling is supported in Ollama since version 0.3.0. Not all models support function calling (tools); check the Ollama model library for models tagged with tool support.
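As an illustration of how tools are declared with this extension (the class, method, and prompt below are illustrative), a CDI bean method annotated with @Tool can be exposed to an AI Service via the tools attribute of @RegisterAiService:
import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class WeatherTools {

    // The model may decide to call this method to answer a question.
    @Tool("Returns the current temperature in a city, in degrees Celsius")
    public double currentTemperature(String city) {
        // Placeholder: call a real weather service here.
        return 21.0;
    }
}

// In a separate file:
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(tools = WeatherTools.class)
public interface WeatherAssistant {
    String chat(String question);
}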
Configuration Reference
Some configuration properties are fixed at build time; all other configuration properties are overridable at runtime.

General and Dev Service properties

Description | Type
Whether the chat model should be enabled | boolean
Whether the embedding model should be enabled | boolean
If Dev Services for Ollama has been explicitly enabled or disabled. Dev Services are generally enabled by default, unless there is an existing configuration present. | boolean
The Ollama container image to use. | string
Model to use for chat. | string
Model to use for embeddings. | string
Base URL where the Ollama server is running. | string
If set, the named TLS configuration with the configured name will be applied to the REST Client. | string
Timeout for Ollama calls. | Duration
Whether the Ollama client should log requests. | boolean
Whether the Ollama client should log responses. | boolean
Whether to enable the integration. | boolean

Chat model properties

Description | Type
The temperature of the model. Increasing the temperature will make the model answer with more variability; a lower temperature will make the model answer more conservatively. | double
Maximum number of tokens to predict when generating text. | int
The stop sequences to use. When such a sequence is encountered, the LLM stops generating text and returns. | list of string
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | double
Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | int
The seed for generation. With a static number the result is always the same; with a random number the result varies. | int
The format to return a response in. | string
Whether chat model requests should be logged. | boolean
Whether chat model responses should be logged. | boolean

Embedding model properties

Description | Type
The temperature of the model. Increasing the temperature will make the model answer with more variability; a lower temperature will make the model answer more conservatively. | double
Maximum number of tokens to predict when generating text. | int
The stop sequences to use. When such a sequence is encountered, the LLM stops generating text and returns. | list of string
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | double
Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | int
Whether embedding model requests should be logged. | boolean
Whether embedding model responses should be logged. | boolean
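As a usage sketch, the chat model parameters above map to properties under the quarkus.langchain4j.ollama.chat-model.* prefix shown earlier; the exact suffixes below (temperature, top-p, top-k, num-predict) are assumptions, so verify them against the configuration reference generated for your version:
quarkus.langchain4j.ollama.chat-model.temperature=0.2
quarkus.langchain4j.ollama.chat-model.top-p=0.5
quarkus.langchain4j.ollama.chat-model.top-k=10
quarkus.langchain4j.ollama.chat-model.num-predict=256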
About the Duration format
To write duration values, use the standard java.time.Duration format. You can also use a simplified format, starting with a number: a plain number is interpreted as seconds, and a number followed by ms as milliseconds. In other cases, the simplified format is translated to the java.time.Duration format for parsing: for example, a value followed by h, m, or s is prefixed with PT, and a value followed by d is prefixed with P.
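For example, assuming the timeout property from the table above is exposed as quarkus.langchain4j.ollama.timeout (an assumption; verify the name for your version), a 30-second timeout can be written as:
quarkus.langchain4j.ollama.timeout=30s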