Ollama Embedding Models

Ollama supports embedding models suitable for semantic search, document retrieval, and RAG-style workflows. These models run locally, just like chat models.

Prerequisites

Ollama Installation

To use embedding models, you must have a working Ollama setup. Refer to Ollama Chat Models for details on installation and Dev Service support.

Enabling Ollama

To enable embedding support, include the following extension:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>1.4.2</version>
</dependency>

Default Model

By default, the embedding model is set to nomic-embed-text.

You can override this using:

quarkus.langchain4j.ollama.embedding-model.model-id=bge-m3

You may also wish to configure logging during development:

quarkus.langchain4j.log-requests=true
quarkus.langchain4j.log-responses=true

Programmatic Usage

You can inject the embedding model directly:

@Inject EmbeddingModel model;

This will retrieve the embedding model configured in application.properties.
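
For semantic search, the vectors returned by the model are commonly compared with cosine similarity. The sketch below is illustrative only and uses just the JDK. It assumes you have already obtained `float[]` vectors (in LangChain4j, `model.embed(text).content().vector()` yields one), and the class and method names are hypothetical rather than part of the extension.

```java
// Illustrative helper, not part of Quarkus LangChain4j; names are hypothetical.
final class CosineSimilarity {

    /** Returns the cosine of the angle between two vectors of equal length. */
    static double between(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Tiny stand-ins for real embeddings, which have hundreds of dimensions.
        float[] query = {1f, 0f, 1f};
        float[] sameDirection = {2f, 0f, 2f};
        float[] orthogonal = {0f, 1f, 0f};
        System.out.println(between(query, sameDirection)); // close to 1: very similar
        System.out.println(between(query, orthogonal));    // close to 0: unrelated
    }
}
```

In a retrieval workflow you would embed the query once, score it against each stored document vector, and keep the highest-scoring documents.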

Dynamic Authorization

To provide dynamic authorization headers, implement ModelAuthProvider:

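import io.quarkiverse.langchain4j.auth.ModelAuthProvider;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class MyAuthProvider implements ModelAuthProvider {

    // Returns the value to send in the Authorization header.
    @Override
    public String getAuthorization(Input input) {
        return "Bearer " + fetchToken();
    }

    // Placeholder: replace with your own token lookup,
    // for example a call to your identity provider.
    private String fetchToken() {
        return "<token>";
    }
}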
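
How `fetchToken()` obtains its value is application-specific. As one possible approach, the sketch below caches a short-lived token and reloads it once it has expired, so that the provider does not contact the token source on every request. Everything here is an assumption for illustration: `CachedToken`, its constructor parameters, and the fixed lifetime are hypothetical and not part of the extension.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical helper, not part of Quarkus LangChain4j.
final class CachedToken {

    private final Supplier<String> loader; // e.g. a call to your identity provider
    private final Duration lifetime;       // assumed validity of each loaded token
    private final Clock clock;

    private String token;
    private Instant expiresAt = Instant.MIN;

    CachedToken(Supplier<String> loader, Duration lifetime, Clock clock) {
        this.loader = loader;
        this.lifetime = lifetime;
        this.clock = clock;
    }

    /** Returns the cached token, loading a new one once the previous one has expired. */
    synchronized String get() {
        Instant now = clock.instant();
        if (token == null || !now.isBefore(expiresAt)) {
            token = loader.get();
            expiresAt = now.plus(lifetime);
        }
        return token;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        CachedToken cache = new CachedToken(
                () -> "token-" + (++loads[0]), Duration.ofMinutes(5), Clock.systemUTC());
        System.out.println("Bearer " + cache.get());
        System.out.println("Bearer " + cache.get()); // served from the cache
        System.out.println("Loader calls: " + loads[0]);
    }
}
```

A provider could hold one such instance and return `"Bearer " + cache.get()` from `getAuthorization`.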

Configuration Reference

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property	Type	Default
`quarkus.langchain4j.ollama.chat-model.enabled` Whether the model should be enabled Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_ENABLED`	boolean	`true`
`quarkus.langchain4j.ollama.embedding-model.enabled` Whether the model should be enabled Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_ENABLED`	boolean	`true`
`quarkus.langchain4j.ollama.devservices.enabled` If Dev Services for Ollama has been explicitly enabled or disabled. Dev Services are generally enabled by default, unless there is an existing configuration present. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_DEVSERVICES_ENABLED`	boolean	`true`
`quarkus.langchain4j.ollama.devservices.image-name` The Ollama container image to use. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_DEVSERVICES_IMAGE_NAME`	string	`ollama/ollama:latest`
`quarkus.langchain4j.ollama.chat-model.model-id` Model to use Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_MODEL_ID`	string	`llama3.2`
`quarkus.langchain4j.ollama.embedding-model.model-id` Model to use. According to Ollama docs, the default value is `nomic-embed-text` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_MODEL_ID`	string	`nomic-embed-text`
`quarkus.langchain4j.ollama.base-url` Base URL where the Ollama serving is running Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_BASE_URL`	string
`quarkus.langchain4j.ollama.tls-configuration-name` If set, the named TLS configuration with the configured name will be applied to the REST Client Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_TLS_CONFIGURATION_NAME`	string
`quarkus.langchain4j.ollama.timeout` Timeout for Ollama calls Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_TIMEOUT`	Duration	`10s`
`quarkus.langchain4j.ollama.log-requests` Whether the Ollama client should log requests Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama.log-responses` Whether the Ollama client should log responses Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama.enable-integration` Whether to enable the integration. Defaults to `true`, which means requests are made to the Ollama provider. Set to `false` to disable all requests. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_ENABLE_INTEGRATION`	boolean	`true`
`quarkus.langchain4j.ollama.chat-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TEMPERATURE`	double	`${quarkus.langchain4j.temperature:0.8}`
`quarkus.langchain4j.ollama.chat-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_NUM_PREDICT`	int
`quarkus.langchain4j.ollama.chat-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama.chat-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama.chat-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama.chat-model.seed` With a static number the result is always the same. With a random number the result varies Example: `Random random = new Random(); int x = random.nextInt(Integer.MAX_VALUE);` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_SEED`	int
`quarkus.langchain4j.ollama.chat-model.format` The format to return a response in. Format can be `json` or a JSON schema. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_FORMAT`	string
`quarkus.langchain4j.ollama.chat-model.log-requests` Whether chat model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama.chat-model.log-responses` Whether chat model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama.embedding-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TEMPERATURE`	double	`${quarkus.langchain4j.temperature:0.8}`
`quarkus.langchain4j.ollama.embedding-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_NUM_PREDICT`	int	`128`
`quarkus.langchain4j.ollama.embedding-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama.embedding-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama.embedding-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama.embedding-model.log-requests` Whether embedding model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama.embedding-model.log-responses` Whether embedding model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_LOG_RESPONSES`	boolean	`false`
Named model config	Type	Default
`quarkus.langchain4j.ollama."model-name".chat-model.model-id` Model to use Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_MODEL_ID`	string	`llama3.2`
`quarkus.langchain4j.ollama."model-name".embedding-model.model-id` Model to use. According to Ollama docs, the default value is `nomic-embed-text` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID`	string	`nomic-embed-text`
`quarkus.langchain4j.ollama."model-name".base-url` Base URL where the Ollama serving is running Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__BASE_URL`	string
`quarkus.langchain4j.ollama."model-name".tls-configuration-name` If set, the named TLS configuration with the configured name will be applied to the REST Client Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__TLS_CONFIGURATION_NAME`	string
`quarkus.langchain4j.ollama."model-name".timeout` Timeout for Ollama calls Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__TIMEOUT`	Duration	`10s`
`quarkus.langchain4j.ollama."model-name".log-requests` Whether the Ollama client should log requests Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".log-responses` Whether the Ollama client should log responses Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".enable-integration` Whether to enable the integration. Defaults to `true`, which means requests are made to the Ollama provider. Set to `false` to disable all requests. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__ENABLE_INTEGRATION`	boolean	`true`
`quarkus.langchain4j.ollama."model-name".chat-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TEMPERATURE`	double	`${quarkus.langchain4j.temperature:0.8}`
`quarkus.langchain4j.ollama."model-name".chat-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_NUM_PREDICT`	int
`quarkus.langchain4j.ollama."model-name".chat-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama."model-name".chat-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama."model-name".chat-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama."model-name".chat-model.seed` With a static number the result is always the same. With a random number the result varies Example: `Random random = new Random(); int x = random.nextInt(Integer.MAX_VALUE);` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_SEED`	int
`quarkus.langchain4j.ollama."model-name".chat-model.format` The format to return a response in. Format can be `json` or a JSON schema. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_FORMAT`	string
`quarkus.langchain4j.ollama."model-name".chat-model.log-requests` Whether chat model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".chat-model.log-responses` Whether chat model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".embedding-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TEMPERATURE`	double	`${quarkus.langchain4j.temperature:0.8}`
`quarkus.langchain4j.ollama."model-name".embedding-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_NUM_PREDICT`	int	`128`
`quarkus.langchain4j.ollama."model-name".embedding-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama."model-name".embedding-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama."model-name".embedding-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama."model-name".embedding-model.log-requests` Whether embedding model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".embedding-model.log-responses` Whether embedding model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES`	boolean	`false`
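
The environment variable names listed above follow a mechanical mapping from the property name: each character that is not a letter or digit becomes `_`, and the result is upper-cased. This is why the quoted segment in the named model properties produces double underscores. The sketch below reproduces that mapping for the names in the table. The class and method names are illustrative, and it is not meant to cover every lookup rule Quarkus applies.

```java
import java.util.Locale;

// Illustrative helper; the names are hypothetical.
final class EnvVarNames {

    /** Replaces every non-alphanumeric character with '_' and upper-cases the result. */
    static String fromProperty(String property) {
        return property.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_MODEL_ID
        System.out.println(fromProperty("quarkus.langchain4j.ollama.embedding-model.model-id"));
        // prints QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__BASE_URL
        System.out.println(fromProperty("quarkus.langchain4j.ollama.\"model-name\".base-url"));
    }
}
```

For a named model, `model-name` stands for the name you chose, so the corresponding segment of the environment variable changes accordingly.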

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

If the value is only a number, it represents time in seconds.
If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

If the value is a number followed by h, m, or s, it is prefixed with PT.
If the value is a number followed by d, it is prefixed with P.
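
The rules above can be sketched in plain Java as follows. This is a minimal illustration of the stated rules built on `java.time.Duration`, not the converter Quarkus itself uses; the class name `SimplifiedDuration` is hypothetical.

```java
import java.time.Duration;
import java.util.Locale;

// Illustrative sketch of the simplified duration rules; not the Quarkus implementation.
final class SimplifiedDuration {

    static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            // Only a number: seconds.
            return Duration.ofSeconds(Long.parseLong(v));
        }
        if (v.matches("\\d+ms")) {
            // Number followed by ms: milliseconds.
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.matches("\\d+[hms]")) {
            // Number followed by h, m, or s: prefix with PT.
            return Duration.parse("PT" + v.toUpperCase(Locale.ROOT));
        }
        if (v.matches("\\d+d")) {
            // Number followed by d: prefix with P.
            return Duration.parse("P" + v.toUpperCase(Locale.ROOT));
        }
        // Anything else is treated as the standard java.time.Duration format.
        return Duration.parse(v);
    }

    public static void main(String[] args) {
        System.out.println(parse("10s"));   // prints PT10S
        System.out.println(parse("90"));    // prints PT1M30S
        System.out.println(parse("500ms")); // prints PT0.5S
        System.out.println(parse("2d"));    // prints PT48H
    }
}
```

With these rules, the default `timeout` of `10s` from the table is read as `PT10S`, that is, ten seconds.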