Ollama

Ollama proposes a way to run large language models (LLMs) locally. You can run many models such as LLama2, Mistral, or CodeLlama on your local machine.

Prerequisites

To use Ollama, you need to have a running Ollama server. Go to the Ollama download page and download the server for your platform.

Once installed, check that Ollama is running using:

> ollama list

It may not display any model, which is fine, let’s pull the llama2 model:

> ollama pull llama2

Models are huge. For example Llama2 is 3.8Gb. Make sure you have enough disk space.

Let’s also pull the default embedding model:

> ollama pull nomic-embed-text

Dev Service

If you have Ollama running locally, you do not need a dev service. However, if you want to use the Ollama dev service, add the following dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama-devservices</artifactId>
    <version>0.13.1</version>
</dependency>

Then, in your application.properties file add:

quarkus.langchain4j.ollama.devservices.model=mistral # It uses orca-mini by default

The dev service will start an Ollama server for you, using a docker container. Note that the provisioning can take some time.

Using Ollama

To integrate with models running on Ollama, add the following dependency into your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>0.13.1</version>
</dependency>

If no other LLM extension is installed, AI Services will automatically utilize the configured Ollama model.

By default, the extension uses llama2, the model we pulled in the previous section. You can change it by setting the quarkus.langchain4j.ollama.chat-model.model-id property in the application.properties file:

# Do not forget to pull the model before using it using `ollama pull <model-id>`
quarkus.langchain4j.ollama.chat-model.model-id=mistral

Configuration

Several configuration properties are available:

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property	Type	Default
`quarkus.langchain4j.ollama.chat-model.enabled` Whether the model should be enabled Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_ENABLED`	boolean	`true`
`quarkus.langchain4j.ollama.embedding-model.enabled` Whether the model should be enabled Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_ENABLED`	boolean	`true`
`quarkus.langchain4j.ollama.base-url` Base URL where the Ollama serving is running Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_BASE_URL`	string	`http://localhost:11434`
`quarkus.langchain4j.ollama.timeout` Timeout for Ollama calls Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_TIMEOUT`	Duration	`10S`
`quarkus.langchain4j.ollama.log-requests` Whether the Ollama client should log requests Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama.log-responses` Whether the Ollama client should log responses Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama.enable-integration` Whether to enable the integration. Defaults to `true`, which means requests are made to the OpenAI provider. Set to `false` to disable all requests. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_ENABLE_INTEGRATION`	boolean	`true`
`quarkus.langchain4j.ollama.chat-model.model-id` Model to use. According to Ollama docs, the default value is `llama3` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_MODEL_ID`	string	`llama3`
`quarkus.langchain4j.ollama.chat-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TEMPERATURE`	double	`0.8`
`quarkus.langchain4j.ollama.chat-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_NUM_PREDICT`	int	`128`
`quarkus.langchain4j.ollama.chat-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama.chat-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama.chat-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama.chat-model.seed` With a static number the result is always the same. With a random number the result varies Example: `Random random = new Random(); int x = random.nextInt(Integer.MAX_VALUE);` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_SEED`	int
`quarkus.langchain4j.ollama.chat-model.format` the format to return a response in. Currently, the only accepted value is `json` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_FORMAT`	string
`quarkus.langchain4j.ollama.chat-model.log-requests` Whether chat model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama.chat-model.log-responses` Whether chat model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama.embedding-model.model-id` Model to use. According to Ollama docs, the default value is `nomic-embed-text` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_MODEL_ID`	string	`nomic-embed-text`
`quarkus.langchain4j.ollama.embedding-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TEMPERATURE`	double	`0.8`
`quarkus.langchain4j.ollama.embedding-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_NUM_PREDICT`	int	`128`
`quarkus.langchain4j.ollama.embedding-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama.embedding-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama.embedding-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama.embedding-model.log-requests` Whether embedding model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama.embedding-model.log-responses` Whether embedding model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_LOG_RESPONSES`	boolean	`false`
Named model config	Type	Default
`quarkus.langchain4j.ollama."model-name".base-url` Base URL where the Ollama serving is running Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__BASE_URL`	string	`http://localhost:11434`
`quarkus.langchain4j.ollama."model-name".timeout` Timeout for Ollama calls Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__TIMEOUT`	Duration	`10S`
`quarkus.langchain4j.ollama."model-name".log-requests` Whether the Ollama client should log requests Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".log-responses` Whether the Ollama client should log responses Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".enable-integration` Whether to enable the integration. Defaults to `true`, which means requests are made to the OpenAI provider. Set to `false` to disable all requests. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__ENABLE_INTEGRATION`	boolean	`true`
`quarkus.langchain4j.ollama."model-name".chat-model.model-id` Model to use. According to Ollama docs, the default value is `llama3` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_MODEL_ID`	string	`llama3`
`quarkus.langchain4j.ollama."model-name".chat-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TEMPERATURE`	double	`0.8`
`quarkus.langchain4j.ollama."model-name".chat-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_NUM_PREDICT`	int	`128`
`quarkus.langchain4j.ollama."model-name".chat-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama."model-name".chat-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama."model-name".chat-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama."model-name".chat-model.seed` With a static number the result is always the same. With a random number the result varies Example: `Random random = new Random(); int x = random.nextInt(Integer.MAX_VALUE);` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_SEED`	int
`quarkus.langchain4j.ollama."model-name".chat-model.format` the format to return a response in. Currently, the only accepted value is `json` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_FORMAT`	string
`quarkus.langchain4j.ollama."model-name".chat-model.log-requests` Whether chat model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".chat-model.log-responses` Whether chat model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".embedding-model.model-id` Model to use. According to Ollama docs, the default value is `nomic-embed-text` Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID`	string	`nomic-embed-text`
`quarkus.langchain4j.ollama."model-name".embedding-model.temperature` The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively. Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TEMPERATURE`	double	`0.8`
`quarkus.langchain4j.ollama."model-name".embedding-model.num-predict` Maximum number of tokens to predict when generating text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_NUM_PREDICT`	int	`128`
`quarkus.langchain4j.ollama."model-name".embedding-model.stop` Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_STOP`	list of string
`quarkus.langchain4j.ollama."model-name".embedding-model.top-p` Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TOP_P`	double	`0.9`
`quarkus.langchain4j.ollama."model-name".embedding-model.top-k` Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TOP_K`	int	`40`
`quarkus.langchain4j.ollama."model-name".embedding-model.log-requests` Whether embedding model requests should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ollama."model-name".embedding-model.log-responses` Whether embedding model responses should be logged Environment variable: `QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES`	boolean	`false`

Configuration property

Type

Default

quarkus.langchain4j.ollama.chat-model.enabled

Whether the model should be enabled

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_ENABLED

boolean

true

quarkus.langchain4j.ollama.embedding-model.enabled

Whether the model should be enabled

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_ENABLED

boolean

true

quarkus.langchain4j.ollama.base-url

Base URL where the Ollama serving is running

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_BASE_URL

string

http://localhost:11434

quarkus.langchain4j.ollama.timeout

Timeout for Ollama calls

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_TIMEOUT

Duration

10S

quarkus.langchain4j.ollama.log-requests

Whether the Ollama client should log requests

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_LOG_REQUESTS

boolean

false

quarkus.langchain4j.ollama.log-responses

Whether the Ollama client should log responses

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_LOG_RESPONSES

boolean

false

quarkus.langchain4j.ollama.enable-integration

Whether to enable the integration. Defaults to true, which means requests are made to the OpenAI provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_ENABLE_INTEGRATION

boolean

true

quarkus.langchain4j.ollama.chat-model.model-id

Model to use. According to Ollama docs, the default value is llama3

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_MODEL_ID

string

llama3

quarkus.langchain4j.ollama.chat-model.temperature

The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively.

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TEMPERATURE

double

0.8

quarkus.langchain4j.ollama.chat-model.num-predict

Maximum number of tokens to predict when generating text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_NUM_PREDICT

int

128

quarkus.langchain4j.ollama.chat-model.stop

Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_STOP

list of string

quarkus.langchain4j.ollama.chat-model.top-p

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TOP_P

double

0.9

quarkus.langchain4j.ollama.chat-model.top-k

Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TOP_K

int

40

quarkus.langchain4j.ollama.chat-model.seed

With a static number the result is always the same. With a random number the result varies Example:

Random random = new Random(); int x = random.nextInt(Integer.MAX_VALUE);

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_SEED

int

quarkus.langchain4j.ollama.chat-model.format

the format to return a response in. Currently, the only accepted value is json

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_FORMAT

string

quarkus.langchain4j.ollama.chat-model.log-requests

Whether chat model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_LOG_REQUESTS

boolean

false

quarkus.langchain4j.ollama.chat-model.log-responses

Whether chat model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_LOG_RESPONSES

boolean

false

quarkus.langchain4j.ollama.embedding-model.model-id

Model to use. According to Ollama docs, the default value is nomic-embed-text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_MODEL_ID

string

nomic-embed-text

quarkus.langchain4j.ollama.embedding-model.temperature

The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively.

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TEMPERATURE

double

0.8

quarkus.langchain4j.ollama.embedding-model.num-predict

Maximum number of tokens to predict when generating text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_NUM_PREDICT

int

128

quarkus.langchain4j.ollama.embedding-model.stop

Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_STOP

list of string

quarkus.langchain4j.ollama.embedding-model.top-p

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TOP_P

double

0.9

quarkus.langchain4j.ollama.embedding-model.top-k

Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_TOP_K

int

40

quarkus.langchain4j.ollama.embedding-model.log-requests

Whether embedding model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

quarkus.langchain4j.ollama.embedding-model.log-responses

Whether embedding model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA_EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Named model config

Type

Default

quarkus.langchain4j.ollama."model-name".base-url

Base URL where the Ollama serving is running

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__BASE_URL

string

http://localhost:11434

quarkus.langchain4j.ollama."model-name".timeout

Timeout for Ollama calls

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__TIMEOUT

Duration

10S

quarkus.langchain4j.ollama."model-name".log-requests

Whether the Ollama client should log requests

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__LOG_REQUESTS

boolean

false

quarkus.langchain4j.ollama."model-name".log-responses

Whether the Ollama client should log responses

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__LOG_RESPONSES

boolean

false

quarkus.langchain4j.ollama."model-name".enable-integration

Whether to enable the integration. Defaults to true, which means requests are made to the OpenAI provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__ENABLE_INTEGRATION

boolean

true

quarkus.langchain4j.ollama."model-name".chat-model.model-id

Model to use. According to Ollama docs, the default value is llama3

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_MODEL_ID

string

llama3

quarkus.langchain4j.ollama."model-name".chat-model.temperature

The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively.

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TEMPERATURE

double

0.8

quarkus.langchain4j.ollama."model-name".chat-model.num-predict

Maximum number of tokens to predict when generating text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_NUM_PREDICT

int

128

quarkus.langchain4j.ollama."model-name".chat-model.stop

Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_STOP

list of string

quarkus.langchain4j.ollama."model-name".chat-model.top-p

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TOP_P

double

0.9

quarkus.langchain4j.ollama."model-name".chat-model.top-k

Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_TOP_K

int

40

quarkus.langchain4j.ollama."model-name".chat-model.seed

With a static number the result is always the same. With a random number the result varies Example:

Random random = new Random(); int x = random.nextInt(Integer.MAX_VALUE);

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_SEED

int

quarkus.langchain4j.ollama."model-name".chat-model.format

the format to return a response in. Currently, the only accepted value is json

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_FORMAT

string

quarkus.langchain4j.ollama."model-name".chat-model.log-requests

Whether chat model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS

boolean

false

quarkus.langchain4j.ollama."model-name".chat-model.log-responses

Whether chat model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES

boolean

false

quarkus.langchain4j.ollama."model-name".embedding-model.model-id

Model to use. According to Ollama docs, the default value is nomic-embed-text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID

string

nomic-embed-text

quarkus.langchain4j.ollama."model-name".embedding-model.temperature

The temperature of the model. Increasing the temperature will make the model answer with more variability. A lower temperature will make the model answer more conservatively.

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TEMPERATURE

double

0.8

quarkus.langchain4j.ollama."model-name".embedding-model.num-predict

Maximum number of tokens to predict when generating text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_NUM_PREDICT

int

128

quarkus.langchain4j.ollama."model-name".embedding-model.stop

Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_STOP

list of string

quarkus.langchain4j.ollama."model-name".embedding-model.top-p

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TOP_P

double

0.9

quarkus.langchain4j.ollama."model-name".embedding-model.top-k

Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_TOP_K

int

40

quarkus.langchain4j.ollama."model-name".embedding-model.log-requests

Whether embedding model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

quarkus.langchain4j.ollama."model-name".embedding-model.log-responses

Whether embedding model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_OLLAMA__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

If the value is only a number, it represents time in seconds.
If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

If the value is a number followed by h, m, or s, it is prefixed with PT.
If the value is a number followed by d, it is prefixed with P.

Document Retriever and Embedding

Ollama also provides embedding models. By default, it uses nomic-embed-text (make sure you pulled that model as indicated in the prerequisites section).

You can change the default embedding model by setting the quarkus.langchain4j.ollama.embedding-model.model-id property in the application.properties file:

quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true

quarkus.langchain4j.ollama.chat-model.model-id=mistral
quarkus.langchain4j.ollama.embedding-model.model-id=mistral

If no other LLM extension is installed, retrieve the embedding model as follows:

@Inject EmbeddingModel model; // Injects the embedding model

However, in general, we recommend using local embedding models, as Ollama embeddings are rather slow.