Ollama
Prerequisites
To use Ollama, you need a running Ollama instance. Ollama is available for all major platforms and its installation is straightforward: visit the Ollama download page and follow the instructions.
Once installed, check that Ollama is running using:
> ollama --version
Dev Service
Quarkus LangChain4j automatically handles pulling the models configured by the application, so users do not need to do so manually.
Models are huge. For example, Llama3 is 4.7 GB, so make sure you have enough disk space.
Due to the models' large size, pulling them can take time.
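The Dev Service pulls the configured model for you, but you can also pull a model manually ahead of time with the Ollama CLI; for example, for the default model used below:
> ollama pull llama3.1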
Using Ollama
To integrate with models running on Ollama, add the following dependency to your project:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>0.17.0.CR1</version>
</dependency>
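If your project uses Gradle instead of Maven, the same coordinates would translate to something like the following (a sketch, not taken from this page):
implementation("io.quarkiverse.langchain4j:quarkus-langchain4j-ollama:0.17.0.CR1")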
If no other LLM extension is installed, AI Services will automatically utilize the configured Ollama model.
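For example, a minimal AI Service could look like the following sketch; the interface name and prompt text are illustrative, only @RegisterAiService and the message annotations come from the LangChain4j APIs:

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Hypothetical AI Service; with only the Ollama extension installed,
// calls to this interface are served by the configured Ollama chat model.
@RegisterAiService
public interface Assistant {

    @SystemMessage("You are a concise assistant.")
    String chat(@UserMessage String question);
}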
By default, the extension uses llama3.1, the model pulled by the Dev Service described above.
You can change it by setting the quarkus.langchain4j.ollama.chat-model.model-id property in the application.properties file:
quarkus.langchain4j.ollama.chat-model.model-id=mistral
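Alternatively, the low-level chat model can be injected directly instead of going through an AI Service. A minimal sketch (the class name and prompt are illustrative):

import dev.langchain4j.model.chat.ChatLanguageModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class ModelClient {

    // Backed by the configured Ollama model (mistral with the property above)
    @Inject
    ChatLanguageModel model;

    public String ask(String question) {
        return model.generate(question);
    }
}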
Configuration
Several configuration properties are available:
Configuration property fixed at build time - All other configuration properties are overridable at runtime.

Description | Type | Default
---|---|---
Whether the chat model should be enabled | boolean |
Whether the embedding model should be enabled | boolean |
Chat model to use. According to Ollama docs, the default value is `llama3.1` | string | `llama3.1`
Embedding model to use. According to Ollama docs, the default value is `nomic-embed-text` | string | `nomic-embed-text`
Base URL where the Ollama server is running | string |
Timeout for Ollama calls | Duration |
Whether the Ollama client should log requests | boolean |
Whether the Ollama client should log responses | boolean |
Whether to enable the integration. Defaults to `true` | boolean | `true`
Temperature of the chat model. Increasing the temperature will make the model answer with more variability; a lower temperature will make the model answer more conservatively | double |
Maximum number of tokens to predict when generating text (chat model) | int |
Stop sequences to use (chat model). When such a pattern is encountered, the LLM stops generating text and returns | list of string |
Top-p (chat model). Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text | double |
Top-k (chat model). Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative | int |
Sampling seed (chat model). With a static number the result is always the same; with a random number the result varies | int |
Format to return a response in (chat model). Currently, the only accepted value is `json` | string |
Whether chat model requests should be logged | boolean |
Whether chat model responses should be logged | boolean |
Temperature of the embedding model (same semantics as for the chat model) | double |
Maximum number of tokens to predict when generating text (embedding model) | int |
Stop sequences to use (embedding model) | list of string |
Top-p (embedding model, same semantics as for the chat model) | double |
Top-k (embedding model, same semantics as for the chat model) | int |
Whether embedding model requests should be logged | boolean |
Whether embedding model responses should be logged | boolean |
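To give a rough idea of how these settings fit together, here is a hypothetical application.properties excerpt; apart from chat-model.model-id, the property names (base-url, timeout, temperature) are assumptions inferred from the descriptions above, so check the configuration reference of your version before using them:

# Hypothetical excerpt - property names other than chat-model.model-id are assumed
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.timeout=30s
quarkus.langchain4j.ollama.chat-model.model-id=llama3.1
quarkus.langchain4j.ollama.chat-model.temperature=0.2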
About the Duration format
To write duration values, use the standard java.time.Duration format. You can also use a simplified format, starting with a number: if the value is only a number, it represents time in seconds; if the value is a number followed by ms, it represents time in milliseconds. In other cases, the simplified format is translated to the java.time.Duration format for parsing: if the value is a number followed by h, m, or s, it is prefixed with PT; if the value is a number followed by d, it is prefixed with P.
If the Ollama server is running behind an authentication proxy, the authentication header can easily be added to each HTTP request by adding a ClientRequestFilter, like so:
import jakarta.ws.rs.client.ClientRequestContext;
import jakarta.ws.rs.client.ClientRequestFilter;
import jakarta.ws.rs.ext.Provider;

@Provider
public class OllamaClientAuthHeaderFilter implements ClientRequestFilter {

    @Override
    public void filter(ClientRequestContext requestContext) {
        // Replace "someToken" with the credentials expected by your proxy
        requestContext.getHeaders().add("Authorization", "Bearer someToken");
    }
}
This class is a CDI bean, which gives it the ability to use other CDI beans if necessary.
If you want to use it with native compilation, you should add the @RegisterForReflection annotation. See the Quarkus documentation for more information.
This should not be necessary in future releases of Quarkus.
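When targeting native compilation, the annotation would typically sit directly on the filter class, roughly like this:

import io.quarkus.runtime.annotations.RegisterForReflection;

@RegisterForReflection
@Provider
public class OllamaClientAuthHeaderFilter implements ClientRequestFilter {
    // same filter method as above
}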
Document Retriever and Embedding
Ollama also provides embedding models. By default, it uses nomic-embed-text.
You can change the default embedding model by setting the quarkus.langchain4j.ollama.embedding-model.model-id property in the application.properties file:
quarkus.langchain4j.log-requests=true
quarkus.langchain4j.log-responses=true
quarkus.langchain4j.ollama.chat-model.model-id=mistral
quarkus.langchain4j.ollama.embedding-model.model-id=mistral
If no other LLM extension is installed, retrieve the embedding model as follows:
@Inject EmbeddingModel model; // Injects the embedding model
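From there, embedding a piece of text is a one-liner; a small sketch using the standard LangChain4j API (the input string is arbitrary):

import dev.langchain4j.data.embedding.Embedding;

// 'model' is the injected EmbeddingModel from above
Embedding embedding = model.embed("Why is the sky blue?").content();
float[] vector = embedding.vector(); // the raw embedding vector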