Using Ollama with Quarkus LangChain4j
This guide shows how to use local Ollama models with the Quarkus LangChain4j extension. You’ll learn how to:
- Set up the environment and dependencies
- Use an Ollama-powered chat model
- Use function calling (tool execution)
- Use an Ollama embedding model
1. Setup
Install Ollama
First, install Ollama from https://ollama.com. It lets you run LLMs locally with minimal setup.
To verify installation:
ollama run llama3
You can pull other models using:
ollama pull qwen3:1.7b
ollama pull snowflake-arctic-embed:latest
Some models may require more RAM or GPU acceleration. Check the Ollama model card for details.
In dev mode, Quarkus automatically starts the Ollama server if it is not already running, so you can test your application without starting it manually. It also automatically pulls any models used by your application that are not yet available locally. This can take some time, so pre-pulling is recommended for faster startup.
Add Maven Dependencies
Add the following dependencies to your pom.xml:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-ollama</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-rest-jackson</artifactId>
</dependency>
The quarkus-langchain4j-ollama extension provides the necessary integration with Ollama models.
The quarkus-rest-jackson dependency is needed for the REST endpoints used in this demo.
Configure the Application
In your application.properties, configure the chat and embedding models:
# Chat model
quarkus.langchain4j.ollama.chat-model.model-name=qwen3:1.7b (1)
quarkus.langchain4j.ollama.chat-model.temperature=0 (2)
quarkus.langchain4j.timeout=60s (3)
# Embedding model
quarkus.langchain4j.ollama.embedding-model.model-name=snowflake-arctic-embed:latest (4)
1. Specify the Ollama chat model to use (e.g., qwen3:1.7b).
2. Set the temperature to 0 for deterministic outputs (especially useful for function calling).
3. Local inference can take time, so set a reasonable timeout (e.g., 60 seconds).
4. Specify the Ollama embedding model (e.g., snowflake-arctic-embed:latest).
2. Using the Ollama Chat Model
To interact with an Ollama chat model, define an AI service interface:
@RegisterAiService
public interface Assistant {
@UserMessage("Say 'hello world' using a 4 line poem.")
String greeting();
}
Quarkus will automatically generate the implementation using the configured Ollama model.
You can expose this through a simple REST endpoint:
@Path("/hello")
public class GreetingResource {
@Inject
Assistant assistant;
@GET
public String hello() {
return assistant.greeting();
}
}
Visit http://localhost:8080/hello to see the model generate a 4-line “hello world” poem:
In the quiet dawn, a whisper breaks the silence,
Hello, world, where dreams take flight and light.
The sun ascends, a golden, warm embrace,
A greeting to the earth, a heart's soft grace.
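The user message does not have to be hardcoded. Below is a minimal sketch of a variant that adds a @SystemMessage and a templated @UserMessage whose {topic} placeholder is filled from the method parameter; the PoetAssistant name and the topic parameter are illustrative, not part of this guide's code:
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface PoetAssistant {

    // The system message sets the overall behaviour of the model.
    @SystemMessage("You are a concise poet. Always answer with exactly 4 lines.")
    // {topic} is resolved from the method parameter of the same name.
    @UserMessage("Write a short poem about {topic}.")
    String poemAbout(String topic);
}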
3. Using Function Calling
Ollama also provides reasoning models (like qwen3:1.7b) that support function calling, allowing the model to invoke external tools or business logic.
Here, we declare a tool method that logs a message:
@ApplicationScoped
public class SenderService {
@Tool
public void sendMessage(String message) {
Log.infof("Sending message: %s", message);
}
}
Then we declare an AI service that uses this tool:
@RegisterAiService
public interface Assistant {
@UserMessage("Say 'hello world' using a 4 line poem and send it using the SenderService.")
@ToolBox(SenderService.class)
String greetingAndSend();
}
The assistant will:
- Generate a poem
- Call the sendMessage(…) tool with the poem
You can test this via:
@GET
@Path("/function-calling")
public String helloWithFunctionCalling() {
return assistant.greetingAndSend();
}
Visit http://localhost:8080/hello/function-calling to trigger the tool. If you check the logs, you should see:
.... INFO [org.acm.SenderService] (executor-thread-1) Sending message: Hello, world!
A simple message.
In this, we go.
Peace and joy.
Lowering the temperature helps ensure the model uses the tool consistently.
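The @Tool annotation also accepts a short natural-language description, which helps smaller local models decide when to call the tool, and a tool method may return a value the model can use in its reply. Below is a minimal sketch of such a variant of the service above; the description text and the return value are illustrative, not part of this guide's code:
import dev.langchain4j.agent.tool.Tool;
import io.quarkus.logging.Log;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class SenderService {

    // The annotation value is a description the model sees when deciding whether to call this tool.
    @Tool("Sends the given message and returns a confirmation")
    public String sendMessage(String message) {
        Log.infof("Sending message: %s", message);
        return "Message sent";
    }
}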
4. Using the Ollama Embedding Model
You can also use Ollama to generate text embeddings for vector-based tasks. This is useful for Retrieval-Augmented Generation (RAG) or semantic search.
Inject the EmbeddingModel:
@Inject
EmbeddingModel embeddingModel;
Then use it like this:
@POST
@Path("/embed")
public List<Float> embed(String text) {
return embeddingModel.embed(text).content().vectorAsList();
}
Send a POST request with plain text to /hello/embed, and you’ll get a float vector representing the input:
curl -X POST http://localhost:8080/hello/embed \
-H "Content-Type: text/plain" \
--data-binary @- <<EOF
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
EOF
You will receive a list of floats representing the embedding.
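To compare two texts, you can embed both and compute the cosine similarity of the resulting vectors, which is the building block behind semantic search and RAG retrieval. Below is a minimal sketch of an extra endpoint on the same resource; the /similarity path and the query parameter names are illustrative, not part of this guide's code (QueryParam comes from jakarta.ws.rs):
@GET
@Path("/similarity")
public double similarity(@QueryParam("a") String a, @QueryParam("b") String b) {
    // Embed both texts with the injected Ollama embedding model.
    float[] va = embeddingModel.embed(a).content().vector();
    float[] vb = embeddingModel.embed(b).content().vector();
    // Cosine similarity: dot(a, b) / (|a| * |b|)
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < va.length; i++) {
        dot += va[i] * vb[i];
        normA += va[i] * va[i];
        normB += vb[i] * vb[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
Values closer to 1 indicate more semantically similar texts.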
5. Conclusion
Ollama enables local inference with a wide variety of LLMs, and Quarkus LangChain4j makes it easy to integrate them into Java applications.
Next steps:
- Try other Ollama models (e.g. llama3, mistral)
- Switch the RAG quickstart to use Ollama-served models (both chat and embedding)
- Implement more complex RAG workflows using Ollama models