Podman AI Lab Chat Models

Podman AI Lab is a local AI development environment provided as an extension to Podman Desktop. It simplifies experimentation with AI by providing a curated catalog of models and recipes, all runnable on your local machine via containerized infrastructure.

Podman AI Lab includes an inference server that is compatible with the OpenAI REST API, making it usable with the quarkus-langchain4j-openai extension.

Prerequisites

To use Podman AI Lab with Quarkus:

  1. Install Podman Desktop.

  2. From the Podman Desktop UI, install the Podman AI Lab extension from the Extensions catalog; the Podman Desktop documentation provides step-by-step instructions.

  3. Launch Podman AI Lab and download a model (such as granite-7b) through the UI.

  4. Start the inference server by selecting the model and clicking Run (a quick endpoint check is sketched below).
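
With the server running, you can confirm the endpoint is reachable by requesting the model list. This is a minimal sketch assuming the inference server exposes the standard OpenAI /v1/models endpoint and listens on port 44079 as an example; check the actual port in the Podman AI Lab UI. The InferenceServerCheck class name is illustrative.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InferenceServerCheck {

    public static void main(String[] args) throws Exception {
        // Port 44079 is just an example; use the port shown in the Podman AI Lab UI.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:44079/v1/models"))
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 200 status with a JSON model list means the server is up and reachable.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}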

Dependency

Add the OpenAI extension to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-openai</artifactId>
    <version>1.1.0.CR2</version>
</dependency>

Even though this is not an actual OpenAI deployment, the API is compatible and works seamlessly with this extension.

Only the chat model API is supported at the moment. Other OpenAI APIs (like embeddings or moderation) are not available through Podman AI Lab.

Configuration

Once the inference server is running (e.g., on port 44079), configure your application like so:

quarkus.langchain4j.openai.base-url=http://localhost:44079/v1
# Local models can respond more slowly than hosted ones, so increase the timeout
quarkus.langchain4j.openai.timeout=60s

You can verify the port from the Podman AI Lab UI under the Running Inference Server section.

Podman AI Lab currently supports only a single running model at a time. This means that any model-name configuration is ignored. The model to use must be selected manually through the Podman AI Lab UI.

Usage

Once configured, you can use the API as if communicating with an OpenAI model. Inject the chat model directly:

@Inject ChatModel chatModel;
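
Below is a minimal sketch of using the injected model from a JAX-RS resource. The AskResource class, the /ask path, and the prompt are illustrative, not part of the extension; chat(String) sends a single user message and returns the model's reply.

import dev.langchain4j.model.chat.ChatModel;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

// Illustrative resource: forwards a prompt to the locally running model.
@Path("/ask")
public class AskResource {

    @Inject
    ChatModel chatModel;

    @GET
    public String ask() {
        // The request is served by the Podman AI Lab inference server
        // configured via quarkus.langchain4j.openai.base-url.
        return chatModel.chat("When was Podman first released?");
    }
}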

Or register an AI service interface:

import io.quarkiverse.langchain4j.RegisterAiService;

// Quarkus generates the implementation of this interface at build time.
@RegisterAiService
public interface Assistant {
    String chat(String prompt);
}
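
An AI service can then be injected like any other CDI bean. The following sketch wires the Assistant into a hypothetical JAX-RS resource (the class name and path are illustrative); Quarkus routes the call through the configured local endpoint.

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

// Illustrative resource: delegates to the generated Assistant implementation.
@Path("/assistant")
public class AssistantResource {

    @Inject
    Assistant assistant;

    @GET
    public String chat() {
        return assistant.chat("Tell me about Podman AI Lab.");
    }
}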

Limitations

  • Only chat models are supported at the moment.

  • Embedding and moderation APIs are not yet available.

  • The inference server only runs one model at a time.

  • GPU support is not currently enabled; all inference runs on CPU.