IBM watsonx.ai

You can develop generative AI solutions with foundation models in IBM watsonx.ai. You can use prompts to generate, classify, summarize, or extract content from your input text. Choose from IBM models or open source models from Hugging Face. You can tune foundation models to customize your prompt output or optimize inferencing performance.

Supported only for IBM watsonx as a service on IBM Cloud.

Using watsonx.ai

To use watsonx.ai LLMs, add the following dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-watsonx</artifactId>
    <version>0.23.0.CR1</version>
</dependency>

If no other LLM extension is installed, AI Services automatically use the configured watsonx.ai models.

Configuration

To use the watsonx.ai dependency, you must configure some required values in the application.properties file.

Base URL

The base-url property depends on the region of the provided service instance. For example, for the Dallas (us-south) region:

quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com

Project ID

To prompt foundation models in watsonx.ai programmatically, you need to pass the identifier (ID) of a project.

To get the ID of a project, complete the following steps:

  1. Open the project, and then click the Manage tab.

  2. Copy the project ID from the Details section of the General page.

To view the list of projects, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.

quarkus.langchain4j.watsonx.project-id=23d...

API Key

To prompt foundation models in IBM watsonx.ai programmatically, you need an IBM Cloud API key.

To create an API key, go to https://cloud.ibm.com/iam/apikeys.

quarkus.langchain4j.watsonx.api-key=hG-...

Interacting with Models

The watsonx.ai module provides two different modes for interacting with LLMs: generation and chat. These modes allow you to tailor the interaction based on the complexity of your use case and how much control you want to have over the prompt structure.

You can select the interaction mode using the property quarkus.langchain4j.watsonx.mode.

  • generation: In this mode, you must explicitly structure the prompts using the required model-specific tags. This provides full control over the format of the prompt, but requires in-depth knowledge of the model being used. For best results, always refer to the documentation provided for each model to maximize the effectiveness of your prompts.

  • chat: This mode abstracts the complexity of tagging by automatically formatting prompts so that you can focus on the content. This is the default mode.

To choose between these two modes, add the mode property to your application.properties file:

# allowable values: 'chat' (default) or 'generation'
quarkus.langchain4j.watsonx.mode=chat

Depending on the selected mode, the values for configuring the model are found under the chat-model or generation-model properties.

Chat Mode

In chat mode, you can interact with models without having to manually manage the tags of a prompt.

You might choose this mode if you are looking for dynamic interactions where the model can build on previous messages and provide more contextually relevant responses. This mode simplifies the interaction by automatically managing the necessary tags, allowing you to focus on the content of your prompts rather than formatting.

Chat mode also supports the use of tools, allowing the model to perform specific actions or retrieve external data as part of its responses. This extends the capabilities of the model, allowing it to perform complex tasks dynamically and adapt to your needs. More information about tools is available on the Agent and Tools page.

quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.chat-model.model-id=mistralai/mistral-large

@RegisterAiService
public interface AiService {
    @SystemMessage("You are a helpful assistant")
    String chat(@MemoryId String memoryId, @UserMessage String message);
}
The availability of chat and tools is currently limited to certain models. Not all models support these features, so be sure to consult the documentation for the specific model you are using to confirm whether these features are available.
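As an illustrative sketch of how the service might be consumed, the following hypothetical classes (the names, endpoint, and tool are ours, not part of the extension) show a CDI bean exposing a @Tool method that a tool-capable model could invoke, and a REST resource injecting the AiService declared above. Each class goes in its own source file.

import jakarta.enterprise.context.ApplicationScoped;

import dev.langchain4j.agent.tool.Tool;

// Hypothetical tool bean; to make it available to the model, reference it
// from the service declaration, e.g. @RegisterAiService(tools = ClockTool.class).
@ApplicationScoped
public class ClockTool {

    @Tool("Returns the current date and time")
    public String currentTime() {
        return java.time.ZonedDateTime.now().toString();
    }
}

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/chat")
public class ChatResource {

    @Inject
    AiService aiService; // the interface declared above

    @GET
    public String chat(@QueryParam("q") String question) {
        // "user-1" is an arbitrary identifier that scopes the conversation memory
        return aiService.chat("user-1", question);
    }
}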

Generation Mode

In generation mode, you have complete control over the structure of your prompts by manually specifying tags for a specific model. This mode can be useful in scenarios where only a single response is desired.

quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.generation-model.model-id=mistralai/mistral-large
quarkus.langchain4j.watsonx.mode=generation

@RegisterAiService(chatMemoryProviderSupplier = RegisterAiService.NoChatMemoryProviderSupplier.class)
public interface AiService {
    @UserMessage("""
        <s>[INST] You are a helpful assistant [/INST]</s>\
        [INST] What is the capital of {country}? [/INST]""")
    String askCapital(String country);
}
In generation mode, the @SystemMessage and @UserMessage annotations are joined by default with a new line. If you want to change this behavior, use the property quarkus.langchain4j.watsonx.generation-model.prompt-joiner=<value>. By adjusting this property, you can define your preferred way of joining messages and ensure that the prompt structure meets your specific needs.
Sometimes it may be useful to use the quarkus.langchain4j.watsonx.generation-model.stop-sequences property to prevent the model from returning more output than desired.
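For example, a hypothetical configuration combining both properties (the joiner and stop-sequence values are illustrative) could be:

quarkus.langchain4j.watsonx.generation-model.prompt-joiner=\n
quarkus.langchain4j.watsonx.generation-model.stop-sequences=</s>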

All configuration properties

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property

Type

Default

Whether the chat model should be enabled.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_ENABLED

boolean

true

Whether the embedding model should be enabled.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_ENABLED

boolean

true

Whether the scoring model should be enabled.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_ENABLED

boolean

true

Specifies the mode of interaction with the LLM.

This property allows you to choose between two modes of operation:

  • chat: prompts are automatically enriched with the specific tags defined by the model

  • generation: prompts require manual specification of tags

Allowable values: [chat, generation]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_MODE

string

chat

Base URL of the watsonx.ai API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_BASE_URL

string

IBM Cloud API key.

To create a new API key, go to https://cloud.ibm.com/iam/apikeys.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_API_KEY

string

Timeout for watsonx.ai calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_TIMEOUT

Duration

10s

The version date for the API of the form YYYY-MM-DD.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_VERSION

string

2024-03-14

The space that contains the resource. Either space_id or project_id has to be given.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SPACE_ID

string

The project that contains the resource. Either space_id or project_id has to be given.

To look up your project ID, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_PROJECT_ID

string

Whether the watsonx.ai client should log requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_LOG_REQUESTS

boolean

false

Whether the watsonx.ai client should log responses.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_LOG_RESPONSES

boolean

false

Whether to enable the integration. Defaults to true, which means requests are made to the watsonx.ai provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_ENABLE_INTEGRATION

boolean

true

Base URL of the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_BASE_URL

URL

https://iam.cloud.ibm.com

Timeout for IAM authentication calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_TIMEOUT

Duration

10s

Grant type for the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_GRANT_TYPE

string

urn:ibm:params:oauth:grant-type:apikey

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MODEL_ID

string

mistralai/mistral-large

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_FREQUENCY_PENALTY

double

0

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token in the content of the message.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_LOGPROBS

boolean

false

An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option logprobs must be set to true if this parameter is used.

Possible values: 0 ≤ value ≤ 20

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TOP_LOGPROBS

int

The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MAX_TOKENS

int

1024

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_N

int

1

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_PRESENCE_PENALTY

double

0

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Possible values: 0 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Possible values: 0 < value < 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TOP_P

double

1

Specifies the desired format for the model’s output.

Allowable values: [json_object]

Applicable in modes: [chat]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_RESPONSE_FORMAT

string

Whether chat model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_LOG_REQUESTS

boolean

false

Whether chat model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_MODEL_ID

string

mistralai/mistral-large

Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to greedy, each successive token corresponds to the highest probability token given the text that has already been generated. This strategy can lead to repetitive results especially for longer output sequences. The alternative sample strategy generates text by picking subsequent tokens based on the probability distribution of possible next tokens defined by (i.e., conditioned on) the already-generated text and the top_k and top_p parameters.

Allowable values: [sample,greedy]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_DECODING_METHOD

string

greedy

Represents the factor of exponential decay. Larger values correspond to more aggressive decay.

Possible values: > 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LENGTH_PENALTY_DECAY_FACTOR

double

A number of generated tokens after which this should take effect.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LENGTH_PENALTY_START_INDEX

int

The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a "token" is defined depends on the tokenizer and vocabulary size, which in turn depends on the model. Often the tokens are a mix of full words and sub-words. Depending on the user's plan, and on the model being used, there may be an enforced maximum number of new tokens.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_MAX_NEW_TOKENS

int

200

If stop sequences are given, they are ignored until the minimum number of tokens is generated.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_MIN_NEW_TOKENS

int

0

Random number generator seed to use in sampling mode for experimental repeatability.

Possible values: ≥ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_RANDOM_SEED

int

Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored.

Possible values: 0 ≤ number of items ≤ 6, contains only unique items

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_STOP_SEQUENCES

list of string

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect.

Possible values: 0 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for sampling mode. When decoding_strategy is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.

Possible values: 1 ≤ value ≤ 100

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TOP_K

int

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled.

Possible values: 0 < value ≤ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TOP_P

double

Represents the penalty applied to tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty.

Possible values: 1 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_REPETITION_PENALTY

double

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the start of the input (on the left), so the end of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length. Zero means don’t truncate.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TRUNCATE_INPUT_TOKENS

int

Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output will end with the stop sequence text when matched.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_INCLUDE_STOP_SEQUENCE

boolean

Whether generation model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LOG_REQUESTS

boolean

false

Whether generation model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LOG_RESPONSES

boolean

false

Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_PROMPT_JOINER

string

\n

Model id to use. To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_MODEL_ID

string

ibm/slate-125m-english-rtrvr

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether embedding model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

Whether embedding model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_MODEL_ID

string

cross-encoder/ms-marco-minilm-l-12-v2

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether scoring model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_LOG_REQUESTS

boolean

false

Whether scoring model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_LOG_RESPONSES

boolean

false
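Each environment variable in these tables follows the standard Quarkus mapping of its configuration property, so either form can be used. For example, the two settings below are equivalent (the value 0.2 is illustrative):

# in application.properties
quarkus.langchain4j.watsonx.chat-model.temperature=0.2

# or as an environment variable
QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TEMPERATURE=0.2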

Named model config

Type

Default

Specifies the mode of interaction with the LLM.

This property allows you to choose between two modes of operation:

  • chat: prompts are automatically enriched with the specific tags defined by the model

  • generation: prompts require manual specification of tags

Allowable values: [chat, generation]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__MODE

string

chat

Base URL of the watsonx.ai API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__BASE_URL

string

IBM Cloud API key.

To create a new API key, go to https://cloud.ibm.com/iam/apikeys.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__API_KEY

string

Timeout for watsonx.ai calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__TIMEOUT

Duration

10s

The version date for the API of the form YYYY-MM-DD.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__VERSION

string

2024-03-14

The space that contains the resource. Either space_id or project_id has to be given.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SPACE_ID

string

The project that contains the resource. Either space_id or project_id has to be given.

To look up your project ID, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__PROJECT_ID

string

Whether the watsonx.ai client should log requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__LOG_REQUESTS

boolean

false

Whether the watsonx.ai client should log responses.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__LOG_RESPONSES

boolean

false

Whether to enable the integration. Defaults to true, which means requests are made to the watsonx.ai provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__ENABLE_INTEGRATION

boolean

true

Base URL of the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_BASE_URL

URL

https://iam.cloud.ibm.com

Timeout for IAM authentication calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_TIMEOUT

Duration

10s

Grant type for the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_GRANT_TYPE

string

urn:ibm:params:oauth:grant-type:apikey

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MODEL_ID

string

mistralai/mistral-large

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_FREQUENCY_PENALTY

double

0

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token in the content of the message.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_LOGPROBS

boolean

false

An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option logprobs must be set to true if this parameter is used.

Possible values: 0 ≤ value ≤ 20

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TOP_LOGPROBS

int

The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MAX_TOKENS

int

1024

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_N

int

1

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_PRESENCE_PENALTY

double

0

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Possible values: 0 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Possible values: 0 < value < 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TOP_P

double

1

Specifies the desired format for the model’s output.

Allowable values: [json_object]

Applicable in modes: [chat]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_RESPONSE_FORMAT

string

Whether chat model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS

boolean

false

Whether chat model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_MODEL_ID

string

mistralai/mistral-large

Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to greedy, each successive token corresponds to the highest probability token given the text that has already been generated. This strategy can lead to repetitive results especially for longer output sequences. The alternative sample strategy generates text by picking subsequent tokens based on the probability distribution of possible next tokens defined by (i.e., conditioned on) the already-generated text and the top_k and top_p parameters.

Allowable values: [sample,greedy]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_DECODING_METHOD

string

greedy

Represents the factor of exponential decay. Larger values correspond to more aggressive decay.

Possible values: > 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LENGTH_PENALTY_DECAY_FACTOR

double

A number of generated tokens after which this should take effect.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LENGTH_PENALTY_START_INDEX

int

The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a "token" is defined depends on the tokenizer and vocabulary size, which in turn depends on the model. Often the tokens are a mix of full words and sub-words. Depending on the user's plan, and on the model being used, there may be an enforced maximum number of new tokens.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_MAX_NEW_TOKENS

int

200

If stop sequences are given, they are ignored until the minimum number of tokens is generated.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_MIN_NEW_TOKENS

int

0

Random number generator seed to use in sampling mode for experimental repeatability.

Possible values: ≥ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_RANDOM_SEED

int

Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored.

Possible values: 0 ≤ number of items ≤ 6, contains only unique items

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_STOP_SEQUENCES

list of string

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect.

Possible values: 0 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for sampling mode. When decoding_strategy is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.

Possible values: 1 ≤ value ≤ 100

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TOP_K

int

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled.

Possible values: 0 < value ≤ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TOP_P

double

Represents the penalty applied to tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty.

Possible values: 1 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_REPETITION_PENALTY

double

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the start of the input (on the left), so the end of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length. Zero means don’t truncate.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TRUNCATE_INPUT_TOKENS

int

Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output will end with the stop sequence text when matched.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_INCLUDE_STOP_SEQUENCE

boolean

Whether generation model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LOG_REQUESTS

boolean

false

Whether generation model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LOG_RESPONSES

boolean

false

Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_PROMPT_JOINER

string

\n

Model id to use. To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID

string

ibm/slate-125m-english-rtrvr

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether embedding model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

Whether embedding model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_MODEL_ID

string

cross-encoder/ms-marco-minilm-l-12-v2

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether scoring model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_LOG_REQUESTS

boolean

false

Whether scoring model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_LOG_RESPONSES

boolean

false
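The __MODEL_NAME__ segment in the variables above is a placeholder for a named model configuration. As a sketch, assuming a configuration named my-watsonx (the name and values below are illustrative) and that the modelName attribute of @RegisterAiService is used to select it:

quarkus.langchain4j.watsonx.my-watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.my-watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.my-watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.my-watsonx.chat-model.model-id=mistralai/mistral-large

@RegisterAiService(modelName = "my-watsonx")
public interface NamedAiService {
    String chat(@UserMessage String message);
}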

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
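Applied to the timeout properties above, the following hypothetical values all express the same ten-second timeout:

# plain number: interpreted as seconds
quarkus.langchain4j.watsonx.timeout=10

# number followed by ms: milliseconds
quarkus.langchain4j.watsonx.timeout=10000ms

# simplified format: parsed as PT10S
quarkus.langchain4j.watsonx.timeout=10s

# full java.time.Duration format
quarkus.langchain4j.watsonx.timeout=PT10S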