IBM watsonx.ai

You can develop generative AI solutions with foundation models in IBM watsonx.ai. You can use prompts to generate, classify, summarize, or extract content from your input text. Choose from IBM models or open source models from Hugging Face. You can tune foundation models to customize your prompt output or optimize inferencing performance.

Supported only for IBM watsonx as a service on IBM Cloud.

Using watsonx.ai

To use watsonx.ai LLMs, add the following dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-watsonx</artifactId>
    <version>0.23.0.CR1</version>
</dependency>

If no other LLM extension is installed, AI Services automatically use the configured watsonx.ai models.

Configuration

To use the watsonx.ai dependency, you must configure some required values in the application.properties file.

Base URL

The base-url property depends on the region of the provided service instance. For example, for the Dallas (us-south) region:

quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com

Project ID

To prompt foundation models in watsonx.ai programmatically, you need to pass the identifier (ID) of a project.

To get the ID of a project, complete the following steps:

  1. Open the project, and then click the Manage tab.

  2. Copy the project ID from the Details section of the General page.

To view the list of projects, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.

quarkus.langchain4j.watsonx.project-id=23d...

API Key

To prompt foundation models in IBM watsonx.ai programmatically, you need an IBM Cloud API key.

To create an API key, go to https://cloud.ibm.com/iam/apikeys.

quarkus.langchain4j.watsonx.api-key=hG-...

Interacting with Models

The watsonx.ai module provides two different modes for interacting with LLMs: generation and chat. These modes allow you to tailor the interaction based on the complexity of your use case and how much control you want to have over the prompt structure.

You can select the interaction mode using the property quarkus.langchain4j.watsonx.mode.

  • generation: In this mode, you must explicitly structure the prompts using the required model-specific tags. This provides full control over the format of the prompt, but requires in-depth knowledge of the model being used. For best results, always refer to the documentation provided for each model to maximize the effectiveness of your prompts.

  • chat: This mode abstracts the complexity of tagging by automatically formatting prompts so that you can focus on the content. This is the default mode.

To choose between these two modes, add the mode property to your application.properties file:

# allowable values: 'chat' (default) or 'generation'
quarkus.langchain4j.watsonx.mode=chat

Depending on the selected mode, the values for configuring the model are found under the chat-model or generation-model properties.

Chat Mode

In chat mode, you can interact with models without having to manually manage the tags of a prompt.

You might choose this mode if you are looking for dynamic interactions where the model can build on previous messages and provide more contextually relevant responses. This mode simplifies the interaction by automatically managing the necessary tags, allowing you to focus on the content of your prompts rather than formatting.

Chat mode also supports the use of tools, allowing the model to perform specific actions or retrieve external data as part of its responses. This extends the capabilities of the model, allowing it to perform complex tasks dynamically and adapt to your needs. More information about tools is available on the Agent and Tools page.

quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.chat-model.model-id=mistralai/mistral-large

@RegisterAiService
public interface AiService {
    @SystemMessage("You are a helpful assistant")
    String chat(@MemoryId String memoryId, @UserMessage String message);
}
The availability of chat and tools is currently limited to certain models. Not all models support these features, so be sure to consult the documentation for the specific model you are using to confirm whether these features are available.
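As an illustrative sketch of how the service might be consumed, the following hypothetical classes (the names, endpoint, and tool are ours, not part of the extension) show a CDI bean exposing a @Tool method that a tool-capable model could invoke, and a REST resource injecting the AiService declared above. Each class goes in its own source file.

import jakarta.enterprise.context.ApplicationScoped;

import dev.langchain4j.agent.tool.Tool;

// Hypothetical tool bean; to make it available to the model, reference it
// from the service declaration, e.g. @RegisterAiService(tools = ClockTool.class).
@ApplicationScoped
public class ClockTool {

    @Tool("Returns the current date and time")
    public String currentTime() {
        return java.time.ZonedDateTime.now().toString();
    }
}

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/chat")
public class ChatResource {

    @Inject
    AiService aiService; // the interface declared above

    @GET
    public String chat(@QueryParam("q") String question) {
        // "user-1" is an arbitrary identifier that scopes the conversation memory
        return aiService.chat("user-1", question);
    }
}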

Generation Mode

In generation mode, you have complete control over the structure of your prompts by manually specifying tags for a specific model. This mode can be useful in scenarios where only a single response is desired.

quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.generation-model.model-id=mistralai/mistral-large
quarkus.langchain4j.watsonx.mode=generation

@RegisterAiService(chatMemoryProviderSupplier = RegisterAiService.NoChatMemoryProviderSupplier.class)
public interface AiService {
    @UserMessage("""
        <s>[INST] You are a helpful assistant [/INST]</s>\
        [INST] What is the capital of {country}? [/INST]""")
    String askCapital(String country);
}
In generation mode, the @SystemMessage and @UserMessage annotations are joined by default with a new line. If you want to change this behavior, use the property quarkus.langchain4j.watsonx.generation-model.prompt-joiner=<value>. By adjusting this property, you can define your preferred way of joining messages and ensure that the prompt structure meets your specific needs.
Sometimes it may be useful to use the quarkus.langchain4j.watsonx.generation-model.stop-sequences property to prevent the model from returning more output than desired.
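For example, a hypothetical configuration combining both properties (the joiner and stop-sequence values are illustrative) could be:

quarkus.langchain4j.watsonx.generation-model.prompt-joiner=\n
quarkus.langchain4j.watsonx.generation-model.stop-sequences=</s>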

All configuration properties

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property

Type

Default

Whether the chat model should be enabled.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_ENABLED

boolean

true

Whether the embedding model should be enabled.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_ENABLED

boolean

true

Whether the scoring model should be enabled.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_ENABLED

boolean

true

Specifies the mode of interaction with the LLM.

This property allows you to choose between two modes of operation:

  • chat: prompts are automatically enriched with the specific tags defined by the model

  • generation: prompts require manual specification of tags

Allowable values: [chat, generation]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_MODE

string

chat

Base URL of the watsonx.ai API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_BASE_URL

string

IBM Cloud API key.

To create a new API key, go to https://cloud.ibm.com/iam/apikeys.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_API_KEY

string

Timeout for watsonx.ai calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_TIMEOUT

Duration

10s

The version date for the API of the form YYYY-MM-DD.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_VERSION

string

2024-03-14

The space that contains the resource. Either space_id or project_id has to be given.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SPACE_ID

string

The project that contains the resource. Either space_id or project_id has to be given.

To look up your project ID, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_PROJECT_ID

string

Whether the watsonx.ai client should log requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_LOG_REQUESTS

boolean

false

Whether the watsonx.ai client should log responses.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_LOG_RESPONSES

boolean

false

Whether to enable the integration. Defaults to true, which means requests are made to the watsonx.ai provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_ENABLE_INTEGRATION

boolean

true

Base URL of the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_BASE_URL

URL

https://iam.cloud.ibm.com

Timeout for IAM authentication calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_TIMEOUT

Duration

10s

Grant type for the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_GRANT_TYPE

string

urn:ibm:params:oauth:grant-type:apikey

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MODEL_ID

string

mistralai/mistral-large

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_FREQUENCY_PENALTY

double

0

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token in the content of the message.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_LOGPROBS

boolean

false

An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option logprobs must be set to true if this parameter is used.

Possible values: 0 ≤ value ≤ 20

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TOP_LOGPROBS

int

The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MAX_TOKENS

int

1024

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_N

int

1

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_PRESENCE_PENALTY

double

0

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Possible values: 0 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Possible values: 0 < value < 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TOP_P

double

1

Specifies the desired format for the model’s output.

Allowable values: [json_object]

Applicable in modes: [chat]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_RESPONSE_FORMAT

string

Whether chat model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_LOG_REQUESTS

boolean

false

Whether chat model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_MODEL_ID

string

mistralai/mistral-large

Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to greedy, each successive token corresponds to the highest probability token given the text that has already been generated. This strategy can lead to repetitive results especially for longer output sequences. The alternative sample strategy generates text by picking subsequent tokens based on the probability distribution of possible next tokens defined by (i.e., conditioned on) the already-generated text and the top_k and top_p parameters.

Allowable values: [sample,greedy]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_DECODING_METHOD

string

greedy

Represents the factor of exponential decay. Larger values correspond to more aggressive decay.

Possible values: > 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LENGTH_PENALTY_DECAY_FACTOR

double

A number of generated tokens after which this should take effect.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LENGTH_PENALTY_START_INDEX

int

The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a "token" is defined depends on the tokenizer and vocabulary size, which in turn depends on the model. Often the tokens are a mix of full words and sub-words. Depending on the user's plan, and on the model being used, there may be an enforced maximum number of new tokens.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_MAX_NEW_TOKENS

int

200

If stop sequences are given, they are ignored until the minimum number of tokens is generated.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_MIN_NEW_TOKENS

int

0

Random number generator seed to use in sampling mode for experimental repeatability.

Possible values: ≥ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_RANDOM_SEED

int

Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored.

Possible values: 0 ≤ number of items ≤ 6, contains only unique items

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_STOP_SEQUENCES

list of string

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect.

Possible values: 0 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for sampling mode. When decoding_strategy is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.

Possible values: 1 ≤ value ≤ 100

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TOP_K

int

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled.

Possible values: 0 < value ≤ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TOP_P

double

Represents the penalty applied to tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty.

Possible values: 1 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_REPETITION_PENALTY

double

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the start of the input (on the left), so the end of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length. Zero means don’t truncate.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_TRUNCATE_INPUT_TOKENS

int

Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output will end with the stop sequence text when matched.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_INCLUDE_STOP_SEQUENCE

boolean

Whether generation model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LOG_REQUESTS

boolean

false

Whether generation model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_LOG_RESPONSES

boolean

false

Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_GENERATION_MODEL_PROMPT_JOINER

string

\n

Model id to use. To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_MODEL_ID

string

ibm/slate-125m-english-rtrvr

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether embedding model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

Whether embedding model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_MODEL_ID

string

cross-encoder/ms-marco-minilm-l-12-v2

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether scoring model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_LOG_REQUESTS

boolean

false

Whether scoring model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_SCORING_MODEL_LOG_RESPONSES

boolean

false
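Each environment variable in these tables follows the standard Quarkus mapping of its configuration property, so either form can be used. For example, the two settings below are equivalent (the value 0.2 is illustrative):

# in application.properties
quarkus.langchain4j.watsonx.chat-model.temperature=0.2

# or as an environment variable
QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TEMPERATURE=0.2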

Named model config

Type

Default

Specifies the mode of interaction with the LLM.

This property allows you to choose between two modes of operation:

  • chat: prompts are automatically enriched with the specific tags defined by the model

  • generation: prompts require manual specification of tags

Allowable values: [chat, generation]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__MODE

string

chat

Base URL of the watsonx.ai API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__BASE_URL

string

IBM Cloud API key.

To create a new API key, go to https://cloud.ibm.com/iam/apikeys.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__API_KEY

string

Timeout for watsonx.ai calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__TIMEOUT

Duration

10s

The version date for the API of the form YYYY-MM-DD.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__VERSION

string

2024-03-14

The space that contains the resource. Either space_id or project_id has to be given.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SPACE_ID

string

The project that contains the resource. Either space_id or project_id has to be given.

To look up your project ID, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__PROJECT_ID

string

Whether the watsonx.ai client should log requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__LOG_REQUESTS

boolean

false

Whether the watsonx.ai client should log responses.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__LOG_RESPONSES

boolean

false

Whether to enable the integration. Defaults to true, which means requests are made to the watsonx.ai provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__ENABLE_INTEGRATION

boolean

true

Base URL of the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_BASE_URL

URL

https://iam.cloud.ibm.com

Timeout for IAM authentication calls.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_TIMEOUT

Duration

10s

Grant type for the IAM Authentication API.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_GRANT_TYPE

string

urn:ibm:params:oauth:grant-type:apikey

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MODEL_ID

string

mistralai/mistral-large

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_FREQUENCY_PENALTY

double

0

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token in the content of the message.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_LOGPROBS

boolean

false

An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option logprobs must be set to true if this parameter is used.

Possible values: 0 ≤ value ≤ 20

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TOP_LOGPROBS

int

The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MAX_TOKENS

int

1024

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_N

int

1

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Possible values: -2 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_PRESENCE_PENALTY

double

0

What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Possible values: 0 < value < 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Possible values: 0 < value < 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TOP_P

double

1

Specifies the desired format for the model’s output.

Allowable values: [json_object]

Applicable in modes: [chat]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_RESPONSE_FORMAT

string

Whether chat model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS

boolean

false

Whether chat model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_MODEL_ID

string

mistralai/mistral-large

Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to greedy, each successive token corresponds to the highest probability token given the text that has already been generated. This strategy can lead to repetitive results especially for longer output sequences. The alternative sample strategy generates text by picking subsequent tokens based on the probability distribution of possible next tokens defined by (i.e., conditioned on) the already-generated text and the top_k and top_p parameters.

Allowable values: [sample,greedy]

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_DECODING_METHOD

string

greedy

Represents the factor of exponential decay. Larger values correspond to more aggressive decay.

Possible values: > 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LENGTH_PENALTY_DECAY_FACTOR

double

A number of generated tokens after which this should take effect.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LENGTH_PENALTY_START_INDEX

int

The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a "token" is defined depends on the tokenizer and vocabulary size, which in turn depends on the model. Often the tokens are a mix of full words and sub-words. Depending on the user's plan, and on the model being used, there may be an enforced maximum number of new tokens.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_MAX_NEW_TOKENS

int

200

If stop sequences are given, they are ignored until the minimum number of tokens is generated.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_MIN_NEW_TOKENS

int

0

Random number generator seed to use in sampling mode for experimental repeatability.

Possible values: ≥ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_RANDOM_SEED

int

Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored.

Possible values: 0 ≤ number of items ≤ 6, contains only unique items

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_STOP_SEQUENCES

list of string

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect.

Possible values: 0 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature:1.0}

The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for sampling mode. When decoding_strategy is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.

Possible values: 1 ≤ value ≤ 100

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TOP_K

int

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled.

Possible values: 0 < value ≤ 1

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TOP_P

double

Represents the penalty applied to tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty.

Possible values: 1 ≤ value ≤ 2

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_REPETITION_PENALTY

double

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the start of the input (on the left), so the end of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length. Zero means don’t truncate.

Possible values: ≥ 0

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_TRUNCATE_INPUT_TOKENS

int

Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output will end with the stop sequence text when matched.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_INCLUDE_STOP_SEQUENCE

boolean

Whether generation model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LOG_REQUESTS

boolean

false

Whether generation model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_LOG_RESPONSES

boolean

false

Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__GENERATION_MODEL_PROMPT_JOINER

string

\n

Model id to use. To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID

string

ibm/slate-125m-english-rtrvr

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether embedding model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

Whether embedding model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Model id to use.

To view the complete model list, see the watsonx.ai documentation.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_MODEL_ID

string

cross-encoder/ms-marco-minilm-l-12-v2

Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the end of the input (on the right), so the start of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_TRUNCATE_INPUT_TOKENS

int

Whether scoring model requests should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_LOG_REQUESTS

boolean

false

Whether scoring model responses should be logged.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__SCORING_MODEL_LOG_RESPONSES

boolean

false
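The __MODEL_NAME__ segment in the variables above is a placeholder for a named model configuration. As a sketch, assuming a configuration named my-watsonx (the name and values below are illustrative) and that the modelName attribute of @RegisterAiService is used to select it:

quarkus.langchain4j.watsonx.my-watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.my-watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.my-watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.my-watsonx.chat-model.model-id=mistralai/mistral-large

@RegisterAiService(modelName = "my-watsonx")
public interface NamedAiService {
    String chat(@UserMessage String message);
}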

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
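Applied to the timeout properties above, the following hypothetical values all express the same ten-second timeout:

# plain number: interpreted as seconds
quarkus.langchain4j.watsonx.timeout=10

# number followed by ms: milliseconds
quarkus.langchain4j.watsonx.timeout=10000ms

# simplified format: parsed as PT10S
quarkus.langchain4j.watsonx.timeout=10s

# full java.time.Duration format
quarkus.langchain4j.watsonx.timeout=PT10S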