IBM watsonx.ai
You can develop generative AI solutions with foundation models in IBM watsonx.ai. You can use prompts to generate, classify, summarize, or extract content from your input text. Choose from IBM models or open source models from Hugging Face. You can tune foundation models to customize your prompt output or optimize inferencing performance.
Supported only for IBM watsonx as a service on IBM Cloud.
Using watsonx.ai
To use watsonx.ai LLMs, add the following dependency to your project:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-watsonx</artifactId>
<version>0.18.0.CR1</version>
</dependency>
If no other extension is installed, AI services automatically use the configured watsonx.ai provider.
Configuration
To use the watsonx.ai extension, you must set the following required values in the application.properties file.
Base URL
The base-url property depends on the region of your service instance. Use one of the following values:
- Dallas: https://us-south.ml.cloud.ibm.com
- Frankfurt: https://eu-de.ml.cloud.ibm.com
- London: https://eu-gb.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
Project ID
To prompt foundation models in watsonx.ai programmatically, you need to pass the identifier (ID) of a project.
To get the ID of a project, complete the following steps:
- Open the project, and then click the Manage tab.
- Copy the project ID from the Details section of the General page.
To view the list of projects, go to https://dataplatform.cloud.ibm.com/projects/?context=wx.
quarkus.langchain4j.watsonx.project-id=23d...
API Key
To prompt foundation models in IBM watsonx.ai programmatically, you need an IBM Cloud API key.
quarkus.langchain4j.watsonx.api-key=hG-...
To create an API key, go to https://cloud.ibm.com/iam/apikeys and generate one.
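All three values above are required. As an illustration only, the following hypothetical pre-flight check (WatsonxConfigCheck and missingKeys are illustrative names, not part of the extension) loads application.properties content with java.util.Properties and lists any of the three documented keys that are missing or blank.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; not part of quarkus-langchain4j-watsonx.
final class WatsonxConfigCheck {

    // The required keys from the Base URL, Project ID, and API Key sections.
    static final List<String> REQUIRED_KEYS = List.of(
            "quarkus.langchain4j.watsonx.base-url",
            "quarkus.langchain4j.watsonx.project-id",
            "quarkus.langchain4j.watsonx.api-key");

    private WatsonxConfigCheck() {}

    // Returns the required keys that are absent or blank, in REQUIRED_KEYS order.
    static List<String> missingKeys(String applicationProperties) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(applicationProperties));
        } catch (IOException e) {
            // StringReader does no real I/O, but Properties.load declares IOException.
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

For example, passing the contents of src/main/resources/application.properties (read with Files.readString) returns an empty list once base-url, project-id, and api-key are all set.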
Writing prompts
When creating prompts with watsonx.ai, follow the guidelines of the model you choose. Depending on the model, special instructions such as role tags may be required to get the desired output, so refer to each model's documentation to get the most out of your prompts.
To simplify prompt creation, you can use the prompt-formatter property, which adds the tags each model requires to your prompts automatically, so you do not have to add them by hand. This is particularly useful for models such as ibm/granite-13b-chat-v2, meta-llama/llama-3-405b-instruct, and the other supported models, because it keeps the prompt structure consistent and accurate without additional effort.
To enable this functionality, configure the prompt-formatter property in your application.properties file as follows:
quarkus.langchain4j.watsonx.chat-model.prompt-formatter=true
When this property is set to true, prompts are automatically formatted with the appropriate tags, so they follow the structure the model expects. If it is set to false, you must manage the tags manually.
For example, if you use ibm/granite-13b-chat-v2 without the prompt-formatter, you must add the <|system|>, <|user|>, and <|assistant|> tags manually:
quarkus.langchain4j.watsonx.api-key=hG-...
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.chat-model.model-id=ibm/granite-13b-chat-v2
quarkus.langchain4j.watsonx.chat-model.prompt-formatter=false
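import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface LLMService {

    record Result(Integer result) {}

    // prompt-formatter=false: the Granite role tags are written by hand.
    @SystemMessage("""
            <|system|>
            You are a calculator and you must perform the mathematical operation
            {response_schema}
            """)
    @UserMessage("""
            <|user|>
            {firstNumber} + {secondNumber}
            <|assistant|>
            """)
    Result calculator(int firstNumber, int secondNumber);
}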
With the prompt-formatter enabled, the same service no longer needs the tags:
quarkus.langchain4j.watsonx.api-key=hG-...
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.chat-model.model-id=ibm/granite-13b-chat-v2
quarkus.langchain4j.watsonx.chat-model.prompt-formatter=true
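import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface LLMService {

    record Result(Integer result) {}

    // prompt-formatter=true: the extension adds the Granite role tags itself.
    @SystemMessage("""
            You are a calculator and you must perform the mathematical operation
            {response_schema}
            """)
    @UserMessage("""
            {firstNumber} + {secondNumber}
            """)
    Result calculator(int firstNumber, int secondNumber);
}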
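To make the difference between the two versions concrete, the following sketch builds the kind of string that results when the Granite role tags are placed around one system message and one user message, mirroring the manual example above. It is an approximation under that assumption: GranitePromptSketch is a hypothetical name, and the extension's actual formatter may differ in whitespace and in how it handles multi-turn conversations.

```java
// Illustrative sketch; GranitePromptSketch is hypothetical, not the extension's formatter.
final class GranitePromptSketch {

    private GranitePromptSketch() {}

    // Wraps one system and one user message in the ibm/granite-13b-chat-v2 role tags
    // and ends with <|assistant|> so the model continues as the assistant.
    static String format(String systemMessage, String userMessage) {
        return "<|system|>\n" + systemMessage.strip() + "\n"
                + "<|user|>\n" + userMessage.strip() + "\n"
                + "<|assistant|>\n";
    }
}
```

The trailing <|assistant|> tag matches how the manual version ends its @UserMessage: the model's reply is expected to follow that tag.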
The prompt-formatter supports the following models:
- mistralai/mistral-large
- mistralai/mixtral-8x7b-instruct-v01
- sdaia/allam-1-13b-instruct
- meta-llama/llama-3-405b-instruct
- meta-llama/llama-3-1-70b-instruct
- meta-llama/llama-3-1-8b-instruct
- meta-llama/llama-3-70b-instruct
- meta-llama/llama-3-8b-instruct
- ibm/granite-13b-chat-v2
- ibm/granite-13b-instruct-v2
- ibm/granite-7b-lab
- ibm/granite-20b-code-instruct
- ibm/granite-34b-code-instruct
- ibm/granite-3b-code-instruct
- ibm/granite-8b-code-instruct
Tool Execution with Prompt Formatter
In addition to simplifying prompt creation, the prompt-formatter property also enables tool execution for specific models. Tools let the model perform specific actions or fetch data as part of its response.
When the prompt-formatter is enabled and a supported model is selected, the prompt is automatically formatted to use the tools. For more information about tools, see the Agent and Tools page.
Currently, the following models support tool execution:
- mistralai/mistral-large
- meta-llama/llama-3-405b-instruct
- meta-llama/llama-3-1-70b-instruct
By default, the @SystemMessage and @UserMessage annotations are joined with a new line. To change this behavior, set the quarkus.langchain4j.watsonx.chat-model.prompt-joiner=<value> property, which defines the delimiter used to join the messages so that the prompt has the structure your model expects. This option is available only when the prompt-formatter property is set to false. When the prompt-formatter is enabled (set to true), tag insertion and message joining are handled automatically, and the prompt-joiner property is ignored.
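With the prompt-formatter disabled, the joining step amounts to concatenating the message texts in order with the configured delimiter. The following is a minimal sketch under that assumption; PromptJoinerSketch is a hypothetical helper, not the extension's code, and it treats each message as plain text.

```java
import java.util.List;

// Hypothetical illustration of prompt-joiner when prompt-formatter=false.
final class PromptJoinerSketch {

    // The documented default delimiter is a new line.
    static final String DEFAULT_JOINER = "\n";

    private PromptJoinerSketch() {}

    // Concatenates the message texts in order, separated by the configured joiner.
    static String join(List<String> messageTexts, String joiner) {
        return String.join(joiner, messageTexts);
    }
}
```

Changing the delimiter only affects the separator placed between messages; any tags written inside each message are left untouched.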
In some cases, you can use the quarkus.langchain4j.watsonx.chat-model.stop-sequences property to keep the LLM from generating more output than you need.
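The following hypothetical simulation illustrates the stop-sequence behavior described in the configuration table below: generation stops once the output ends with a stop sequence, but stop sequences are ignored until a minimum number of tokens has been generated. In watsonx.ai the check runs on the service side; here tokens are plain strings and the check happens only at token boundaries, which is a simplification. StopSequenceSketch is an illustrative name.

```java
import java.util.List;

// Hypothetical client-side simulation; in watsonx.ai the check runs on the service.
final class StopSequenceSketch {

    private StopSequenceSketch() {}

    // Appends tokens one by one and returns as soon as the output ends with a stop
    // sequence, ignoring stop sequences until minNewTokens tokens have been produced.
    // Simplification: the check runs only at token boundaries, and the matched stop
    // sequence is kept in the returned text.
    static String generate(List<String> tokens, List<String> stopSequences, int minNewTokens) {
        StringBuilder output = new StringBuilder();
        int produced = 0;
        for (String token : tokens) {
            output.append(token);
            produced++;
            if (produced < minNewTokens) {
                continue; // too early: stop sequences are still ignored
            }
            String text = output.toString();
            for (String stop : stopSequences) {
                if (text.endsWith(stop)) {
                    return text;
                }
            }
        }
        return output.toString(); // no stop sequence matched: every token is returned
    }
}
```

For a JSON answer such as the calculator's Result, a stop sequence of } ends generation right after the closing brace instead of letting the model continue with extra text.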
All configuration properties
Some configuration properties are fixed at build time; all other configuration properties can be overridden at runtime.
| Description | Type | Default |
|---|---|---|
| Whether the model should be enabled. | boolean | |
| Whether the embedding model should be enabled. | boolean | |
| Model id to use. To view the complete model list, click here. | string | |
| Configuration property that enables or disables the functionality of the prompt formatter. | boolean | |
| Base URL of the watsonx.ai API. | string | |
| IBM Cloud API key. To create a new API key, follow this link. | string | |
| Timeout for watsonx.ai calls. | Duration | |
| The version date for the API of the form YYYY-MM-DD. | string | |
| The project that contains the watsonx.ai resource. To look up your project id, click here. | string | |
| Whether the watsonx.ai client should log requests. | boolean | |
| Whether the watsonx.ai client should log responses. | boolean | |
| Whether to enable the integration. Defaults to … | boolean | |
| Base URL of the IAM Authentication API. | | |
| Timeout for IAM authentication calls. | Duration | |
| Grant type for the IAM Authentication API. | string | |
| Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to … | string | |
| Represents the factor of exponential decay. Larger values correspond to more aggressive decay. | double | |
| A number of generated tokens after which this should take effect. | int | |
| The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a "token" is defined depends on the tokenizer and vocabulary size, which in turn depend on the model. Often the tokens are a mix of full words and sub-words. Depending on the user's plan and on the model being used, there may be an enforced maximum number of new tokens. | int | |
| If stop sequences are given, they are ignored until the minimum number of tokens has been generated. | int | |
| Random number generator seed to use in sampling mode for experimental repeatability. | int | |
| Stop sequences are one or more strings that cause text generation to stop if/when they are produced as part of the output. Stop sequences encountered before the minimum number of tokens has been generated are ignored. | list of string | |
| A value used to modify the next-token probabilities in sampling mode. | double | |
| The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for sampling mode. | int | |
| Similar to … | double | |
| Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value … | double | |
| Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, the start of the input (on the left) is removed, so the end of the input remains the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model), the call fails if the total number of tokens exceeds the maximum sequence length. Zero means don't truncate. | int | |
| Pass … | boolean | |
| Whether chat model requests should be logged. | boolean | |
| Whether chat model responses should be logged. | boolean | |
| Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way. | string | `\n` |
| Model id to use. To view the complete model list, click here. | string | |
| Whether embedding model requests should be logged. | boolean | |
| Whether embedding model responses should be logged. | boolean | |
| Description | Type | Default |
|---|---|---|
| Model id to use. To view the complete model list, click here. | string | |
| Configuration property that enables or disables the functionality of the prompt formatter. | boolean | |
| Base URL of the watsonx.ai API. | string | |
| IBM Cloud API key. To create a new API key, follow this link. | string | |
| Timeout for watsonx.ai calls. | Duration | |
| The version date for the API of the form YYYY-MM-DD. | string | |
| The project that contains the watsonx.ai resource. To look up your project id, click here. | string | |
| Whether the watsonx.ai client should log requests. | boolean | |
| Whether the watsonx.ai client should log responses. | boolean | |
| Whether to enable the integration. Defaults to … | boolean | |
| Base URL of the IAM Authentication API. | | |
| Timeout for IAM authentication calls. | Duration | |
| Grant type for the IAM Authentication API. | string | |
| Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to … | string | |
| Represents the factor of exponential decay. Larger values correspond to more aggressive decay. | double | |
| A number of generated tokens after which this should take effect. | int | |
| The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a "token" is defined depends on the tokenizer and vocabulary size, which in turn depend on the model. Often the tokens are a mix of full words and sub-words. Depending on the user's plan and on the model being used, there may be an enforced maximum number of new tokens. | int | |
| If stop sequences are given, they are ignored until the minimum number of tokens has been generated. | int | |
| Random number generator seed to use in sampling mode for experimental repeatability. | int | |
| Stop sequences are one or more strings that cause text generation to stop if/when they are produced as part of the output. Stop sequences encountered before the minimum number of tokens has been generated are ignored. | list of string | |
| A value used to modify the next-token probabilities in sampling mode. | double | |
| The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for sampling mode. | int | |
| Similar to … | double | |
| Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value … | double | |
| Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, the start of the input (on the left) is removed, so the end of the input remains the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model), the call fails if the total number of tokens exceeds the maximum sequence length. Zero means don't truncate. | int | |
| Pass … | boolean | |
| Whether chat model requests should be logged. | boolean | |
| Whether chat model responses should be logged. | boolean | |
| Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way. | string | `\n` |
| Model id to use. To view the complete model list, click here. | string | |
| Whether embedding model requests should be logged. | boolean | |
| Whether embedding model responses should be logged. | boolean | |
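The input truncation behavior described in the table (a maximum number of input tokens, where truncation removes tokens from the start of the input and zero disables truncation) can be sketched as follows. Tokens are modeled as plain strings for illustration only; real tokenization depends on the model, and InputTruncationSketch is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch: tokens are plain strings; real tokenization depends on the model.
final class InputTruncationSketch {

    private InputTruncationSketch() {}

    // Keeps at most maxInputTokens tokens by dropping tokens from the start (left),
    // so the end of the input is preserved. A value of 0 disables truncation.
    static List<String> truncateLeft(List<String> inputTokens, int maxInputTokens) {
        if (maxInputTokens < 0) {
            throw new IllegalArgumentException("maxInputTokens must be >= 0");
        }
        if (maxInputTokens == 0 || inputTokens.size() <= maxInputTokens) {
            return inputTokens;
        }
        return inputTokens.subList(inputTokens.size() - maxInputTokens, inputTokens.size());
    }
}
```

Keeping the end of the input is what you usually want for chat prompts, because the most recent user message sits at the end.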
About the Duration format
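To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.
You can also use a simplified format, starting with a number:
- If the value is only a number, it represents time in seconds.
- If the value is a number followed by ms, it represents time in milliseconds.
In other cases, the simplified format is translated to the java.time.Duration format for parsing:
- If the value is a number followed by h, m, or s, it is prefixed with PT.
- If the value is a number followed by d, it is prefixed with P.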
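As a rough illustration, the sketch below approximates the simplified duration format on top of java.time.Duration.parse, with the conversion rule for each form noted in a comment. It is not Quarkus's own converter, and SimplifiedDurationSketch is a hypothetical name.

```java
import java.time.Duration;
import java.util.regex.Pattern;

// Illustrative sketch of the simplified duration format; not Quarkus's own converter.
final class SimplifiedDurationSketch {

    private static final Pattern NUMBER = Pattern.compile("\\d+");
    private static final Pattern NUMBER_MS = Pattern.compile("\\d+ms");
    private static final Pattern NUMBER_HMS = Pattern.compile("\\d+[hms]");
    private static final Pattern NUMBER_D = Pattern.compile("\\d+d");

    private SimplifiedDurationSketch() {}

    static Duration parse(String value) {
        String v = value.trim();
        if (NUMBER.matcher(v).matches()) {
            return Duration.ofSeconds(Long.parseLong(v)); // only a number: seconds
        }
        if (NUMBER_MS.matcher(v).matches()) {
            // number followed by ms: milliseconds
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (NUMBER_HMS.matcher(v).matches()) {
            return Duration.parse("PT" + v); // h, m, or s: prefixed with PT, e.g. 10s -> PT10s
        }
        if (NUMBER_D.matcher(v).matches()) {
            return Duration.parse("P" + v); // d: prefixed with P, e.g. 2d -> P2d
        }
        return Duration.parse(v); // standard java.time.Duration form, e.g. PT1M30S
    }
}
```

Duration.parse accepts the unit letters in upper or lower case, which is why prefixing values such as 10s or 2d is enough.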