IBM watsonx.ai Chat and Generation Models
IBM watsonx.ai enables the development of generative AI applications using foundation models from IBM and Hugging Face.
| This extension supports IBM watsonx as a service on IBM Cloud only. |
Prerequisites
To use watsonx.ai models, configure the following required values in your application.properties file:
Base URL
The base-url depends on the region of your service instance:
-
Frankfurt: https://eu-de.ml.cloud.ibm.com
-
London: https://eu-gb.ml.cloud.ibm.com
-
Sydney: https://au-syd.ml.cloud.ibm.com
-
Toronto: https://ca-tor.ml.cloud.ibm.com
-
Mumbai: https://ap-south-1.aws.wxai.ibm.com
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
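As a minimal sketch, the region list above can be kept in a lookup table, for instance to generate the base-url property for a given deployment. The class and method names are hypothetical and not part of the extension; only the regions listed above are included.

```java
import java.util.Map;

// Hypothetical helper (not part of the extension): maps the regions
// listed above to their watsonx.ai base URLs.
final class WatsonxRegions {

    static final Map<String, String> BASE_URLS = Map.of(
            "Frankfurt", "https://eu-de.ml.cloud.ibm.com",
            "London", "https://eu-gb.ml.cloud.ibm.com",
            "Sydney", "https://au-syd.ml.cloud.ibm.com",
            "Toronto", "https://ca-tor.ml.cloud.ibm.com",
            "Mumbai", "https://ap-south-1.aws.wxai.ibm.com");

    // Builds the application.properties line for the given region name.
    static String baseUrlProperty(String region) {
        String url = BASE_URLS.get(region);
        if (url == null) {
            throw new IllegalArgumentException("Unknown region: " + region);
        }
        return "quarkus.langchain4j.watsonx.base-url=" + url;
    }

    public static void main(String[] args) {
        System.out.println(baseUrlProperty("Frankfurt"));
    }
}
```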
Project ID
To obtain the Project ID:
-
Visit https://dataplatform.cloud.ibm.com/projects/?context=wx
-
Open your project and click the Manage tab.
-
Copy the Project ID from the Details section.
quarkus.langchain4j.watsonx.project-id=23d...
| You may use the optional space-id as an alternative. |
API Key
Create an API key by visiting https://cloud.ibm.com/iam/apikeys and clicking Create +.
quarkus.langchain4j.watsonx.api-key=your-api-key
| You can also use the QUARKUS_LANGCHAIN4J_WATSONX_API_KEY environment variable. |
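The environment variable name follows the usual MicroProfile Config convention for mapping property names: every character that is not a letter, digit, or underscore is replaced with an underscore, and the result is upper-cased. The helper below is a hypothetical illustration of that rule, not code from the extension.

```java
import java.util.Locale;

// Hypothetical helper illustrating the property-to-environment-variable rule.
final class EnvVarNames {

    // Replace every non-alphanumeric, non-underscore character with '_',
    // then convert the whole name to upper case.
    static String toEnvVar(String property) {
        return property.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("quarkus.langchain4j.watsonx.api-key"));
    }
}
```

The same rule applies to the other properties on this page, for example quarkus.langchain4j.watsonx.project-id.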
Dependency
Add the following dependency to your project:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-watsonx</artifactId>
<version>1.7.2</version>
</dependency>
Even better, if you use the Quarkus platform BOM (the default for generated projects), add the Quarkus LangChain4j BOM so that all dependency versions align:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>${quarkus.platform.group-id}</groupId>
<artifactId>${quarkus.platform.artifact-id}</artifactId>
<version>${quarkus.platform.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>${quarkus.platform.group-id}</groupId>
<artifactId>quarkus-langchain4j-bom</artifactId> (1)
<version>${quarkus.platform.version}</version> (2)
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-watsonx</artifactId>
(3)
</dependency>
</dependencies>
| 1 | In your dependencyManagement section, add the quarkus-langchain4j-bom |
| 2 | Inherit the version from your platform version |
| 3 | Voilà, no need for version alignment anymore |
If no other extension is installed, AI Services will automatically use this provider.
Chat Model
IBM watsonx.ai provides a variety of foundation models for text generation, chat-based interactions, and instruction-following tasks.
These include both IBM-built models and third-party / community models.
Quarkus integrates the LangChain4j WatsonxChatModel, exposing it as a ChatModel / StreamingChatModel bean.
See the full model catalog:
-
IBM foundation models: Model Catalog – IBM
-
Third-party / community foundation models: Model Catalog – Third-Party
Configuration
Configure the chat model in your application.properties:
quarkus.langchain4j.watsonx.mode=chat
Each mode has its own configuration namespace:
-
chat: quarkus.langchain4j.watsonx.chat-model
-
generation: quarkus.langchain4j.watsonx.generation-model
Chat Mode Example
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
# Chat model
quarkus.langchain4j.watsonx.chat-model.model-name=ibm/granite-4-h-small
# Optional generation parameters
quarkus.langchain4j.watsonx.chat-model.max-output-tokens=0
quarkus.langchain4j.watsonx.chat-model.temperature=0.2
If a chat model is configured, Quarkus automatically registers a ChatModel / StreamingChatModel bean.
Enabling Thinking / Reasoning Output
Some foundation models can include internal reasoning (also referred to as thinking) steps as part of their responses.
Depending on the model, this reasoning may be embedded in the same text as the final response, or returned separately in a dedicated field from watsonx.ai.
To correctly enable and capture this behavior in Quarkus, you must configure the chat model with either thinking.tags (for ExtractionTags) or thinking.effort / thinking (for ThinkingEffort or boolean flag) in your application.properties.
This ensures that LangChain4j can automatically extract the reasoning and response content from the model output.
Models that return reasoning and response together
Use ExtractionTags when the model outputs reasoning and response in the same text string.
The tags define XML-like markers used to separate the reasoning from the final response.
# Example configuration for ibm/granite-3-3-8b-instruct
quarkus.langchain4j.watsonx.chat-model.model-name=ibm/granite-3-3-8b-instruct
quarkus.langchain4j.watsonx.chat-model.thinking.tags.think=think
quarkus.langchain4j.watsonx.chat-model.thinking.tags.response=response
Behavior
-
If both tags are specified, they are used directly to extract reasoning and response segments.
-
If only the reasoning tag is specified, everything outside that tag is considered the response.
@Inject
ChatModel thinkingChatModel;
var chatResponse = thinkingChatModel.chat(UserMessage.from("Why is the sky blue?"));
System.out.println(chatResponse.aiMessage().thinking());
System.out.println(chatResponse.aiMessage().text());
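The extraction behavior described above can be sketched in plain Java. This is only an illustration of the two rules (both tags given, or only the reasoning tag given), not the implementation used by LangChain4j; the class, record, and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tag-based extraction; not the library's code.
final class TagExtraction {

    record Parts(String thinking, String response) {}

    // responseTag may be null: in that case everything outside the
    // thinking tag is treated as the response.
    static Parts extract(String output, String thinkTag, String responseTag) {
        String thinking = between(output, thinkTag);
        String response = (responseTag != null)
                ? between(output, responseTag)
                : blockPattern(thinkTag).matcher(output).replaceFirst("").trim();
        return new Parts(thinking, response);
    }

    // Matches <tag>...</tag>, including content that spans several lines.
    private static Pattern blockPattern(String tag) {
        String quoted = Pattern.quote(tag);
        return Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
    }

    // Returns the trimmed content of the first <tag> block, or null if absent.
    private static String between(String output, String tag) {
        Matcher matcher = blockPattern(tag).matcher(output);
        return matcher.find() ? matcher.group(1).trim() : null;
    }
}
```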
Models that return reasoning and response separately
For models that already return reasoning and response as separate fields, use the thinking.effort property to control how much reasoning the model applies during generation, or enable it using the boolean flag.
# Example configuration for openai/gpt-oss-120b
quarkus.langchain4j.watsonx.chat-model.model-name=openai/gpt-oss-120b
quarkus.langchain4j.watsonx.chat-model.thinking.effort=HIGH
Streaming Example
@Inject
StreamingChatModel streamingChatModel;
List<ChatMessage> messages = List.of(UserMessage.from("Why is the sky blue?"));
ChatRequest chatRequest = ChatRequest.builder()
.messages(messages)
.build();
streamingChatModel.chat(chatRequest, new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String partialResponse) {
System.out.println("Partial: " + partialResponse);
}
@Override
public void onPartialThinking(PartialThinking partialThinking) {
System.out.println("Thinking: " + partialThinking.content());
}
@Override
public void onCompleteResponse(ChatResponse completeResponse) {
System.out.println("Complete: " + completeResponse);
}
@Override
public void onError(Throwable error) {
error.printStackTrace();
}
});
Embedding Model
IBM watsonx.ai provides multiple embedding models for converting text into vector representations suitable for semantic search, RAG pipelines, similarity comparison, and vector database integrations.
Quarkus integrates the LangChain4j WatsonxEmbeddingModel, exposing it as an EmbeddingModel bean.
See the IBM watsonx.ai documentation for the list of supported embedding models.
Configuration
Configure the embedding model by specifying its model name in application.properties:
# Base Watsonx configuration
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
# Embedding model configuration
quarkus.langchain4j.watsonx.embedding-model.model-name=ibm/slate-125m-english-rtrvr
If an embedding model is configured, Quarkus automatically creates and registers an EmbeddingModel bean.
Usage
Generating an embedding for a single text:
var response = embeddingModel.embed("Hello Watsonx!");
assertNotNull(response);
var embedding = response.content();
System.out.println("Embedding size: " + embedding.vector().length);
Generating embeddings for multiple text segments:
var embeddings = embeddingModel.embedAll(
List.of(
TextSegment.from("First document"),
TextSegment.from("Second document")
)
);
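Once embeddings are available, similarity comparison usually comes down to cosine similarity between the raw vectors. The following is a minimal, dependency-free sketch operating on float arrays such as those returned by vector(); the class name is hypothetical, and returning 0 for a zero-length vector is a choice made here for illustration.

```java
// Hypothetical helper: cosine similarity between two embedding vectors.
final class VectorMath {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // similarity is undefined for a zero vector; 0 is used here
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Values close to 1 indicate vectors pointing in nearly the same direction, while values near 0 indicate unrelated content.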
Scoring Model
IBM watsonx.ai provides scoring (reranking) models that evaluate the relevance between a query and a piece of text.
Quarkus integrates the LangChain4j WatsonxScoringModel, exposing it as a ScoringModel implementation.
Scoring models are especially useful for RAG pipelines, document ranking, and semantic relevance evaluation.
See the IBM watsonx.ai documentation for the list of supported scoring/reranker models.
Configuration
Configure the model by specifying its name in application.properties:
# Base Watsonx configuration
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
# Scoring model configuration
quarkus.langchain4j.watsonx.scoring-model.model-name=cross-encoder/ms-marco-minilm-l-12-v2
If a scoring model is configured, Quarkus automatically creates and registers a ScoringModel bean.
Usage
You can score a single text against a query:
var response = scoringModel.score("Rerank this!", "Test to rerank 1");
assertNotNull(response);
assertNotNull(response.content());
double score = response.content();
System.out.println("Score: " + score);
Or score multiple documents at once:
var scores = scoringModel.scoreAll(
List.of(
TextSegment.from("Document A"),
TextSegment.from("Document B")
),
"User query"
);
System.out.println(scores); // list of relevance scores
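In a reranking step, the scores are typically paired back with their documents and sorted from most to least relevant. The sketch below shows that step in plain Java, assuming the scores come back in the same order as the input segments; the class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: pairs texts with their scores and sorts by relevance.
final class Reranker {

    record Ranked(String text, double score) {}

    static List<Ranked> rank(List<String> texts, List<Double> scores) {
        if (texts.size() != scores.size()) {
            throw new IllegalArgumentException("Each text needs exactly one score");
        }
        List<Ranked> ranked = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            ranked.add(new Ranked(texts.get(i), scores.get(i)));
        }
        // Highest score first.
        ranked.sort(Comparator.comparingDouble(Ranked::score).reversed());
        return ranked;
    }
}
```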
Moderation Model
IBM watsonx.ai provides moderation capabilities through multiple detectors that can identify unsafe, sensitive, or policy-violating content.
Quarkus integrates the LangChain4j WatsonxModerationModel, exposing each detector type as a dedicated configuration group.
Supported detector types include:
-
PII – Detects Personally Identifiable Information
-
HAP – Detects hate, abuse, or profanity
-
Granite Guardian – Detects harmful or risky content
Each detector can be enabled individually.
Configuration
Enable detectors in application.properties using their dedicated flags:
# Base Watsonx configuration
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
# Enable specific moderation detectors
quarkus.langchain4j.watsonx.moderation-model.hap.enabled=true
quarkus.langchain4j.watsonx.moderation-model.pii.enabled=true
quarkus.langchain4j.watsonx.moderation-model.granite-guardian.enabled=true
Each detector configuration group may also expose additional settings depending on its capabilities.
If a moderation model is configured, Quarkus automatically creates and registers a ModerationModel bean.
Usage
var response = moderationModel.moderate("Some text to analyze");
boolean flagged = response.content().flagged();
Map<String, Object> metadata = response.metadata();
System.out.println("Flagged? " + flagged);
System.out.println("Metadata: " + metadata);
Metadata
A moderation response includes metadata describing the detection:
| Key | Description |
|---|---|
| detection | The assigned label for the detected content |
| detection_type | Detector that triggered the flag |
| start | Start index of the detected segment |
| end | End index of the detected segment |
| score | Confidence score |
Example:
System.out.println(metadata.get("detection_type"));
System.out.println(metadata.get("score"));
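The start and end entries can be used to recover the flagged segment from the original text. The sketch below assumes that both values are numeric, that start is inclusive, and that end is exclusive; these are assumptions to verify against the actual detector output. The class name is hypothetical.

```java
import java.util.Map;

// Hypothetical helper: extracts the flagged segment using the metadata offsets.
final class DetectionSpan {

    // Assumes "start" (inclusive) and "end" (exclusive) are stored as numbers.
    static String flaggedText(String input, Map<String, Object> metadata) {
        int start = ((Number) metadata.get("start")).intValue();
        int end = ((Number) metadata.get("end")).intValue();
        return input.substring(start, end);
    }
}
```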
Text Extraction
The TextExtraction feature enables developers to extract text from high-value business documents stored in IBM Cloud Object Storage. Extracted text can be used for AI processing, key information identification, or further document analysis.
The API supports text extraction from the following file types:
-
PDF
-
GIF
-
JPG
-
PNG
-
TIFF
-
BMP
-
DOC
-
DOCX
-
HTML
-
JFIF
-
PPT
-
PPTX
The extracted text can be output in the following formats:
-
JSON
-
MARKDOWN
-
HTML
-
PLAIN_TEXT
-
PAGE_IMAGES
Configuration
To enable TextExtraction in your application, configure the following properties:
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.text-extraction.cos-url=<base-url>
quarkus.langchain4j.watsonx.text-extraction.document-reference.connection=<connection-id>
quarkus.langchain4j.watsonx.text-extraction.document-reference.bucket-name=<bucket-name>
quarkus.langchain4j.watsonx.text-extraction.results-reference.connection=<connection-id>
quarkus.langchain4j.watsonx.text-extraction.results-reference.bucket-name=<bucket-name>
-
cos-url: The endpoint where the IBM Cloud Object Storage instance is deployed. To find the appropriate value, refer to the IBM Cloud Object Storage endpoint table.
-
document-reference.connection: The connection asset ID containing credentials to access the source storage.
-
document-reference.bucket-name: The bucket where documents to be processed will be uploaded.
-
results-reference.connection: The connection asset ID containing credentials to access the output storage.
-
results-reference.bucket-name: The bucket where extracted text documents will be saved as new files.
The document reference properties define the source storage for input and uploaded files, while the results reference properties specify where the extracted content is stored. Both can refer to the same bucket or to different ones.
For more information on how to get the connection parameter for the document-reference and results-reference you can refer to the documentation at this link.
Using Text Extraction
The TextExtractionService class provides multiple methods for extracting text from documents. You can either extract text from an existing file in IBM Cloud Object Storage or upload a file and extract its content. To use TextExtractionService, inject an instance into your application. If multiple configurations are defined, you can select the appropriate one using the @ModelName qualifier.
@Inject
TextExtractionService textExtraction;
@Inject
@ModelName("custom")
TextExtractionService customTextExtraction;
You can start the extraction process in two ways.
First, if the document is already stored in IBM Cloud Object Storage, you can initiate the extraction by using the following method:
TextExtractionResponse response = textExtraction.startExtraction("path/to/document");
String id = response.metadata().id();
Alternatively, if you’re working with a local file, you can upload it and start the extraction process with:
File file = new File("path/to/document");
TextExtractionResponse response = textExtraction.uploadAndStartExtraction(file);
String id = response.metadata().id();
After starting the extraction, you can check its status by calling:
TextExtractionResponse response = textExtraction.fetchExtractionRequest(extractionId);
String result = response.entity().results().status();
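Because the extraction runs asynchronously, the status call is usually wrapped in a polling loop. The following is a generic sketch that takes the status lookup as a Supplier, so it does not depend on any extension type. The terminal status values "completed" and "failed" are assumptions to check against the watsonx.ai API reference, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper: polls a status supplier until a terminal state is reached.
final class StatusPoller {

    // Assumed terminal states; verify against the watsonx.ai API reference.
    static final Set<String> TERMINAL = Set.of("completed", "failed");

    static String pollUntilDone(Supplier<String> status, Duration interval, int maxAttempts) {
        String current = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            current = status.get();
            if (TERMINAL.contains(current)) {
                return current;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException(
                "Status still '" + current + "' after " + maxAttempts + " attempts");
    }
}
```

With the service shown above, the supplier could be a lambda that calls fetchExtractionRequest with the request ID and returns response.entity().results().status().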
If you need to extract and retrieve the text immediately, you have two options.
You can either extract text from an existing file directly:
String extractedText = textExtraction.extractAndFetch("path/to/document");
Or upload the file and retrieve the extracted text immediately:
File file = new File("path/to/document");
String extractedText = textExtraction.uploadExtractAndFetch(file);
All extraction methods can accept a TextExtractionParameters object to customize the behavior of the text extraction request.
The TextExtractionParameters object allows fine-grained control over the extraction process.
var parameters = TextExtractionParameters.builder()
.removeOutputFile(true)
.removeUploadedFile(true)
.requestedOutputs(MD)
.mode(Mode.HIGH_QUALITY)
.autoRotationCorrection(false)
.outputDpi(16)
.build();
File file = new File("path/to/document.pdf");
String extractedText = textExtraction.uploadExtractAndFetch(file, parameters);
Text Classification
The TextClassification feature enables you to classify text in your documents to identify whether the data in your file matches the key-value pair format in schema definitions for various document types.
By pre-processing the document, you can quickly verify whether a document is classified into one of the pre-defined schemas or a custom schema without performing key-value pair extraction, which can be a longer, resource-intensive process. You can then decide which schema to use to correctly extract text into fields in a key-value pair format.
The API supports text classification from the following file types:
-
BMP
-
DOC
-
DOCX
-
GIF
-
HTML
-
JFIF
-
JPG
-
MARKDOWN
-
PDF
-
PNG
-
PPT
-
PPTX
-
TIFF
-
XLSX
Configuration
To enable TextClassification in your application, configure the following properties:
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.text-classification.cos-url=<base-url>
quarkus.langchain4j.watsonx.text-classification.document-reference.connection=<connection-id>
quarkus.langchain4j.watsonx.text-classification.document-reference.bucket-name=<bucket-name>
-
cos-url: The endpoint where the IBM Cloud Object Storage instance is deployed. To find the appropriate value, refer to the IBM Cloud Object Storage endpoint table.
-
document-reference.connection: The connection asset ID containing credentials to access the source storage.
-
document-reference.bucket-name: The bucket where documents to be processed will be uploaded (or are already stored).
For more information on how to get the connection parameter for the document-reference you can refer to the documentation at this link.
Using Text Classification
The TextClassificationService class provides multiple methods for classifying documents. You can either classify text from an existing file in IBM Cloud Object Storage or upload a file and classify its content. To use TextClassificationService, you need to inject an instance into your application. If multiple configurations are defined, you can specify the appropriate one using the @ModelName qualifier.
@Inject
TextClassificationService classificationService;
@Inject
@ModelName("custom")
TextClassificationService customClassificationService;
You can start the classification process in two ways.
First, if the document is already stored in IBM Cloud Object Storage, you can initiate the classification by using the following method:
TextClassificationResponse response = classificationService.startClassification("path/to/document");
String id = response.metadata().id();
Alternatively, if you’re working with a local file, you can upload it and start the classification process with:
File file = new File("path/to/document");
TextClassificationResponse response = classificationService.uploadAndStartClassification(file);
String id = response.metadata().id();
After starting the classification, you can check its status by calling:
TextClassificationResponse response = classificationService.fetchClassificationRequest(classificationId);
String result = response.entity().results().status();
If you need to classify and retrieve the results immediately, you have two options.
You can either classify an existing file directly:
ClassificationResult result = classificationService.classifyAndFetch("path/to/document");
Or upload the file and retrieve the classification result immediately:
File file = new File("path/to/document");
ClassificationResult result = classificationService.uploadClassifyAndFetch(file);
All classification methods can accept a TextClassificationParameters object to customize the behavior of the request.
The TextClassificationParameters object allows fine-grained control over the classification process, including Classification Modes, OCR settings, and Semantic Configuration.
var parameters = TextClassificationParameters.builder()
.classificationMode(ClassificationMode.EXACT)
.languages(Language.ENGLISH, Language.FRENCH)
.ocrMode(OcrMode.AUTO)
.autoRotationCorrection(true)
.removeUploadedFile(true)
.build();
File file = new File("path/to/document.pdf");
ClassificationResult result = classificationService.uploadClassifyAndFetch(file, parameters);
Semantic Configuration
You can provide a TextClassificationSemanticConfig to the parameters. This allows you to define custom schemas, enabling the service to identify specific document types based on the presence of key-value pair fields you define.
The following example shows how to configure the service to classify a document as a specific "Invoice" type:
// 1. Define the fields expected in the document
var fields = KvpFields.builder()
.add("invoice_date", KvpField.of("The date when the invoice was issued.", "2024-07-10"))
.add("invoice_number", KvpField.of("The unique number identifying the invoice.", "INV-2024-001"))
.add("total_amount", KvpField.of("The total amount to be paid.", "1250.50"))
.build();
// 2. Define the Schema using the fields
var mySchema = Schema.builder()
.documentDescription("A vendor-issued invoice listing purchased items, prices, and payment information.")
.documentType("Invoice")
.fields(fields)
.build();
// 3. Create the Semantic Configuration
var semanticConfig = TextClassificationSemanticConfig.builder()
.schemasMergeStrategy(SchemaMergeStrategy.REPLACE)
.schemas(mySchema)
.build();
// 4. Pass the configuration to the parameters
var parameters = TextClassificationParameters.builder()
.languages(Language.ENGLISH)
.semanticConfig(semanticConfig)
.build();
ClassificationResult result = classificationService.uploadClassifyAndFetch(file, parameters);
Managing Requests and Files
The service also provides utility methods to manage the lifecycle of your requests and files:
// Delete a classification request history
classificationService.deleteRequest(requestId,
TextClassificationDeleteParameters.builder().hardDelete(true).build());
// Delete a file from the bucket
classificationService.deleteFile("bucket-name", "filename.pdf");
Configuration
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Configuration property |
Type |
Default |
|---|---|---|
Whether the model should be enabled. Environment variable: |
boolean |
|
Whether the embedding model should be enabled. Environment variable: |
boolean |
|
Whether the scoring model should be enabled. Environment variable: |
boolean |
|
Specifies the mode of interaction with the LLM model. This property allows you to choose between two modes of operation:
Allowable values: Environment variable: |
string |
|
Specifies the base URL of the watsonx.ai API. A list of all available URLs is provided in the IBM Watsonx.ai documentation at the this link. Environment variable: |
string |
|
IBM Cloud API key. Environment variable: |
string |
|
Timeout for watsonx.ai calls. Environment variable: |
|
|
The version date for the API of the form YYYY-MM-DD. Environment variable: |
string |
|
The space that contains the resource. Either Environment variable: |
string |
|
The project that contains the resource. Either Environment variable: |
string |
|
Whether the watsonx.ai client should log requests. Environment variable: |
boolean |
|
Whether the watsonx.ai client should log responses. Environment variable: |
boolean |
|
Whether the watsonx.ai client should log requests as cURL commands. Environment variable: |
boolean |
|
Whether to enable the integration. Defaults to Environment variable: |
boolean |
|
Base URL of the IAM Authentication API. Environment variable: |
|
|
Timeout for IAM authentication calls. Environment variable: |
|
|
Grant type for the IAM Authentication API. Environment variable: |
string |
|
Base URL of the Cloud Object Storage API. Environment variable: |
string |
required |
The ID of the connection asset that contains the credentials required to access the data. Environment variable: |
string |
required |
The name of the bucket containing the input document. Environment variable: |
string |
required |
The ID of the connection asset used to store the extracted results. Environment variable: |
string |
required |
The name of the bucket where the output files will be written. Environment variable: |
string |
required |
Whether the Cloud Object Storage client should log requests. Environment variable: |
boolean |
|
Whether the Cloud Object Storage client should log responses. Environment variable: |
boolean |
|
Specifies the model to use for the chat completion. A list of all available models is provided in the IBM watsonx.ai documentation at the this link. To use a model, locate the Environment variable: |
string |
|
Specifies how the model should choose which tool to call during a request. This value can be:
If Setting this value influences the tool-calling behavior of the model when no specific tool is required. Environment variable: |
|
|
Specifies the name of a specific tool that the model must call. When set, the model will be forced to call the specified tool. The name must exactly match one of the available tools defined for the service. Environment variable: |
string |
|
Positive values penalize new tokens based on their existing frequency in the generated text, reducing the likelihood of the model repeating the same lines verbatim. Possible values: Environment variable: |
double |
|
Specifies whether to return the log probabilities of the output tokens. If set to Environment variable: |
boolean |
|
An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option Possible values: Environment variable: |
int |
|
The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length. Set to 0 for the model’s configured max generated tokens. Environment variable: |
int |
|
Specifies how many chat completion choices to generate for each input message. Environment variable: |
int |
|
Applies a penalty to new tokens based on whether they already appear in the generated text so far, encouraging the model to introduce new topics rather than repeat itself. Possible values: Environment variable: |
double |
|
Random number generator seed to use in sampling mode for experimental repeatability. Environment variable: |
int |
|
Defines one or more stop sequences that will cause the model to stop generating further tokens if any of them are encountered in the output. This allows control over where the model should end its response. If a stop sequence is encountered before the minimum number of tokens has been generated, it will be ignored. Possible values: Environment variable: |
list of string |
|
Specifies the sampling temperature to use in the generation process. Higher values (e.g. Possible values: Environment variable: |
double |
|
An alternative to sampling with Possible values: Environment variable: |
double |
|
Specifies the desired format for the model’s output. Allowable values: Environment variable: |
string |
|
Whether chat model requests should be logged. Environment variable: |
boolean |
|
Whether chat model responses should be logged. Environment variable: |
boolean |
|
The id of the model to be used. All available models are listed in the IBM Watsonx.ai documentation at the link: following link. To use a model, locate the Environment variable: |
string |
|
Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to Allowable values: Environment variable: |
string |
|
Represents the factor of exponential decay. Larger values correspond to more aggressive decay. Possible values: Environment variable: |
double |
|
A number of generated tokens after which this should take effect. Possible values: Environment variable: |
int |
|
The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How the "token" is defined depends on the tokenizer and vocabulary size, which in turn depends on the model. Often the tokens are a mix of full words and sub-words. Depending on the users plan, and on the model being used, there may be an enforced maximum number of new tokens. Possible values: Environment variable: |
int |
|
If stop sequences are given, they are ignored until minimum tokens are generated. Possible values: Environment variable: |
int |
|
Random number generator seed to use in sampling mode for experimental repeatability. Possible values: Environment variable: |
int |
|
Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored. Possible values: Environment variable: |
list of string |
|
A value used to modify the next-token probabilities in Possible values: Environment variable: |
double |
|
The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for Possible values: Environment variable: |
int |
|
Similar to Possible values: Environment variable: |
double |
|
Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value Possible values: Environment variable: |
double |
|
Represents the maximum number of input tokens accepted. This can be used to avoid requests failing due to input being longer than configured limits. If the text is truncated, then it truncates the start of the input (on the left), so the end of the input will remain the same. If this value exceeds the maximum sequence length (refer to the documentation to find this value for the model) then the call will fail if the total number of tokens exceeds the maximum sequence length. Zero means don’t truncate. Possible values: Environment variable: |
int |
|
Pass Environment variable: |
boolean |
|
Whether chat model requests should be logged. Environment variable: |
boolean |
|
Whether chat model responses should be logged. Environment variable: |
boolean |
|
Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way. Environment variable: |
string |
|
Specifies the ID of the model to be used. A list of all available models is provided in the IBM watsonx.ai documentation at the this link. To use a model, locate the Environment variable: |
string |
|
Specifies the maximum number of input tokens accepted. This can be used to prevent requests from failing due to input exceeding the configured token limits. If the input exceeds the specified token limit, the input will be truncated from the end (right side), ensuring that the start of the input remains intact. If the provided value exceeds the model’s maximum sequence length (refer to the documentation for the model’s maximum sequence length), the request will fail if the total number of tokens exceeds the maximum limit. Environment variable: |
int |
|
Whether embedding model requests should be logged. Environment variable: |
boolean |
|
Whether embedding model responses should be logged. Environment variable: |
boolean |
|
The id of the model to be used. All available models are listed in the IBM Watsonx.ai documentation at the link: following link. To use a model, locate the Environment variable: |
string |
|
Specifies the maximum number of input tokens accepted. This helps to avoid requests failing due to input exceeding the configured token limits. If the input exceeds the specified token limit, the text will be truncated from the end (right side), ensuring that the start of the input remains intact. If the provided value exceeds the model’s maximum sequence length (refer to the documentation for the model’s maximum sequence length), the request will fail if the total number of tokens exceeds the maximum limit. Environment variable: |
int |
|
Whether embedding model requests should be logged. Environment variable: |
boolean |
|
Whether embedding model responses should be logged. Environment variable: |
boolean |
|
Base URL for the built-in service. All available URLs are listed in the IBM Watsonx.ai documentation at the following link. Note: If empty, the URL is automatically calculated based on the Environment variable: |
string |
|
IBM Cloud API key. If empty, the api key inherits the value from the Environment variable: |
string |
|
Timeout for built-in tools APIs. If empty, the timeout inherits the value from the Environment variable: |
|
|
Whether the built-in rest client should log requests. Environment variable: |
boolean |
|
Whether the built-in rest client should log responses. Environment variable: |
boolean |
|
Maximum number of search results. Possible values: Environment variable: |
int |
|
Type |
Default |
|
Specifies the mode of interaction with the LLM. This property lets you choose between two modes of operation:
Allowable values: Environment variable: |
string |
|
Specifies the base URL of the watsonx.ai API. A list of all available URLs is provided in the IBM watsonx.ai documentation at this link. Environment variable: |
string |
|
IBM Cloud API key. Environment variable: |
string |
|
Timeout for watsonx.ai calls. Environment variable: |
|
|
The version date for the API of the form YYYY-MM-DD. Environment variable: |
string |
|
The space that contains the resource. Either Environment variable: |
string |
|
The project that contains the resource. Either Environment variable: |
string |
|
Whether the watsonx.ai client should log requests. Environment variable: |
boolean |
|
Whether the watsonx.ai client should log responses. Environment variable: |
boolean |
|
Whether the watsonx.ai client should log requests as cURL commands. Environment variable: |
boolean |
|
Whether to enable the integration. Defaults to Environment variable: |
boolean |
|
Base URL of the IAM Authentication API. Environment variable: |
|
|
Timeout for IAM authentication calls. Environment variable: |
|
|
Grant type for the IAM Authentication API. Environment variable: |
string |
|
Base URL of the Cloud Object Storage API. Environment variable: |
string |
required |
The ID of the connection asset that contains the credentials required to access the data. Environment variable: |
string |
required |
The name of the bucket containing the input document. Environment variable: |
string |
required |
The ID of the connection asset used to store the extracted results. Environment variable: |
string |
required |
The name of the bucket where the output files will be written. Environment variable: |
string |
required |
Whether the Cloud Object Storage client should log requests. Environment variable: |
boolean |
|
Whether the Cloud Object Storage client should log responses. Environment variable: |
boolean |
|
Specifies the model to use for the chat completion. A list of all available models is provided in the IBM watsonx.ai documentation at this link. To use a model, locate the Environment variable: |
string |
|
Specifies how the model should choose which tool to call during a request. This value can be:
If Setting this value influences the tool-calling behavior of the model when no specific tool is required. Environment variable: |
|
|
Specifies the name of a specific tool that the model must call. When set, the model will be forced to call the specified tool. The name must exactly match one of the available tools defined for the service. Environment variable: |
string |
|
Positive values penalize new tokens based on their existing frequency in the generated text, reducing the likelihood of the model repeating the same lines verbatim. Possible values: Environment variable: |
double |
|
Specifies whether to return the log probabilities of the output tokens. If set to Environment variable: |
boolean |
|
An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option Possible values: Environment variable: |
int |
|
The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length. Set to 0 for the model’s configured max generated tokens. Environment variable: |
int |
|
Specifies how many chat completion choices to generate for each input message. Environment variable: |
int |
|
Applies a penalty to new tokens based on whether they already appear in the generated text so far, encouraging the model to introduce new topics rather than repeat itself. Possible values: Environment variable: |
double |
|
Random number generator seed to use in sampling mode for experimental repeatability. Environment variable: |
int |
|
Defines one or more stop sequences that will cause the model to stop generating further tokens if any of them are encountered in the output. This allows control over where the model should end its response. If a stop sequence is encountered before the minimum number of tokens has been generated, it will be ignored. Possible values: Environment variable: |
list of string |
|
Specifies the sampling temperature to use in the generation process. Higher values (e.g. Possible values: Environment variable: |
double |
|
An alternative to sampling with Possible values: Environment variable: |
double |
|
Specifies the desired format for the model’s output. Allowable values: Environment variable: |
string |
|
Whether chat model requests should be logged. Environment variable: |
boolean |
|
Whether chat model responses should be logged. Environment variable: |
boolean |
|
The ID of the model to be used. All available models are listed in the IBM watsonx.ai documentation at the following link. To use a model, locate the Environment variable: |
string |
|
Represents the strategy used for picking the tokens during generation of the output text. During text generation when parameter value is set to Allowable values: Environment variable: |
string |
|
Represents the factor of exponential decay. Larger values correspond to more aggressive decay. Possible values: Environment variable: |
double |
|
The number of generated tokens after which the exponential decay takes effect. Possible values: Environment variable: |
int |
|
The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. How a token is defined depends on the tokenizer and vocabulary size, which in turn depend on the model. Tokens are often a mix of full words and sub-words. Depending on the user's plan and on the model being used, there may be an enforced maximum number of new tokens. Possible values: Environment variable: |
int |
|
If stop sequences are given, they are ignored until the minimum number of tokens has been generated. Possible values: Environment variable: |
int |
|
Random number generator seed to use in sampling mode for experimental repeatability. Possible values: Environment variable: |
int |
|
Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored. Possible values: Environment variable: |
list of string |
|
A value used to modify the next-token probabilities in Possible values: Environment variable: |
double |
|
The number of highest probability vocabulary tokens to keep for top-k-filtering. Only applies for Possible values: Environment variable: |
int |
|
Similar to Possible values: Environment variable: |
double |
|
Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value Possible values: Environment variable: |
double |
|
Represents the maximum number of input tokens accepted. This can be used to avoid requests failing because the input is longer than the configured limits. If the input exceeds this limit, it is truncated from the start (left side), so the end of the input remains intact. If this value exceeds the model's maximum sequence length (refer to the documentation to find this value for the model), the call will fail if the total number of tokens exceeds the maximum sequence length. Zero means no truncation. Possible values: Environment variable: |
int |
|
Pass Environment variable: |
boolean |
|
Whether chat model requests should be logged. Environment variable: |
boolean |
|
Whether chat model responses should be logged. Environment variable: |
boolean |
|
Delimiter used to concatenate the ChatMessage elements into a single string. By setting this property, you can define your preferred way of concatenating messages to ensure that the prompt is structured in the correct way. Environment variable: |
string |
|
Specifies the ID of the model to be used. A list of all available models is provided in the IBM watsonx.ai documentation at this link. To use a model, locate the Environment variable: |
string |
|
Specifies the maximum number of input tokens accepted. This can be used to prevent requests from failing due to input exceeding the configured token limits. If the input exceeds the specified token limit, the input will be truncated from the end (right side), ensuring that the start of the input remains intact. If the provided value exceeds the model’s maximum sequence length (refer to the documentation for the model’s maximum sequence length), the request will fail if the total number of tokens exceeds the maximum limit. Environment variable: |
int |
|
Whether embedding model requests should be logged. Environment variable: |
boolean |
|
Whether embedding model responses should be logged. Environment variable: |
boolean |
|
The ID of the model to be used. All available models are listed in the IBM watsonx.ai documentation at the following link. To use a model, locate the Environment variable: |
string |
|
Specifies the maximum number of input tokens accepted. This helps to avoid requests failing due to input exceeding the configured token limits. If the input exceeds the specified token limit, the text will be truncated from the end (right side), ensuring that the start of the input remains intact. If the provided value exceeds the model’s maximum sequence length (refer to the documentation for the model’s maximum sequence length), the request will fail if the total number of tokens exceeds the maximum limit. Environment variable: |
int |
|
Whether embedding model requests should be logged. Environment variable: |
boolean |
|
Whether embedding model responses should be logged. Environment variable: |
boolean |
|
|
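As an illustration of how the chat options described in the table might be combined, the following application.properties fragment is a minimal sketch, not a definitive configuration. The base-url, project-id, and api-key keys are the ones shown in the Prerequisites section. The chat-model.* keys are assumed names inferred from the option descriptions (the table does not show them here), and the values are arbitrary, so check both against the configuration reference for your extension version.

```properties
# Connection settings (keys documented in the Prerequisites section)
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.project-id=23d...
quarkus.langchain4j.watsonx.api-key=your-api-key

# ASSUMED key names below: inferred from the descriptions, verify before use.
# Lower temperature for more deterministic output
quarkus.langchain4j.watsonx.chat-model.temperature=0.2
# Cap on the number of generated tokens
quarkus.langchain4j.watsonx.chat-model.max-tokens=512
# Fixed seed for experimental repeatability in sampling mode
quarkus.langchain4j.watsonx.chat-model.seed=42
# Log chat model requests and responses while debugging
quarkus.langchain4j.watsonx.chat-model.log-requests=true
quarkus.langchain4j.watsonx.chat-model.log-responses=true
```

Request and response logging is best left disabled outside of debugging, since prompts and completions may contain sensitive data.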
About the Duration format
To write duration values, use the standard You can also use a simplified format, starting with a number:
In other cases, the simplified format is translated to the
|
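For example, assuming the standard format referred to above is the ISO-8601 notation accepted by java.time.Duration (the usual convention for Quarkus configuration), a timeout could be written in either style shown below. The quarkus.langchain4j.watsonx.timeout key is an assumed name for the watsonx.ai call timeout described earlier, so verify it before use.

```properties
# ASSUMED key name; ISO-8601 duration of 30 seconds
quarkus.langchain4j.watsonx.timeout=PT30S

# Simplified alternative (a number followed by a unit), expected to be equivalent:
# quarkus.langchain4j.watsonx.timeout=30s
```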