IBM BAM

IBM Research Big AI Model (BAM) is a test bed and incubator built by IBM Research to help accelerate generative AI research and its transition into IBM products.

Using BAM

To use BAM LLMs, add the following dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-bam</artifactId>
    <version>0.14.1</version>
</dependency>
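If the project uses Gradle instead of Maven, the equivalent declaration (same coordinates as the Maven dependency above) would be:

```gradle
implementation("io.quarkiverse.langchain4j:quarkus-langchain4j-bam:0.14.1")
```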

If no other LLM extension is installed, AI Services will automatically use the configured BAM models.

Configuration

Configuring BAM models requires an API key, which can be obtained from this page.

The API key can be set in the application.properties file:

quarkus.langchain4j.bam.api-key=pak-...
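Alternatively, the key can be supplied through its corresponding environment variable (listed in the configuration reference below), which keeps the secret out of the properties file:

```shell
export QUARKUS_LANGCHAIN4J_BAM_API_KEY=pak-...
```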

All configuration properties

Configuration properties fixed at build time are noted as such; all other configuration properties are overridable at runtime.

Each entry below lists the property description, its environment variable, its type, and its default value (if any).

  • Whether the chat model should be enabled
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_ENABLED
    Type: boolean. Default: true

  • Whether the embedding model should be enabled
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_EMBEDDING_MODEL_ENABLED
    Type: boolean. Default: true

  • Whether the moderation model should be enabled
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_MODERATION_MODEL_ENABLED
    Type: boolean. Default: true

  • Base URL where the BAM serving is running
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_BASE_URL
    Type: URL. Default: https://bam-api.res.ibm.com

  • BAM API key
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_API_KEY
    Type: string. Default: dummy

  • Timeout for BAM calls
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_TIMEOUT
    Type: Duration. Default: 10s

  • Version to use
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_VERSION
    Type: string. Default: 2024-04-15

  • Whether the BAM client should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM client should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_LOG_RESPONSES
    Type: boolean. Default: false

  • Whether to enable the integration. Defaults to true, which means requests are made to the BAM provider. Set to false to disable all requests.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_ENABLE_INTEGRATION
    Type: boolean. Default: true

  • Model to use
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_MODEL_ID
    Type: string. Default: ibm/granite-13b-chat-v2

  • The strategy used for picking tokens during generation of the output text. Options are greedy and sample. When set to greedy, each successive token is the highest-probability token given the text already generated; this strategy can lead to repetitive results, especially for longer output sequences. The alternative sample strategy picks subsequent tokens based on the probability distribution of possible next tokens, conditioned on the already-generated text and on the top_k and top_p parameters described below. The underlying BAM API defaults to sample when unspecified; this extension sets greedy by default.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_DECODING_METHOD
    Type: string. Default: greedy

  • Pass false to omit matched stop sequences from the end of the output text. The default is currently true, meaning the output will end with the stop sequence text when matched.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_INCLUDE_STOP_SEQUENCE
    Type: boolean

  • A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output; values greater than 1.0 flatten it, resulting in "more random" output. A value of 1.0 has no effect and is the default. The allowed range is 0.0 to 2.0.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_TEMPERATURE
    Type: double. Default: 1.0

  • If stop sequences are given, they are ignored until this minimum number of tokens has been generated.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_MIN_NEW_TOKENS
    Type: int. Default: 0

  • The maximum number of new tokens to be generated. The range is 0 to 1024.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_MAX_NEW_TOKENS
    Type: int. Default: 200

  • Random number generator seed to use in sampling mode, for experimental repeatability. Must be >= 1.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_RANDOM_SEED
    Type: int

  • One or more strings that cause text generation to stop if/when they are produced as part of the output. Stop sequences encountered before the minimum number of tokens has been generated are ignored. The list may contain up to 6 strings.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_STOP_SEQUENCES
    Type: list of string

  • Time limit in milliseconds: if generation is not completed within this time, it stops. The text generated so far is returned along with the time_limit stop reason.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_TIME_LIMIT
    Type: int

  • The number of highest-probability vocabulary tokens to keep for top-k filtering. Only applies in sampling mode, with a range from 1 to 100. When the decoding method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_TOP_K
    Type: int

  • Similar to top_k, except the candidates for the next token are the most likely tokens with probabilities that add up to at least top_p. The valid range is 0.0 to 1.0, where 1.0 (the default) is equivalent to disabled. Also known as nucleus sampling.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_TOP_P
    Type: double

  • Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher is kept for generation.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_TYPICAL_P
    Type: double

  • The penalty applied to tokens that have already been generated or that belong to the context. The range is 1.0 to 2.0 and defaults to 1.0 (no penalty).
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_REPETITION_PENALTY
    Type: double

  • The number of tokens to which the input is truncated. Can be used to avoid requests failing because the input is longer than configured limits. Zero means don't truncate.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_TRUNCATE_INPUT_TOKENS
    Type: int

  • Multiple output sequences of tokens are generated using your decoding selection, and the output sequence with the highest overall probability is returned. When beam search is enabled, there is a performance penalty, and stop sequences are not available.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_BEAM_WIDTH
    Type: int

  • Whether the BAM chat model should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM chat model should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_CHAT_MODEL_LOG_RESPONSES
    Type: boolean. Default: false

  • Model to use
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_EMBEDDING_MODEL_MODEL_ID
    Type: string. Default: ibm/slate.125m.english.rtrvr

  • Whether the BAM embedding model should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_EMBEDDING_MODEL_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM embedding model should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_EMBEDDING_MODEL_LOG_RESPONSES
    Type: boolean. Default: false

  • What types of messages are subject to moderation checks.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_MODERATION_MODEL_MESSAGES_TO_MODERATE
    Type: list of system, user, ai, tool-execution-result. Default: user

  • The HAP detector identifies hateful, abusive, and/or profane language. The float value, from 0.1 to 1, controls the threshold at which content is flagged by the detector.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_MODERATION_MODEL_HAP
    Type: float

  • The social bias detector identifies subtle forms of hate speech and discriminatory content that may easily go undetected by keyword detection systems or HAP classifiers. The float value, from 0.1 to 1, controls the threshold at which content is flagged by the detector.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_MODERATION_MODEL_SOCIAL_BIAS
    Type: float

  • Whether the BAM moderation model should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_MODERATION_MODEL_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM moderation model should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM_MODERATION_MODEL_LOG_RESPONSES
    Type: boolean. Default: false
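The environment variables above map to application.properties keys through the usual Quarkus convention (lowercase, with dots and dashes); the property names in this sketch are derived from that mapping, and the values are illustrative:

```properties
# Switch to sampling, soften the distribution, and cap output length
quarkus.langchain4j.bam.chat-model.decoding-method=sample
quarkus.langchain4j.bam.chat-model.temperature=0.7
quarkus.langchain4j.bam.chat-model.max-new-tokens=100
```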

Named model config

The properties below can also be set per named model configuration; __MODEL_NAME__ in each environment variable stands for the configured model name. Each entry lists the property description, its environment variable, its type, and its default value (if any).

  • Base URL where the BAM serving is running
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__BASE_URL
    Type: URL. Default: https://bam-api.res.ibm.com

  • BAM API key
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__API_KEY
    Type: string. Default: dummy

  • Timeout for BAM calls
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__TIMEOUT
    Type: Duration. Default: 10s

  • Version to use
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__VERSION
    Type: string. Default: 2024-04-15

  • Whether the BAM client should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM client should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__LOG_RESPONSES
    Type: boolean. Default: false

  • Whether to enable the integration. Defaults to true, which means requests are made to the BAM provider. Set to false to disable all requests.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__ENABLE_INTEGRATION
    Type: boolean. Default: true

  • Model to use
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_MODEL_ID
    Type: string. Default: ibm/granite-13b-chat-v2

  • The strategy used for picking tokens during generation of the output text. Options are greedy and sample. When set to greedy, each successive token is the highest-probability token given the text already generated; this strategy can lead to repetitive results, especially for longer output sequences. The alternative sample strategy picks subsequent tokens based on the probability distribution of possible next tokens, conditioned on the already-generated text and on the top_k and top_p parameters described below. The underlying BAM API defaults to sample when unspecified; this extension sets greedy by default.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_DECODING_METHOD
    Type: string. Default: greedy

  • Pass false to omit matched stop sequences from the end of the output text. The default is currently true, meaning the output will end with the stop sequence text when matched.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_INCLUDE_STOP_SEQUENCE
    Type: boolean

  • A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output; values greater than 1.0 flatten it, resulting in "more random" output. A value of 1.0 has no effect and is the default. The allowed range is 0.0 to 2.0.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_TEMPERATURE
    Type: double. Default: 1.0

  • If stop sequences are given, they are ignored until this minimum number of tokens has been generated.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_MIN_NEW_TOKENS
    Type: int. Default: 0

  • The maximum number of new tokens to be generated. The range is 0 to 1024.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_MAX_NEW_TOKENS
    Type: int. Default: 200

  • Random number generator seed to use in sampling mode, for experimental repeatability. Must be >= 1.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_RANDOM_SEED
    Type: int

  • One or more strings that cause text generation to stop if/when they are produced as part of the output. Stop sequences encountered before the minimum number of tokens has been generated are ignored. The list may contain up to 6 strings.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_STOP_SEQUENCES
    Type: list of string

  • Time limit in milliseconds: if generation is not completed within this time, it stops. The text generated so far is returned along with the time_limit stop reason.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_TIME_LIMIT
    Type: int

  • The number of highest-probability vocabulary tokens to keep for top-k filtering. Only applies in sampling mode, with a range from 1 to 100. When the decoding method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_TOP_K
    Type: int

  • Similar to top_k, except the candidates for the next token are the most likely tokens with probabilities that add up to at least top_p. The valid range is 0.0 to 1.0, where 1.0 (the default) is equivalent to disabled. Also known as nucleus sampling.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_TOP_P
    Type: double

  • Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher is kept for generation.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_TYPICAL_P
    Type: double

  • The penalty applied to tokens that have already been generated or that belong to the context. The range is 1.0 to 2.0 and defaults to 1.0 (no penalty).
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_REPETITION_PENALTY
    Type: double

  • The number of tokens to which the input is truncated. Can be used to avoid requests failing because the input is longer than configured limits. Zero means don't truncate.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_TRUNCATE_INPUT_TOKENS
    Type: int

  • Multiple output sequences of tokens are generated using your decoding selection, and the output sequence with the highest overall probability is returned. When beam search is enabled, there is a performance penalty, and stop sequences are not available.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_BEAM_WIDTH
    Type: int

  • Whether the BAM chat model should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM chat model should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES
    Type: boolean. Default: false

  • Model to use
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID
    Type: string. Default: ibm/slate.125m.english.rtrvr

  • Whether the BAM embedding model should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM embedding model should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES
    Type: boolean. Default: false

  • What types of messages are subject to moderation checks.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__MODERATION_MODEL_MESSAGES_TO_MODERATE
    Type: list of system, user, ai, tool-execution-result. Default: user

  • The HAP detector identifies hateful, abusive, and/or profane language. The float value, from 0.1 to 1, controls the threshold at which content is flagged by the detector.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__MODERATION_MODEL_HAP
    Type: float

  • The social bias detector identifies subtle forms of hate speech and discriminatory content that may easily go undetected by keyword detection systems or HAP classifiers. The float value, from 0.1 to 1, controls the threshold at which content is flagged by the detector.
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__MODERATION_MODEL_SOCIAL_BIAS
    Type: float

  • Whether the BAM moderation model should log requests
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__MODERATION_MODEL_LOG_REQUESTS
    Type: boolean. Default: false

  • Whether the BAM moderation model should log responses
    Environment variable: QUARKUS_LANGCHAIN4J_BAM__MODEL_NAME__MODERATION_MODEL_LOG_RESPONSES
    Type: boolean. Default: false
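As a sketch of how a named configuration is declared (the name mymodel here is hypothetical, not prescribed by the extension), per-model properties are prefixed with the model name in application.properties:

```properties
# Hypothetical named configuration called "mymodel"
quarkus.langchain4j.bam.mymodel.api-key=pak-...
quarkus.langchain4j.bam.mymodel.chat-model.model-id=ibm/granite-13b-chat-v2
```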

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
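The translation rules above can be sketched in plain Java. This helper is purely illustrative, assuming the rules exactly as stated; it is not part of the extension:

```java
import java.time.Duration;

public class DurationFormat {

    // Translate the simplified duration format described above into java.time.Duration.
    static Duration parseSimplified(String value) {
        if (value.matches("\\d+")) {
            // A bare number represents seconds
            return Duration.ofSeconds(Long.parseLong(value));
        }
        if (value.matches("\\d+ms")) {
            // A number followed by "ms" represents milliseconds
            return Duration.ofMillis(Long.parseLong(value.substring(0, value.length() - 2)));
        }
        if (value.matches("\\d+[hms]")) {
            // h, m, or s suffix: prefix with "PT" (Duration.parse is case-insensitive)
            return Duration.parse("PT" + value);
        }
        if (value.matches("\\d+d")) {
            // d suffix: prefix with "P"
            return Duration.parse("P" + value);
        }
        // Otherwise assume it is already in standard java.time.Duration format
        return Duration.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(parseSimplified("10"));    // PT10S
        System.out.println(parseSimplified("500ms")); // PT0.5S
        System.out.println(parseSimplified("2h"));    // PT2H
        System.out.println(parseSimplified("1d"));    // PT24H
    }
}
```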

Example

An example usage is the following:

quarkus.langchain4j.bam.api-key=pak-...
quarkus.langchain4j.bam.chat-model.model-id=ibm/granite-13b-chat-v2

public record Result(Integer result) {}

@RegisterAiService
public interface LLMService {

    @SystemMessage("You are a calculator")
    @UserMessage("""
        You must perform the mathematical operation delimited by ---
        ---
        {firstNumber} + {secondNumber}
        ---
    """)
    Result calculator(int firstNumber, int secondNumber);
}

@Path("/llm")
public class LLMResource {

    @Inject
    LLMService llmService;

    @GET
    @Path("/calculator")
    public Result calculator() {
        return llmService.calculator(2, 2);
    }
}

❯ curl http://localhost:8080/llm/calculator
{"result":4}
Sometimes it may be useful to set the quarkus.langchain4j.bam.chat-model.stop-sequences property to prevent the LLM from returning more output than desired.
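For instance, stop sequences can be supplied as a comma-separated list (the sequence strings below are illustrative, not prescribed by the extension):

```properties
# Stop generation as soon as the model emits "---"
quarkus.langchain4j.bam.chat-model.stop-sequences=---
```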