Gemini Chat Model

Gemini is a simpler platform designed for a broader audience than Vertex AI Gemini, including non-technical users. It is a good starting point for developers who are new to Gemini models.

Using Gemini Chat Models

To use Gemini chat models, add the following dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ai-gemini</artifactId>
    <version>1.5.0.CR2</version>
</dependency>

If no other LLM extension is installed, AI Services automatically use the configured Gemini model.

Configuration

Gemini requires an API key, which can be generated from the Gemini platform.

Set the key in your application.properties:

quarkus.langchain4j.ai.gemini.api-key=...

Alternatively, you can set the QUARKUS_LANGCHAIN4J_AI_GEMINI_API_KEY environment variable.
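The property and the environment variable refer to the same setting. Quarkus follows the MicroProfile Config mapping for environment variables: every character that is not a letter or digit becomes `_`, and the upper-cased form is looked up.

The Java sketch below reproduces that rule. Use it to derive the variable name for any property in the tables below, including named-model properties, where the quotes also turn into underscores. The class and method names are illustrative only and are not part of the extension. Quarkus performs this lookup itself, so you never need such a helper in an application.

```java
import java.util.Locale;

public class EnvVarNames {

    // Illustrative helper (not part of the extension): mirrors the MicroProfile Config
    // rule of replacing every non-alphanumeric character with '_' and upper-casing.
    static String toEnvVarName(String propertyName) {
        StringBuilder sb = new StringBuilder(propertyName.length());
        for (char c : propertyName.toCharArray()) {
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            // '.', '-', '"' (and any other non-alphanumeric character) become '_'
            sb.append(alphanumeric ? c : '_');
        }
        return sb.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Default model: prints QUARKUS_LANGCHAIN4J_AI_GEMINI_API_KEY
        System.out.println(toEnvVarName("quarkus.langchain4j.ai.gemini.api-key"));
        // Named model: the quotes around "model-name" also become underscores
        System.out.println(toEnvVarName("quarkus.langchain4j.ai.gemini.\"model-name\".api-key"));
    }
}
```

This is why the named-model variables in the tables contain double underscores, for example `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__API_KEY`.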

Several configuration properties are available:

The `chat-model.enabled` and `embedding-model.enabled` properties are fixed at build time; all other configuration properties can be overridden at runtime.

Configuration property	Type	Default
`quarkus.langchain4j.ai.gemini.chat-model.enabled` Whether the chat model should be enabled. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_ENABLED`	boolean	`true`
`quarkus.langchain4j.ai.gemini.embedding-model.enabled` Whether the embedding model should be enabled. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_ENABLED`	boolean	`true`
`quarkus.langchain4j.ai.gemini.api-key` The API key. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_API_KEY`	string
`quarkus.langchain4j.ai.gemini.publisher` Publisher of the model. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_PUBLISHER`	string	`google`
`quarkus.langchain4j.ai.gemini.base-url` Overrides the base URL used by the client. Intended for testing only. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_BASE_URL`	string
`quarkus.langchain4j.ai.gemini.enable-integration` Whether to enable the integration. Defaults to `true`, which means requests are made to the Gemini provider. Set to `false` to disable all requests. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_ENABLE_INTEGRATION`	boolean	`true`
`quarkus.langchain4j.ai.gemini.log-requests` Whether the Gemini client should log requests. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini.log-responses` Whether the Gemini client should log responses. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ai.gemini.timeout` Timeout for requests to the Gemini APIs. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_TIMEOUT`	Duration	`${QUARKUS.LANGCHAIN4J.TIMEOUT}`
`quarkus.langchain4j.ai.gemini.chat-model.model-id` The ID of the model to use. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_MODEL_ID`	string	`gemini-2.5-flash`
`quarkus.langchain4j.ai.gemini.chat-model.temperature` The temperature is used for sampling during response generation, which occurs when top-P and top-K are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest-probability tokens are always selected; responses for a given prompt are then mostly deterministic, but a small amount of variation is still possible. If the model returns a response that’s too generic or too short, or gives a fallback response, try increasing the temperature. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TEMPERATURE`	double	`${quarkus.langchain4j.temperature}`
`quarkus.langchain4j.ai.gemini.chat-model.max-output-tokens` Maximum number of tokens that can be generated in the response. A token is approximately four characters, and 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_MAX_OUTPUT_TOKENS`	int	`8192`
`quarkus.langchain4j.ai.gemini.chat-model.top-p` Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, the model selects either A or B as the next token (using temperature) and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses. Range: 0.0-1.0. Default for gemini-2.5-flash: 0.95. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TOP_P`	double
`quarkus.langchain4j.ai.gemini.chat-model.top-k` Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P, with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses. Range: 1-40. Note that gemini-2.5-flash doesn’t support top-K. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TOP_K`	int
`quarkus.langchain4j.ai.gemini.chat-model.log-requests` Whether chat model requests should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini.chat-model.log-responses` Whether chat model responses should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ai.gemini.chat-model.timeout` Timeout for chat model requests to the Gemini APIs. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TIMEOUT`	Duration	`10s`
`quarkus.langchain4j.ai.gemini.chat-model.thinking.include-thoughts` Controls whether thought summaries are enabled. Thought summaries are synthesized versions of the model’s raw thoughts and offer insights into the model’s internal reasoning process. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_THINKING_INCLUDE_THOUGHTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini.chat-model.thinking.thinking-budget` The thinkingBudget parameter guides the model on the number of thinking tokens to use when generating a response. A higher token count generally allows for more detailed reasoning, which can be beneficial for tackling more complex tasks. If latency is more important, use a lower budget or disable thinking by setting thinkingBudget to 0. Setting thinkingBudget to -1 turns on dynamic thinking, meaning the model adjusts the budget based on the complexity of the request. thinkingBudget is only supported in Gemini 2.5 Flash, 2.5 Pro, and 2.5 Flash-Lite. Depending on the prompt, the model might overflow or underflow the token budget. See the Gemini API documentation for more details. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_THINKING_THINKING_BUDGET`	long
`quarkus.langchain4j.ai.gemini.embedding-model.model-id` The ID of the model to use. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_MODEL_ID`	string	`text-embedding-004`
`quarkus.langchain4j.ai.gemini.embedding-model.output-dimension` Reduced dimension for the output embedding. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_OUTPUT_DIMENSION`	int
`quarkus.langchain4j.ai.gemini.embedding-model.task-type` Optional task type for which the embeddings will be used. Can only be set for models/embedding-001. Possible values: TASK_TYPE_UNSPECIFIED, RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_TASK_TYPE`	string
`quarkus.langchain4j.ai.gemini.embedding-model.log-requests` Whether embedding model requests should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini.embedding-model.log-responses` Whether embedding model responses should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ai.gemini.embedding-model.timeout` Timeout for embedding model requests to the Gemini APIs. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_TIMEOUT`	Duration	`10s`
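The top-K and top-P rows describe a three-step token selection:

1. Keep the top-K most probable tokens.
2. Trim that list with top-P.
3. Pick the final token with temperature sampling.

The toy Java sketch below walks through the A/B/C example from the top-P row. It illustrates the described order under stated assumptions and is not Gemini's server-side implementation. The class name is hypothetical, and the probabilities are the example's made-up inputs. The temperature step uses the common p^(1/T) rescaling, with a temperature of 0 treated as greedy selection; both choices are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SamplingSketch {

    // Step 1 (top-K): sort by probability, most probable first, and keep the first k.
    static List<Map.Entry<String, Double>> topK(Map<String, Double> probs, int k) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(probs.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Step 2 (top-P): walk from most to least probable and stop once the running
    // sum reaches p; the small epsilon absorbs floating-point rounding.
    static List<Map.Entry<String, Double>> topP(List<Map.Entry<String, Double>> sorted, double p) {
        List<Map.Entry<String, Double>> kept = new ArrayList<>();
        double sum = 0.0;
        for (Map.Entry<String, Double> e : sorted) {
            kept.add(e);
            sum += e.getValue();
            if (sum >= p - 1e-9) {
                break;
            }
        }
        return kept;
    }

    // Step 3 (temperature): reweight survivors by p^(1/T) and draw one at random.
    // T <= 0 is treated as greedy decoding: the most probable survivor is returned.
    static String sample(List<Map.Entry<String, Double>> candidates, double temperature, Random rng) {
        if (temperature <= 0.0) {
            return candidates.get(0).getKey();
        }
        double[] weights = new double[candidates.size()];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weights[i] = Math.pow(candidates.get(i).getValue(), 1.0 / temperature);
            total += weights[i];
        }
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) {
                return candidates.get(i).getKey();
            }
        }
        return candidates.get(weights.length - 1).getKey(); // guard against rounding
    }

    public static void main(String[] args) {
        // Probabilities from the top-P example; the remaining mass belongs to other tokens.
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("A", 0.3);
        probs.put("B", 0.2);
        probs.put("C", 0.1);

        List<Map.Entry<String, Double>> candidates = topP(topK(probs, 3), 0.5);
        System.out.println(candidates);                              // [A=0.3, B=0.2]
        System.out.println(sample(candidates, 0.0, new Random(42))); // A (greedy)
        System.out.println(sample(candidates, 1.0, new Random(42))); // A or B
    }
}
```

Lowering top-P or top-K shrinks the candidate list, and lowering the temperature sharpens the weights toward the most probable survivor. Both make output less random, which matches the guidance in the rows above.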
Named model config	Type	Default
`quarkus.langchain4j.ai.gemini."model-name".api-key` The API key. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__API_KEY`	string
`quarkus.langchain4j.ai.gemini."model-name".publisher` Publisher of the model. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__PUBLISHER`	string	`google`
`quarkus.langchain4j.ai.gemini."model-name".base-url` Overrides the base URL used by the client. Intended for testing only. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__BASE_URL`	string
`quarkus.langchain4j.ai.gemini."model-name".enable-integration` Whether to enable the integration. Defaults to `true`, which means requests are made to the Gemini provider. Set to `false` to disable all requests. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__ENABLE_INTEGRATION`	boolean	`true`
`quarkus.langchain4j.ai.gemini."model-name".log-requests` Whether the Gemini client should log requests. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".log-responses` Whether the Gemini client should log responses. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".timeout` Timeout for requests to the Gemini APIs. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__TIMEOUT`	Duration	`${QUARKUS.LANGCHAIN4J.TIMEOUT}`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.model-id` The ID of the model to use. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_MODEL_ID`	string	`gemini-2.5-flash`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.temperature` The temperature is used for sampling during response generation, which occurs when top-P and top-K are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest-probability tokens are always selected; responses for a given prompt are then mostly deterministic, but a small amount of variation is still possible. If the model returns a response that’s too generic or too short, or gives a fallback response, try increasing the temperature. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TEMPERATURE`	double	`${quarkus.langchain4j.temperature}`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.max-output-tokens` Maximum number of tokens that can be generated in the response. A token is approximately four characters, and 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_MAX_OUTPUT_TOKENS`	int	`8192`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.top-p` Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, the model selects either A or B as the next token (using temperature) and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses. Range: 0.0-1.0. Default for gemini-2.5-flash: 0.95. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TOP_P`	double
`quarkus.langchain4j.ai.gemini."model-name".chat-model.top-k` Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P, with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses. Range: 1-40. Note that gemini-2.5-flash doesn’t support top-K. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TOP_K`	int
`quarkus.langchain4j.ai.gemini."model-name".chat-model.log-requests` Whether chat model requests should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.log-responses` Whether chat model responses should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.timeout` Timeout for chat model requests to the Gemini APIs. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TIMEOUT`	Duration	`10s`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.thinking.include-thoughts` Controls whether thought summaries are enabled. Thought summaries are synthesized versions of the model’s raw thoughts and offer insights into the model’s internal reasoning process. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_THINKING_INCLUDE_THOUGHTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".chat-model.thinking.thinking-budget` The thinkingBudget parameter guides the model on the number of thinking tokens to use when generating a response. A higher token count generally allows for more detailed reasoning, which can be beneficial for tackling more complex tasks. If latency is more important, use a lower budget or disable thinking by setting thinkingBudget to 0. Setting thinkingBudget to -1 turns on dynamic thinking, meaning the model adjusts the budget based on the complexity of the request. thinkingBudget is only supported in Gemini 2.5 Flash, 2.5 Pro, and 2.5 Flash-Lite. Depending on the prompt, the model might overflow or underflow the token budget. See the Gemini API documentation for more details. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_THINKING_THINKING_BUDGET`	long
`quarkus.langchain4j.ai.gemini."model-name".embedding-model.model-id` The ID of the model to use. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID`	string	`text-embedding-004`
`quarkus.langchain4j.ai.gemini."model-name".embedding-model.output-dimension` Reduced dimension for the output embedding. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_OUTPUT_DIMENSION`	int
`quarkus.langchain4j.ai.gemini."model-name".embedding-model.task-type` Optional task type for which the embeddings will be used. Can only be set for models/embedding-001. Possible values: TASK_TYPE_UNSPECIFIED, RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_TASK_TYPE`	string
`quarkus.langchain4j.ai.gemini."model-name".embedding-model.log-requests` Whether embedding model requests should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".embedding-model.log-responses` Whether embedding model responses should be logged. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES`	boolean	`false`
`quarkus.langchain4j.ai.gemini."model-name".embedding-model.timeout` Timeout for embedding model requests to the Gemini APIs. Environment variable: `QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_TIMEOUT`	Duration	`10s`
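The `max-output-tokens` description gives two rules of thumb: about four characters per token, and roughly 60-80 words per 100 tokens. The small Java sketch below turns a token limit into rough character and word estimates using only those two ratios. The class name is hypothetical. Real token counts depend on the tokenizer and the language of the text, so treat the results as ballpark figures.

```java
public class OutputBudgetEstimate {

    // Rules of thumb from the max-output-tokens description (approximations only):
    // about 4 characters per token, and 100 tokens ~ 60-80 words.
    static final int CHARS_PER_TOKEN = 4;
    static final double MIN_WORDS_PER_TOKEN = 0.6;
    static final double MAX_WORDS_PER_TOKEN = 0.8;

    // Rough character estimate for a given token limit.
    static long approxChars(int maxOutputTokens) {
        return (long) maxOutputTokens * CHARS_PER_TOKEN;
    }

    // Rough [min, max] word range for a given token limit.
    static long[] approxWordRange(int maxOutputTokens) {
        return new long[] {
            Math.round(maxOutputTokens * MIN_WORDS_PER_TOKEN),
            Math.round(maxOutputTokens * MAX_WORDS_PER_TOKEN)
        };
    }

    public static void main(String[] args) {
        int maxOutputTokens = 8192; // documented default for chat-model.max-output-tokens
        long[] words = approxWordRange(maxOutputTokens);
        System.out.println("~" + approxChars(maxOutputTokens) + " characters");
        System.out.println("~" + words[0] + "-" + words[1] + " words");
    }
}
```

For the default of 8192 tokens, this gives about 32768 characters and roughly 4915-6554 words.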

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

If the value is only a number, it represents time in seconds.
If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

If the value is a number followed by h, m, or s, it is prefixed with PT.
If the value is a number followed by d, it is prefixed with P.
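The simplified rules above can be sketched as a small Java parser built on `java.time.Duration.parse()`. That method accepts its unit letters in upper or lower case, so the prefixed forms `PT10s` and `P2d` parse as-is.

This sketch only illustrates the stated rules; it is not the converter Quarkus uses. The class name is hypothetical, and the sketch handles only non-negative whole numbers with a single unit. The real converter may accept more forms.

```java
import java.time.Duration;

public class DurationFormat {

    // Applies the simplified rules in order, falling back to the standard format.
    static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            // Number only: seconds
            return Duration.ofSeconds(Long.parseLong(v));
        }
        if (v.matches("\\d+ms")) {
            // Number followed by "ms": milliseconds
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.matches("\\d+[hms]")) {
            // Number followed by h, m, or s: prefix with PT, e.g. "10s" -> "PT10s"
            return Duration.parse("PT" + v);
        }
        if (v.matches("\\d+d")) {
            // Number followed by d: prefix with P, e.g. "2d" -> "P2d"
            return Duration.parse("P" + v);
        }
        // Anything else is treated as standard java.time.Duration syntax, e.g. "PT1M30S"
        return Duration.parse(v);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"10", "500ms", "10s", "2m", "1h", "2d", "PT1M30S"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Under these rules, a value such as `quarkus.langchain4j.ai.gemini.chat-model.timeout=10s` is read as `PT10S`, the same 10 seconds as the documented default.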