Gemini Embedding Model

Gemini is a simpler platform than Vertex AI Gemini, aimed at a broader audience that includes non-technical users. It is a good first step for developers getting started with Gemini models.

Using Gemini Embedding Models

The Gemini platform also provides an embedding model suitable for transforming input text into vector representations.

To use it, first add the required dependency to your project:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ai-gemini</artifactId>
    <version>1.0.2</version>
</dependency>

If no other embedding model is configured, AI Services will automatically use the Gemini embedding model.

To inject the embedding model:

@Inject
EmbeddingModel embeddingModel;

Configuration

Set the API key in your application.properties:

quarkus.langchain4j.ai.gemini.api-key=...
Alternatively, use the QUARKUS_LANGCHAIN4J_AI_GEMINI_API_KEY environment variable.
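For example, a minimal application.properties setup might look like the following sketch. The property names are inferred from the environment variables listed below via the standard Quarkus mapping; the model id and logging values are illustrative, not required:

```properties
# Required: API key for the Gemini platform
quarkus.langchain4j.ai.gemini.api-key=${GEMINI_API_KEY}
# Optional: override the default chat model
quarkus.langchain4j.ai.gemini.chat-model.model-id=gemini-1.5-flash
# Optional: log requests and responses while developing
quarkus.langchain4j.ai.gemini.log-requests=true
quarkus.langchain4j.ai.gemini.log-responses=true
```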

Several configuration properties are available:

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property

Type

Default

Whether the model should be enabled

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_ENABLED

boolean

true

Whether the model should be enabled

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_ENABLED

boolean

true

The API key

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_API_KEY

string

Publisher of the model

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_PUBLISHER

string

google

Intended for testing only; overrides the base URL used by the client

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_BASE_URL

string

Whether to enable the integration. Defaults to true, which means requests are made to the Gemini provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_ENABLE_INTEGRATION

boolean

true

Whether the Gemini client should log requests

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_LOG_REQUESTS

boolean

false

Whether the Gemini client should log responses

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_LOG_RESPONSES

boolean

false

Timeout for requests to Gemini APIs

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_TIMEOUT

Duration 

${quarkus.langchain4j.timeout}

The id of the model to use.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_MODEL_ID

string

gemini-1.5-flash

The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

If the model returns a response that's too generic or too short, or it falls back to a default response, try increasing the temperature.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature}

Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_MAX_OUTPUT_TOKENS

int

8192

Top-P changes how the model selects tokens for output. Tokens are selected from the most probable (see top-K) to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token (using temperature) and excludes C as a candidate.

Specify a lower value for less random responses and a higher value for more random responses.

Range: 0.0 - 1.0

gemini-1.0-pro and gemini-1.5-pro don’t support topK

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TOP_P

double

Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.

Range: 1-40

Default for gemini-1.5-pro: 0.94

Default for gemini-1.0-pro: 1

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TOP_K

int

Whether chat model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_LOG_REQUESTS

boolean

false

Whether chat model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_LOG_RESPONSES

boolean

false

Global timeout for requests to Gemini APIs

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_CHAT_MODEL_TIMEOUT

Duration 

10s

The id of the model to use.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_MODEL_ID

string

text-embedding-004

Reduced dimension for the output embedding

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_OUTPUT_DIMENSION

int

Optional task type for which the embeddings will be used. Can only be set for models/embedding-001. Possible values: TASK_TYPE_UNSPECIFIED, RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_TASK_TYPE

string

Whether embedding model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

Whether embedding model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Global timeout for requests to Gemini APIs

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI_EMBEDDING_MODEL_TIMEOUT

Duration 

10s
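Putting the embedding-model properties together, a configuration requesting truncated output vectors might look like this sketch (property names are inferred from the environment-variable names via the standard Quarkus mapping; the dimension value is illustrative):

```properties
# Embedding model to use (the default)
quarkus.langchain4j.ai.gemini.embedding-model.model-id=text-embedding-004
# Truncate output embeddings to 256 dimensions
quarkus.langchain4j.ai.gemini.embedding-model.output-dimension=256
# Task type can only be set for models/embedding-001
#quarkus.langchain4j.ai.gemini.embedding-model.task-type=RETRIEVAL_DOCUMENT
```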

Named model config

Type

Default

The API key

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__API_KEY

string

Publisher of the model

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__PUBLISHER

string

google

Intended for testing only; overrides the base URL used by the client

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__BASE_URL

string

Whether to enable the integration. Defaults to true, which means requests are made to the Gemini provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__ENABLE_INTEGRATION

boolean

true

Whether the Gemini client should log requests

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__LOG_REQUESTS

boolean

false

Whether the Gemini client should log responses

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__LOG_RESPONSES

boolean

false

Timeout for requests to Gemini APIs

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__TIMEOUT

Duration 

${quarkus.langchain4j.timeout}

The id of the model to use.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_MODEL_ID

string

gemini-1.5-flash

The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

If the model returns a response that's too generic or too short, or it falls back to a default response, try increasing the temperature.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TEMPERATURE

double

${quarkus.langchain4j.temperature}

Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_MAX_OUTPUT_TOKENS

int

8192

Top-P changes how the model selects tokens for output. Tokens are selected from the most probable (see top-K) to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token (using temperature) and excludes C as a candidate.

Specify a lower value for less random responses and a higher value for more random responses.

Range: 0.0 - 1.0

gemini-1.0-pro and gemini-1.5-pro don’t support topK

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TOP_P

double

Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.

Range: 1-40

Default for gemini-1.5-pro: 0.94

Default for gemini-1.0-pro: 1

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TOP_K

int

Whether chat model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_LOG_REQUESTS

boolean

false

Whether chat model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_LOG_RESPONSES

boolean

false

Global timeout for requests to Gemini APIs

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__CHAT_MODEL_TIMEOUT

Duration 

10s

The id of the model to use.

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_MODEL_ID

string

text-embedding-004

Reduced dimension for the output embedding

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_OUTPUT_DIMENSION

int

Optional task type for which the embeddings will be used. Can only be set for models/embedding-001. Possible values: TASK_TYPE_UNSPECIFIED, RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_TASK_TYPE

string

Whether embedding model requests should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_LOG_REQUESTS

boolean

false

Whether embedding model responses should be logged

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_LOG_RESPONSES

boolean

false

Global timeout for requests to Gemini APIs

Environment variable: QUARKUS_LANGCHAIN4J_AI_GEMINI__MODEL_NAME__EMBEDDING_MODEL_TIMEOUT

Duration 

10s
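As a sketch of how the named-model properties above are used: `__MODEL_NAME__` in the environment variables corresponds to a named configuration segment in the property key. The name `my-embedding` below is an assumption chosen for illustration:

```properties
# Named configuration "my-embedding" with its own API key and model
quarkus.langchain4j.ai.gemini.my-embedding.api-key=${OTHER_GEMINI_API_KEY}
quarkus.langchain4j.ai.gemini.my-embedding.embedding-model.model-id=text-embedding-004
quarkus.langchain4j.ai.gemini.my-embedding.embedding-model.output-dimension=512
```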

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
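For example, the following are all valid ways to express the Gemini request timeout under these rules (each line is an alternative, not meant to be combined):

```properties
# Plain number: 30 seconds
quarkus.langchain4j.ai.gemini.timeout=30
# Number followed by ms: milliseconds
quarkus.langchain4j.ai.gemini.timeout=2500ms
# Number followed by m: translated to PT5M before parsing
quarkus.langchain4j.ai.gemini.timeout=5m
# Number followed by d: translated to P2D before parsing
quarkus.langchain4j.ai.gemini.timeout=2d
```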