Jlama
Prerequisites
Jlama requires Java 21 or later because it relies on the new Vector API for faster inference. Since the Vector API is still a preview feature in Java 21 (and up to the latest Java 23), it must be explicitly enabled by launching the JVM with the following flags:
--enable-preview --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.vector
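For example, when running the application as a packaged jar (the jar path below assumes the standard Quarkus fast-jar layout):
java --enable-preview --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.vector -jar target/quarkus-app/quarkus-run.jar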
Dev Mode
Quarkus LangChain4j automatically handles pulling the models configured by the application, so there is no need to download them manually. Furthermore, the extension properly configures the launch of the Java process to ensure that the C2 compiler is enabled (without it, Jlama is virtually unusable).
Models are generally very large: they can take time to download and consume a large chunk of disk space. The model location can be controlled with the quarkus.langchain4j.jlama.models-path property.
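For example, to cache the models in a dedicated directory (the path below is purely illustrative):
quarkus.langchain4j.jlama.models-path=/opt/jlama/models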
Using Jlama
To let Jlama run inference on your models, add the following dependency to your project:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-jlama</artifactId>
    <version>0.25.0.CR1</version>
</dependency>
If no other LLM extension is installed, AI Services will automatically utilize the configured Jlama model.
By default, the extension uses the TinyLlama-1.1B-Chat-v1.0-Jlama-Q4 model. You can change it by setting the quarkus.langchain4j.jlama.chat-model.model-name property in the application.properties file:
quarkus.langchain4j.jlama.chat-model.model-name=tjake/granite-3.0-2b-instruct-JQ4
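With the dependency added and the model configured, the typical way to run inference is through an AI Service. The sketch below is illustrative: the Assistant interface and its chat method are made-up names, while the @RegisterAiService annotation comes from Quarkus LangChain4j:

import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface Assistant {

    // Each call sends the question to the configured Jlama chat model
    String chat(String question);
}

The service can then be injected and used like any other CDI bean:

@Inject Assistant assistant; // assistant.chat("...") runs inference locally through Jlama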
Configuration
Several configuration properties are available:
Configuration property fixed at build time - All other configuration properties are overridable at runtime.

quarkus.langchain4j.jlama.include-models-in-artifact (fixed at build time)
Determines whether the necessary Jlama models are downloaded and included in the jar at build time. Currently, this option is only valid for fast-jar deployments.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_INCLUDE_MODELS_IN_ARTIFACT
Type: boolean
Default: true

quarkus.langchain4j.jlama.chat-model.enabled
Whether the chat model should be enabled.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_ENABLED
Type: boolean
Default: true

quarkus.langchain4j.jlama.embedding-model.enabled
Whether the embedding model should be enabled.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_EMBEDDING_MODEL_ENABLED
Type: boolean
Default: true

quarkus.langchain4j.jlama.chat-model.model-name
Model name to use.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_MODEL_NAME
Type: string
Default: TinyLlama-1.1B-Chat-v1.0-Jlama-Q4

quarkus.langchain4j.jlama.embedding-model.model-name
Model name to use.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_EMBEDDING_MODEL_MODEL_NAME
Type: string
Default: intfloat/e5-small-v2

quarkus.langchain4j.jlama.models-path
Location on the file-system which serves as a cache for the models.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_MODELS_PATH
Type: path

quarkus.langchain4j.jlama.chat-model.temperature
What sampling temperature to use, between 0.0 and 1.0. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. It is generally recommended to set this or the top-p sampling parameter, but not both.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_TEMPERATURE
Type: double

quarkus.langchain4j.jlama.chat-model.max-tokens
The maximum number of tokens to generate in the completion. The token count of your prompt plus this value cannot exceed the model's context length.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_CHAT_MODEL_MAX_TOKENS
Type: int

quarkus.langchain4j.jlama.enable-integration
Whether to enable the integration. Set to false to disable all requests to Jlama.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_ENABLE_INTEGRATION
Type: boolean
Default: true

quarkus.langchain4j.jlama.log-requests
Whether Jlama should log requests.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_LOG_REQUESTS
Type: boolean
Default: false

quarkus.langchain4j.jlama.log-responses
Whether the Jlama client should log responses.
Environment variable: QUARKUS_LANGCHAIN4J_JLAMA_LOG_RESPONSES
Type: boolean
Default: false

The same options are also available for additional named model configurations under the quarkus.langchain4j.jlama."model-name" prefix:

quarkus.langchain4j.jlama."model-name".chat-model.model-name (string) - Model name to use
quarkus.langchain4j.jlama."model-name".embedding-model.model-name (string) - Model name to use
quarkus.langchain4j.jlama."model-name".chat-model.temperature (double) - What sampling temperature to use, between 0.0 and 1.0
quarkus.langchain4j.jlama."model-name".chat-model.max-tokens (int) - The maximum number of tokens to generate in the completion
quarkus.langchain4j.jlama."model-name".enable-integration (boolean) - Whether to enable the integration
quarkus.langchain4j.jlama."model-name".log-requests (boolean) - Whether Jlama should log requests
quarkus.langchain4j.jlama."model-name".log-responses (boolean) - Whether the Jlama client should log responses
Document Retriever and Embedding
Jlama also provides embedding models.
By default, it uses intfloat/e5-small-v2. You can change the default embedding model by setting the quarkus.langchain4j.jlama.embedding-model.model-name property in the application.properties file:
quarkus.langchain4j.log-requests=true
quarkus.langchain4j.log-responses=true
quarkus.langchain4j.jlama.chat-model.model-name=tjake/granite-3.0-2b-instruct-JQ4
quarkus.langchain4j.jlama.embedding-model.model-name=intfloat/e5-small-v2
If no other LLM extension is installed, retrieve the embedding model as follows:
@Inject EmbeddingModel model; // Injects the embedding model
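Once injected, the model can embed text directly. A minimal sketch using the standard LangChain4j EmbeddingModel API (the input sentence is arbitrary):

import dev.langchain4j.data.embedding.Embedding;

// Embeds the given text and extracts the resulting vector
Embedding embedding = model.embed("Jlama runs LLM inference in pure Java").content();
float[] vector = embedding.vector(); // can be stored in a vector store for similarity search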