Agent and Tools

The core idea of agents is to use a language model to choose a sequence of actions. Unlike chains, where the sequence of actions is hardcoded, agents use a language model as a reasoning engine to determine which actions (tools) to take and in which order.

Concepts

Agent

The agent is the core decision-maker responsible for determining the subsequent steps. It’s driven by a language model and a prompt. The inputs include:

  • Tools: Descriptions of available actions

  • User Input: The high-level objective

  • Intermediate Steps: Previous (action, tool output) pairs, in the order they were executed to fulfill the user input

The output presents the next action(s) to take or the final response to be delivered to the user. Each action specifies a tool (method name) and its corresponding input parameters.
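
The decision loop described above can be sketched in plain Java. This is a minimal illustration, not the extension's implementation: the Decision, Step, and Model types are hypothetical names introduced here, and the "model" is a scripted stub rather than a real LLM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical types for illustration; none of them belong to LangChain4j.
final class AgentLoopSketch {

    /** What the model decides: call a tool, or (when tool is null) return the final answer. */
    record Decision(String tool, String input, String finalAnswer) {
        static Decision call(String tool, String input) { return new Decision(tool, input, null); }
        static Decision answer(String text) { return new Decision(null, null, text); }
        boolean isFinal() { return tool == null; }
    }

    /** One intermediate step: the action taken and the output the tool produced. */
    record Step(String tool, String input, String output) { }

    /** Stand-in for the language model: maps (user input, steps so far) to the next decision. */
    interface Model {
        Decision decide(String userInput, List<Step> steps);
    }

    static String run(Model model, Map<String, Function<String, String>> tools, String userInput) {
        List<Step> steps = new ArrayList<>();
        while (true) {
            Decision decision = model.decide(userInput, steps);
            if (decision.isFinal()) {
                return decision.finalAnswer();
            }
            // The caller, not the model, executes the tool and records the result.
            String output = tools.get(decision.tool()).apply(decision.input());
            steps.add(new Step(decision.tool(), decision.input(), output));
        }
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> tools =
                Map.of("stringLength", s -> String.valueOf(s.length()));
        // Scripted "model": ask for one tool call, then answer using its output.
        Model scripted = (input, steps) -> steps.isEmpty()
                ? Decision.call("stringLength", input)
                : Decision.answer("Length: " + steps.get(0).output());
        System.out.println(run(scripted, tools, "hello"));
    }
}
```

Running main prints Length: 5. The point of the sketch is the division of labor: the model only chooses the next action, while the surrounding code executes it and feeds the result back as an intermediate step.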

Tools

Tools are the methods an agent can invoke. Two elements are essential when using tools:

  • Granting the agent access to the right tools.

  • Describing tools in a manner most beneficial to the agent.

Failure to consider both aspects may hinder the agent’s functionality. Access to an incorrect set of tools impedes the agent from achieving objectives, while improperly described tools hamper their effective utilization.

Recommendations for Agent Construction

When constructing an agent, consider:

  • Setting the model temperature to 0 for the agent to consistently choose the most probable action.

  • Ensuring well-detailed descriptions of tools.

  • Listing steps in the prompt in the desired execution order.

Declaring a Tool

A tool is a method that an agent can invoke. It must be declared in a CDI bean and annotated with @Tool:

// Example of an EmailService tool method
package io.quarkiverse.langchain4j.sample;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.agent.tool.Tool;
import io.quarkus.logging.Log;
import io.quarkus.mailer.Mail;
import io.quarkus.mailer.Mailer;

@ApplicationScoped
public class EmailService {

    @Inject
    Mailer mailer;

    @Tool("send the provided content via email")
    public void sendAnEmail(String content) {
        Log.info("Sending an email: " + content);
        mailer.send(Mail.withText("sendMeALetter@quarkus.io", "A poem for you", content));
    }

}

A CDI bean can host multiple methods annotated with @Tool. However, each method name must be unique across all declared tools.
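
To illustrate the uniqueness rule, here is a small reflection-based check. It is only a sketch: the nested @Tool annotation is a hypothetical stand-in for dev.langchain4j.agent.tool.Tool so that the example has no dependencies, and EmailTools and SmsTools are made-up classes. It does not reproduce whatever validation the extension itself performs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ToolNameCheck {

    // Hypothetical stand-in for the real @Tool annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Tool {
        String value();
    }

    static class EmailTools {
        @Tool("send the provided content via email")
        public void send(String content) { }
    }

    static class SmsTools {
        @Tool("send the provided content via SMS")
        public void send(String content) { }   // same method name as EmailTools.send
    }

    /** Returns the tool method names declared more than once across the given classes. */
    static Set<String> duplicateToolNames(List<Class<?>> toolClasses) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (Class<?> type : toolClasses) {
            for (Method method : type.getDeclaredMethods()) {
                if (method.isAnnotationPresent(Tool.class) && !seen.add(method.getName())) {
                    duplicates.add(method.getName());
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicateToolNames(List.of(EmailTools.class, SmsTools.class)));
    }
}
```

Here both classes expose a tool named send, so the check reports that name. Renaming one of them (for example to sendSms) resolves the clash.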

Providing Tool Access

In the AI Service, you can specify the tools accessible to the agent. By default, no tools are available, so you must explicitly list the tools you want to make accessible:

@RegisterAiService(tools = EmailService.class)
public interface MyAiService {
    // ...
}

Using tools requires a configured memory provider. A tool invocation spans several messages (the user message, the model’s tool request, and the tool result), so the conversation must be retained. A memory of at least three messages is needed for tools to work properly.
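
The following sketch shows why a window smaller than three messages is a problem. It assumes a deliberately simplified sliding-window memory (WindowMemorySketch is a hypothetical class, not the extension's ChatMemory) and models messages as plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sliding-window memory; it is not the extension's ChatMemory implementation.
final class WindowMemorySketch {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowMemorySketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();   // evict the oldest message
        }
    }

    List<String> messages() {
        return List.copyOf(messages);
    }

    /** Records one tool round trip: the question, the tool request, and the tool result. */
    static List<String> afterOneToolCall(int maxMessages) {
        WindowMemorySketch memory = new WindowMemorySketch(maxMessages);
        memory.add("user: how long is \"hello\"?");
        memory.add("assistant: call stringLength(\"hello\")");
        memory.add("tool: 5");
        return memory.messages();
    }

    public static void main(String[] args) {
        System.out.println(afterOneToolCall(3));  // question, tool request, and tool result
        System.out.println(afterOneToolCall(2));  // the user's question has been evicted
    }
}
```

With a window of two, the user's question is dropped before the model sees the tool result, so the follow-up request no longer contains what was asked.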

If a method of an AI Service needs to use tools other than the ones configured in the tools property of @RegisterAiService, that method can be annotated with @ToolBox.

In this case, @ToolBox completely overrides @RegisterAiService(tools=…).

Tools execution model

Tools can have different execution models:

  • blocking - the tool execution blocks the caller thread

  • non-blocking - the tool execution is asynchronous and does not block the caller thread

  • running on a virtual thread - the tool execution is asynchronous and runs on a new virtual thread

The execution model is configured using the @Blocking, @NonBlocking and @RunOnVirtualThread annotations. In addition, tool methods returning CompletionStage or Uni are considered non-blocking (if no annotations are used).

For example, the following tool is considered blocking:

@Tool("get the customer name for the given customerId")
public String getCustomerName(long id) {
    return find("id", id).firstResult().name;
}

The following tool is considered non-blocking:

@Tool("add a and b")
@NonBlocking
int sum(int a, int b) {
    return a + b;
}

This other tool is considered non-blocking as well:

@Tool("get the customer name for the given customerId")
public Uni<String> getCustomerName(long id) {
    // ...
}

The following tool runs on a virtual thread:

@Tool("get the customer name for the given customerId")
@RunOnVirtualThread
public String getCustomerName(long id) {
    return find("id", id).firstResult().name;
}
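
The selection rules illustrated by these examples can be summarized in a short sketch. It makes several assumptions: the three annotations are local stand-ins for the real ones, only CompletionStage is checked because Uni belongs to Mutiny rather than the JDK, and the precedence between annotations is a guess made for illustration, not the extension's documented behavior.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class ExecutionModelSketch {

    // Hypothetical local stand-ins for the real annotations, to keep the sketch dependency-free.
    @Retention(RetentionPolicy.RUNTIME) @interface Blocking { }
    @Retention(RetentionPolicy.RUNTIME) @interface NonBlocking { }
    @Retention(RetentionPolicy.RUNTIME) @interface RunOnVirtualThread { }

    enum ExecutionModel { BLOCKING, NON_BLOCKING, VIRTUAL_THREAD }

    /** Explicit annotations win; otherwise an asynchronous return type means non-blocking. */
    static ExecutionModel classify(Method method) {
        if (method.isAnnotationPresent(RunOnVirtualThread.class)) return ExecutionModel.VIRTUAL_THREAD;
        if (method.isAnnotationPresent(NonBlocking.class)) return ExecutionModel.NON_BLOCKING;
        if (method.isAnnotationPresent(Blocking.class)) return ExecutionModel.BLOCKING;
        if (CompletionStage.class.isAssignableFrom(method.getReturnType())) return ExecutionModel.NON_BLOCKING;
        return ExecutionModel.BLOCKING;   // no annotation, synchronous return type
    }

    static class Tools {
        String lookup(long id) { return "name-" + id; }
        @NonBlocking int sum(int a, int b) { return a + b; }
        CompletionStage<String> lookupAsync(long id) { return CompletableFuture.completedFuture("name-" + id); }
        @RunOnVirtualThread String lookupOnVirtualThread(long id) { return "name-" + id; }
    }

    static ExecutionModel modelOf(String methodName, Class<?>... parameterTypes) {
        try {
            return classify(Tools.class.getDeclaredMethod(methodName, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(modelOf("lookup", long.class));
        System.out.println(modelOf("sum", int.class, int.class));
        System.out.println(modelOf("lookupAsync", long.class));
        System.out.println(modelOf("lookupOnVirtualThread", long.class));
    }
}
```

The four methods in Tools mirror the four examples above and are classified as BLOCKING, NON_BLOCKING, NON_BLOCKING, and VIRTUAL_THREAD respectively.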

Streaming

The execution model is particularly important when using streamed responses. Streamed responses are emitted on the event loop, which must not be blocked. In this case, depending on the execution model of the tool, Quarkus LangChain4J automatically switches to a worker thread to execute the tool. This mechanism avoids blocking the event loop while still allowing the use of blocking tools (such as database access).

For example, let’s imagine the following AI service method:

@UserMessage("...")
@ToolBox({TransactionRepository.class, CustomerRepository.class})
Multi<String> detectAmountFraudForCustomerStreamed(long customerId);

This method returns a stream of tokens. Each token is emitted on the event loop. The given tools are blocking (they access a database in a blocking manner). Thus, Quarkus LangChain4J automatically switches to a worker thread before invoking the tools, to avoid blocking the event loop.
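
The general idea of handing a blocking call to a worker thread can be sketched with plain java.util.concurrent. This is an analogy only: the extension relies on Quarkus and Vert.x threads rather than the executor used here, and the class and thread names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Plain java.util.concurrent stand-in; the extension uses Quarkus/Vert.x threads instead.
final class WorkerOffloadSketch {

    /** Runs a blocking tool on the worker pool and returns immediately with a future. */
    static <T> CompletableFuture<T> callBlockingTool(Supplier<T> tool, ExecutorService workers) {
        return CompletableFuture.supplyAsync(tool, workers);
    }

    /** Demo helper: reports the name of the thread that executed the tool. */
    static String threadThatRanTheTool() {
        ExecutorService workers = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker-0"));
        try {
            // join() is used only to keep the demo short; an event loop would chain
            // a callback (for example thenAccept) instead of waiting.
            return callBlockingTool(() -> Thread.currentThread().getName(), workers).join();
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller: " + Thread.currentThread().getName());
        System.out.println("tool ran on: " + threadThatRanTheTool());
    }
}
```

The tool body runs on the thread named worker-0, not on the calling thread, which is the property that keeps an event loop free while a blocking tool executes.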

Request scope

If the request scope is active when the LLM is invoked, the tool execution is also bound to the request scope. This means that you can propagate data to the tool execution, for example, a security context or a transaction context.

The propagation also works with virtual threads and when Quarkus LangChain4J automatically switches to a worker thread.

Dynamically Use Tools at Runtime

By default, tools in Quarkus are static, defined during the build phase. However, if you want to dynamically change the tools available at runtime, you can use a ToolProvider. This allows you to decide which tools to provide on the fly.

You can configure an AI service with a dynamic ToolProvider like this:

@RegisterAiService(toolProviderSupplier = MyToolProviderSupplier.class)
public interface MyAiService {
    // Service methods
}

Implementing a ToolProvider

To support dynamic tool selection, you’ll need to create a ToolProviderSupplier, which must be marked as @ApplicationScoped. Here’s an example:

@ApplicationScoped
public class MyToolProviderSupplier implements Supplier<ToolProvider> {
    @Inject
    MyCustomToolProvider myCustomToolProvider;

    @Override
    public ToolProvider get() {
        return myCustomToolProvider;
    }
}

Example: Conditional Tool Provision

In this example, a specific tool (e.g., a booking tool) is only provided if the user’s message contains the keyword "booking".

@ApplicationScoped
public class MyCustomToolProvider implements ToolProvider {
    @Inject
    BookingTool bookingTool;

    @Override
    public ToolProviderResult provideTools(ToolProviderRequest request) {
        boolean containsBooking = request.userMessage().singleText().contains("booking");
        if (!containsBooking) {
            return ToolProviderResult.builder().build();  // No tools
        }
        return buildToolProviderResult(List.of(bookingTool));  // Provide booking tool
    }

    private ToolProviderResult buildToolProviderResult(List<Object> toolObjects) {
        List<ToolSpecification> specifications = new ArrayList<>();
        Map<String, ToolExecutor> executors = new HashMap<>();
        Map<ToolSpecification, ToolExecutor> tools = new HashMap<>();

        // Populate tool metadata
        ToolsRecorder.populateToolMetadata(toolObjects, specifications, executors);

        // Map specifications to executors
        for (ToolSpecification specification : specifications) {
            tools.put(specification, executors.get(specification.name()));
        }

        // Return the result
        return ToolProviderResult.builder()
                .addAll(tools)
                .build();
    }
}

Alternative: Using ToolSpecification and ToolExecutor

If you prefer more control, you can work directly with ToolSpecification and ToolExecutor to provide tools.

Important Rule

You can configure either static tools (via tools) or a dynamic ToolProvider (via toolProviderSupplier), but not both in the same AI service.

How do tools work?

The question of how tools work arises naturally, given that no code needs to be written to wire up their usage.

The short answer is that, given the proper user and system messages and tool descriptions, the extension crafts API requests that allow the LLM to respond with the appropriate tool invocations - but it is the extension itself that then invokes the tool. The extension then uses the result of the invocation to make another request to the LLM, which responds with a new answer, potentially leading to another round of tool invocation and another request to the LLM.

Detailed example

To help clarify the various interactions, let’s use the following example where we are creating a very simple calculator chatbot that responds to user requests over HTTP, leveraging the quarkus-langchain4j-openai extension. Here is the full source code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import jakarta.annotation.PreDestroy;
import jakarta.enterprise.context.RequestScoped;
import jakarta.inject.Singleton;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

import org.jboss.resteasy.reactive.RestQuery;

import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.ChatMemoryProvider;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.quarkiverse.langchain4j.RemovableChatMemoryProvider;

@Path("assistant-with-tool")
public class AssistantWithToolsResource {

    private final Assistant assistant;

    public AssistantWithToolsResource(Assistant assistant) {
        this.assistant = assistant;
    }

    @GET // (3)
    public String get(@RestQuery String message) {
        return assistant.chat(message);
    }

    @Singleton // (1)
    public static class Calculator {

        @Tool("Calculates the length of a string")
        int stringLength(String s) {
            return s.length();
        }

        @Tool("Calculates the sum of two numbers")
        int add(int a, int b) {
            return a + b;
        }

        @Tool("Calculates the square root of a number")
        double sqrt(int x) {
            return Math.sqrt(x);
        }
    }

    @RegisterAiService(tools = Calculator.class) // (2)
    public interface Assistant {

        String chat(String userMessage);
    }
}

(1) Declare a CDI bean that provides three different tools
(2) Register an AiService that responds to a user’s request and has access to the calculator tools. This service also keeps track of the session’s messages through its chat memory.
(3) Declare an HTTP endpoint that retrieves the user’s question via a query parameter and simply responds with the chatbot’s response

Now, suppose we ask the chatbot: What is the square root of the sum of the numbers of letters in the words "hello" and "world". We can do so via:

curl 'http://localhost:8080/assistant-with-tool?message=What%20is%20the%20square%20root%20of%20the%20sum%20of%20the%20numbers%20of%20letters%20in%20the%20words%20%22hello%22%20and%20%22world%22'

The response will be: The square root of the sum of the numbers of letters in the words "hello" and "world" is approximately 3.162.

What is interesting, however, for understanding how the tools were used is the sequence of requests the extension made to OpenAI and the responses OpenAI returned.

The application makes a first REST call to OpenAI that contains the following JSON payload:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the square root of the sum of the numbers of letters in the words \"hello\" and \"world\""
    }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "functions": [
    {
      "name": "stringLength",
      "description": "Calculates the length of a string",
      "parameters": {
        "type": "object",
        "properties": {
          "s": {
            "type": "string"
          }
        },
        "required": [
          "s"
        ]
      }
    },
    {
      "name": "add",
      "description": "Calculates the sum of two numbers",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "integer"
          },
          "b": {
            "type": "integer"
          }
        },
        "required": [
          "a",
          "b"
        ]
      }
    },
    {
      "name": "sqrt",
      "description": "Calculates the square root of a number",
      "parameters": {
        "type": "object",
        "properties": {
          "x": {
            "type": "integer"
          }
        },
        "required": [
          "x"
        ]
      }
    }
  ]
}

There are two things to notice in this JSON payload:

  • The user message which contains the user’s question (the messages part of the request)

  • The declaration of the three available functions (the functions part of the request) which directly map to the tools we declared in our source code.
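
To make the mapping between the Java tool methods and the parameter types in the functions declaration concrete, here is a small sketch. It is an assumption-laden illustration, not the extension's schema generator: jsonType covers only a handful of types, and parameter names are left out because reflection exposes them only when the code is compiled with -parameters.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class SchemaTypeSketch {

    // Same tool methods as the Calculator in the example above.
    static class Calculator {
        int stringLength(String s) { return s.length(); }
        int add(int a, int b) { return a + b; }
        double sqrt(int x) { return Math.sqrt(x); }
    }

    /** Maps a Java parameter type to a JSON Schema type name (only a few types are covered). */
    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == Integer.class || type == long.class || type == Long.class) return "integer";
        if (type == double.class || type == Double.class || type == float.class || type == Float.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        return "object";
    }

    /** Lists the JSON Schema types of a Calculator method's parameters, in declaration order. */
    static List<String> parameterTypes(String methodName) {
        for (Method method : Calculator.class.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                List<String> types = new ArrayList<>();
                for (Class<?> parameterType : method.getParameterTypes()) {
                    types.add(jsonType(parameterType));
                }
                return types;
            }
        }
        throw new IllegalArgumentException("No such tool: " + methodName);
    }

    public static void main(String[] args) {
        for (String name : List.of("stringLength", "add", "sqrt")) {
            System.out.println(name + " -> " + parameterTypes(name));
        }
    }
}
```

The result lines up with the payload: stringLength takes one string, add takes two integers, and sqrt takes one integer.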

The response from OpenAI is the following:

{
  "id": "chatcmpl-8L3uhmaMARLTbpGJGDvFW4pAPxM8V",
  "object": "chat.completion",
  "created": 1700030623,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "stringLength",
          "arguments": "{\n  \"s\": \"hello\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 118,
    "completion_tokens": 15,
    "total_tokens": 133
  }
}

This response initially looks odd, as it does not appear that OpenAI has answered the user’s question. However, upon closer inspection, the response contains something very interesting: the function_call part. Here, OpenAI is essentially telling the extension to invoke the stringLength tool with hello as the sole argument. This is where the power of the LLM is evident - it has figured out that, to answer the user’s question, it first needs to calculate the length of "hello".

Upon receiving this response, the extension invokes calculator.stringLength("hello") and then sends the following new request to OpenAI:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the square root of the sum of the numbers of letters in the words \"hello\" and \"world\""
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"hello\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "functions": [
    {
      "name": "stringLength",
      "description": "Calculates the length of a string",
      "parameters": {
        "type": "object",
        "properties": {
          "s": {
            "type": "string"
          }
        },
        "required": [
          "s"
        ]
      }
    },
    {
      "name": "add",
      "description": "Calculates the sum of two numbers",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "integer"
          },
          "b": {
            "type": "integer"
          }
        },
        "required": [
          "a",
          "b"
        ]
      }
    },
    {
      "name": "sqrt",
      "description": "Calculates the square root of a number",
      "parameters": {
        "type": "object",
        "properties": {
          "x": {
            "type": "integer"
          }
        },
        "required": [
          "x"
        ]
      }
    }
  ]
}

What is very important to notice here is that the extension sends not only the user’s question, but also the LLM’s first response requesting a tool invocation, along with a new message containing the result of that invocation. Recall that LLMs are stateless, so the entire message history needs to be sent each time - this is why the use of tools requires a chat memory.

OpenAI now responds with the following:

{
  "id": "chatcmpl-8L3ukBR1Pblqn4g5m3JocHQVzYqMy",
  "object": "chat.completion",
  "created": 1700030626,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "stringLength",
          "arguments": "{\n  \"s\": \"world\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 15,
    "total_tokens": 157
  }
}

Once again, the LLM cannot yet answer the user’s original question, and instead instructs the extension to invoke a tool.

As you’d expect by now, the extension does just that and sends the following new request to OpenAI:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the square root of the sum of the numbers of letters in the words \"hello\" and \"world\""
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"hello\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"world\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "functions": [
    {
      "name": "stringLength",
      "description": "Calculates the length of a string",
      "parameters": {
        "type": "object",
        "properties": {
          "s": {
            "type": "string"
          }
        },
        "required": [
          "s"
        ]
      }
    },
    {
      "name": "add",
      "description": "Calculates the sum of two numbers",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "integer"
          },
          "b": {
            "type": "integer"
          }
        },
        "required": [
          "a",
          "b"
        ]
      }
    },
    {
      "name": "sqrt",
      "description": "Calculates the square root of a number",
      "parameters": {
        "type": "object",
        "properties": {
          "x": {
            "type": "integer"
          }
        },
        "required": [
          "x"
        ]
      }
    }
  ]
}

This new request contains everything from the previous request sent to OpenAI, plus the LLM’s second tool request and the result of invoking calculator.stringLength("world").

In turn, OpenAI responds with:

{
  "id": "chatcmpl-8L3uopZa8wdBmGWoWBqWbCO8UdwVv",
  "object": "chat.completion",
  "created": 1700030630,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "add",
          "arguments": "{\n  \"a\": 5,\n  \"b\": 5\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 166,
    "completion_tokens": 21,
    "total_tokens": 187
  }
}

This time the LLM instructed the extension that it is time to invoke the add tool using 5 and 5 as arguments - these arguments of course are the results of the previous two stringLength invocations.

Upon receiving this response, the extension invokes calculator.add(5, 5) and then crafts and sends the following message to OpenAI:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the square root of the sum of the numbers of letters in the words \"hello\" and \"world\""
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"hello\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"world\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "add",
        "arguments": "{\n  \"a\": 5,\n  \"b\": 5\n}"
      }
    },
    {
      "role": "function",
      "content": "10",
      "name": "add"
    }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "functions": [
    {
      "name": "stringLength",
      "description": "Calculates the length of a string",
      "parameters": {
        "type": "object",
        "properties": {
          "s": {
            "type": "string"
          }
        },
        "required": [
          "s"
        ]
      }
    },
    {
      "name": "add",
      "description": "Calculates the sum of two numbers",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "integer"
          },
          "b": {
            "type": "integer"
          }
        },
        "required": [
          "a",
          "b"
        ]
      }
    },
    {
      "name": "sqrt",
      "description": "Calculates the square root of a number",
      "parameters": {
        "type": "object",
        "properties": {
          "x": {
            "type": "integer"
          }
        },
        "required": [
          "x"
        ]
      }
    }
  ]
}

The response from OpenAI is:

{
  "id": "chatcmpl-8L3urz1QWWjh3Dzmjqu0O3hK16NUE",
  "object": "chat.completion",
  "created": 1700030633,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "sqrt",
          "arguments": "{\n  \"x\": 10\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 195,
    "completion_tokens": 14,
    "total_tokens": 209
  }
}

The LLM this time decided that the sqrt tool needs to be invoked.

Following this instruction, the extension invokes calculator.sqrt(10) (10 being the result of the previous add invocation), then crafts and sends the following message to OpenAI:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the square root of the sum of the numbers of letters in the words \"hello\" and \"world\""
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"hello\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "stringLength",
        "arguments": "{\n  \"s\": \"world\"\n}"
      }
    },
    {
      "role": "function",
      "content": "5",
      "name": "stringLength"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "add",
        "arguments": "{\n  \"a\": 5,\n  \"b\": 5\n}"
      }
    },
    {
      "role": "function",
      "content": "10",
      "name": "add"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "sqrt",
        "arguments": "{\n  \"x\": 10\n}"
      }
    },
    {
      "role": "function",
      "content": "3.1622776601683795",
      "name": "sqrt"
    }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "functions": [
    {
      "name": "stringLength",
      "description": "Calculates the length of a string",
      "parameters": {
        "type": "object",
        "properties": {
          "s": {
            "type": "string"
          }
        },
        "required": [
          "s"
        ]
      }
    },
    {
      "name": "add",
      "description": "Calculates the sum of two numbers",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {
            "type": "integer"
          },
          "b": {
            "type": "integer"
          }
        },
        "required": [
          "a",
          "b"
        ]
      }
    },
    {
      "name": "sqrt",
      "description": "Calculates the square root of a number",
      "parameters": {
        "type": "object",
        "properties": {
          "x": {
            "type": "integer"
          }
        },
        "required": [
          "x"
        ]
      }
    }
  ]
}

Once again, it’s very important to note that the entire chat history has been sent by the extension to OpenAI. Using this entire chat history, the LLM is finally able to answer the user’s original question:

{
  "id": "chatcmpl-8L3utMpnSKtgzWcmHJVwVJd36l9fk",
  "object": "chat.completion",
  "created": 1700030635,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The square root of the sum of the numbers of letters in the words \"hello\" and \"world\" is approximately 3.162."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 224,
    "completion_tokens": 29,
    "total_tokens": 253
  }
}

As seen above, the answer is: The square root of the sum of the numbers of letters in the words "hello" and "world" is approximately 3.162. This is what Quarkus sends back to the user as the HTTP response.
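
As a sanity check, the four tool invocations from the exchange can be replayed directly in Java. This sketch simply calls the same three calculator methods in the order the LLM requested them. CalculatorReplay is a hypothetical class name, and the methods are static here only for brevity.

```java
import java.util.Locale;

final class CalculatorReplay {

    static int stringLength(String s) { return s.length(); }
    static int add(int a, int b) { return a + b; }
    static double sqrt(int x) { return Math.sqrt(x); }

    /** Replays the tool calls in the order the LLM requested them. */
    static double replay() {
        int hello = stringLength("hello");   // first function_call
        int world = stringLength("world");   // second function_call
        int sum = add(hello, world);         // third function_call: add(5, 5)
        return sqrt(sum);                    // fourth function_call: sqrt(10)
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps the decimal separator a dot regardless of the system locale.
        System.out.println(String.format(Locale.ROOT, "%.3f", replay()));
    }
}
```

Rounded to three decimal places, the result is 3.162, which matches the value in the final answer.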