Using web search

Quarkus LangChain4j currently supports the Tavily search engine. To use it, add the quarkus-langchain4j-tavily extension to your project and set your Tavily API key with the quarkus.langchain4j.tavily.api-key property.

After this, you can inject the search engine into your application using

@Inject
WebSearchEngine engine;

and then use it by calling its search method, which takes a query string and returns the results as a WebSearchResults object.

If you want a chat model to decide on its own when to search the web, there are two recommended approaches: implement a tool that calls the search engine, or use the search engine as a content retriever in a RAG pipeline. The chatbot-web-search example in the quarkus-langchain4j repository demonstrates the tool approach.

Using web search as a tool

To expose web search as a tool that the LLM can decide to execute (the relevant search results become the tool's return value), you can either use the tool provided by the upstream LangChain4j project, dev.langchain4j.web.search.WebSearchTool, or implement your own tool if that one does not fit your requirements. The samples/chatbot-web-search example demonstrates how to use the provided tool.

Using web search in a RAG pipeline

LangChain4j also provides a content retriever, dev.langchain4j.rag.content.retriever.WebSearchContentRetriever, that uses a web search engine to retrieve relevant content. As a starting point, a retrieval augmentor that wraps it might look like this:

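import java.util.Collections;
import java.util.function.Supplier;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.WebSearchContentRetriever;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.web.search.WebSearchEngine;

@ApplicationScoped
public class WebSearchRetrievalAugmentor implements Supplier<RetrievalAugmentor> {

    @Inject
    WebSearchEngine webSearchEngine;

    @Inject
    ChatLanguageModel chatModel;

    @Override
    public RetrievalAugmentor get() {
        return DefaultRetrievalAugmentor.builder()
                .queryTransformer(question -> {
                    // before actually querying the engine, we need to transform the
                    // user's question into a suitable search query
                    String query = chatModel.generate("Transform the user's question into a suitable query for the " +
                            "Tavily search engine. The query should yield the results relevant to answering the user's question. " +
                            "User's question: " + question.text());
                    return Collections.singleton(Query.from(query));
                })
                // retrieve up to 10 web search results for the transformed query
                .contentRetriever(new WebSearchContentRetriever(webSearchEngine, 10))
                .build();
    }
}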

Tavily configuration reference

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Base URL of the Tavily API
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_BASE_URL
Type: string
Default: https://api.tavily.com

API key for the Tavily API
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_API_KEY
Type: string
Default: none (required)

Maximum number of results to return
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_MAX_RESULTS
Type: int
Default: 5

The timeout duration for Tavily requests
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_TIMEOUT
Type: Duration (see About the Duration format)
Default: 60S

Whether requests to Tavily should be logged
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_LOG_REQUESTS
Type: boolean
Default: false

Whether responses from Tavily should be logged
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_LOG_RESPONSES
Type: boolean
Default: false

The search depth to use
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_SEARCH_DEPTH
Type: one of basic, advanced
Default: basic

Whether to include a short answer to the original query
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_INCLUDE_ANSWER
Type: boolean
Default: false

Whether to include the cleaned and parsed HTML content of each search result
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_INCLUDE_RAW_CONTENT
Type: boolean
Default: false

A list of domains to specifically include in the search results; an empty list includes all domains
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_INCLUDE_DOMAINS
Type: list of string
Default: []

A list of domains to specifically exclude from the search results; an empty list doesn’t exclude any domains
Environment variable: QUARKUS_LANGCHAIN4J_TAVILY_EXCLUDE_DOMAINS
Type: list of string
Default: []
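Each environment variable in the reference above follows the MicroProfile Config naming rule: every character of the property name that is not alphanumeric is replaced with an underscore, and the result is upper-cased. The sketch below applies that rule so you can derive a variable name from a property name. The class EnvVarNames is hypothetical, and only the api-key property name is stated earlier on this page; other names such as quarkus.langchain4j.tavily.include-domains are assumed to follow the same kebab-case pattern.

```java
import java.util.Locale;

/**
 * Hypothetical helper: derives an environment variable name from a
 * configuration property name using the MicroProfile Config rule
 * (non-alphanumeric characters become '_', then upper-case).
 */
final class EnvVarNames {

    private EnvVarNames() {
    }

    static String fromProperty(String propertyName) {
        StringBuilder sb = new StringBuilder(propertyName.length());
        for (int i = 0; i < propertyName.length(); i++) {
            char c = propertyName.charAt(i);
            // Keep ASCII letters and digits; replace '.', '-' and anything else with '_'
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            sb.append(alphanumeric ? c : '_');
        }
        // Locale.ROOT avoids locale-specific case mappings (e.g. the Turkish dotless i)
        return sb.toString().toUpperCase(Locale.ROOT);
    }
}
```

For example, EnvVarNames.fromProperty("quarkus.langchain4j.tavily.api-key") returns QUARKUS_LANGCHAIN4J_TAVILY_API_KEY, which is the variable listed for the API key.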

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
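The simplified rules above can be sketched in Java as follows. SimplifiedDuration is a hypothetical helper written for illustration, not the converter Quarkus actually uses. It accepts only non-negative whole numbers, and it treats unit suffixes case-insensitively; the latter is an assumption made here so that a value written like the 60S timeout default also parses.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper illustrating the simplified duration format rules.
 * A sketch only; not Quarkus's own implementation.
 */
final class SimplifiedDuration {

    // A bare number, e.g. "60": interpreted as seconds
    private static final Pattern SECONDS = Pattern.compile("\\d+");
    // A number followed by "ms", e.g. "500ms": interpreted as milliseconds
    private static final Pattern MILLIS = Pattern.compile("(\\d+)ms", Pattern.CASE_INSENSITIVE);
    // A number followed by h, m or s, e.g. "2h": prefixed with "PT"
    private static final Pattern TIME_UNIT = Pattern.compile("\\d+[hms]", Pattern.CASE_INSENSITIVE);
    // A number followed by d, e.g. "3d": prefixed with "P"
    private static final Pattern DAYS = Pattern.compile("\\d+d", Pattern.CASE_INSENSITIVE);

    private SimplifiedDuration() {
    }

    static Duration parse(String value) {
        String v = value.trim();
        if (SECONDS.matcher(v).matches()) {
            return Duration.ofSeconds(Long.parseLong(v));
        }
        // Checked before TIME_UNIT so that "ms" is never mistaken for minutes
        Matcher millis = MILLIS.matcher(v);
        if (millis.matches()) {
            return Duration.ofMillis(Long.parseLong(millis.group(1)));
        }
        if (TIME_UNIT.matcher(v).matches()) {
            return Duration.parse("PT" + v);
        }
        if (DAYS.matcher(v).matches()) {
            return Duration.parse("P" + v);
        }
        // Anything else must already be in the standard java.time.Duration format;
        // Duration.parse throws DateTimeParseException otherwise
        return Duration.parse(v);
    }
}
```

For instance, "30m" is turned into Duration.parse("PT30m") (30 minutes) and "3d" into Duration.parse("P3d") (3 days), while a value that is already in the standard format, such as PT1M30S, is passed to Duration.parse unchanged.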