
Nadeesha Cabral for Inferable

Originally published at inferable.ai

Dynamic Tool Attachment for LLM Applications


November 25, 2024

Tool calling is a feature of many Large Language Models (LLMs) that allows them to interact with external systems through functions provided by the calling application. When an LLM needs to perform an action, such as searching a database or calling an API, it can "call" a tool by specifying the function name and parameters in a structured format. For example, if a user asks "What's the weather in Paris?", an LLM with access to a weather API tool might generate a response like:

{
  "function": {
    "name": "getWeather",
    "parameters": {
      "city": "Paris"
    }
  }
}

The application can interpret this call and use the weather API to fetch the weather in Paris; a minimal dispatch sketch follows the list below. In order for an LLM to invoke a tool, the tool's name, description, and parameters must be provided in the context of the conversation. For applications with a large number of tools, this can lead to a few challenges:

  1. Increased token usage as all tool descriptions are included in the context
  2. Higher likelihood of the model choosing the wrong tool
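
On the application side, interpreting a tool call usually comes down to dispatching from the function name to a handler. Here is a minimal sketch; the handlers map and the getWeather stub are illustrative, not part of any particular library:

type ToolCall = {
  function: {
    name: string;
    parameters: Record<string, string>;
  };
};

// Map each tool name to a handler; getWeather is a hypothetical example.
const handlers: Record<string, (params: Record<string, string>) => Promise<string>> = {
  getWeather: async ({ city }) => {
    // A real implementation would call a weather API here.
    return `It is sunny in ${city}`;
  },
};

const dispatchToolCall = async (call: ToolCall) =>
  handlers[call.function.name](call.function.parameters);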

Dynamic Tool Attachment

One approach to combat these issues is to dynamically attach tools to the LLM based on the user's input. For example, we can use semantic search to select only the most relevant tools for each prompt.

Let's build a simple system that uses semantic search to dynamically attach tools to an LLM by:

  1. Embedding tool names and descriptions into an in-memory vector store using an embedding model
  2. Embedding the user's prompt using the same embedding model
  3. Retrieving the 5 most similar tools using a cosine similarity search
  4. Including the most relevant tools in the model's context with the user's prompt

Setting Up

For this project we will be using a TypeScript application and Ollama for local chat completion and embedding. If you don't have Ollama installed, you can follow the installation instructions to get started. By default, Ollama serves its HTTP API on http://localhost:11434, which the code below assumes.

Downloading Models

Download the following models for local use:
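
We will use llama3.2 (3B) for chat completion and nomic-embed-text for embeddings; both can be pulled with the Ollama CLI:

ollama pull llama3.2
ollama pull nomic-embed-text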

Node Dependencies

This project uses two Node dependencies: compute-cosine-similarity for the vector similarity search, and tsx for running TypeScript directly:

npm init
npm install compute-cosine-similarity tsx

Defining Our Tools

We'll start by defining a set of dummy tools for our application. Each tool follows a consistent pattern with a name, description, and parameters:

export const ALL_TOOLS = [
  {
    type: "function",
    function: {
      name: "findCat",
      description: "Find the cat with the ID provided",
      parameters: {
        type: "object",
        properties: {
          id: {
            type: "string",
          },
        },
        required: ["id"],
      },
    },
  },
  // ...
];

You can find a pre-defined set of tool schemas in the accompanying project repository.
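
The repository defines a set of similar lookup tools (findToy, findCar, findBook, and so on). For illustration, a second entry might look like this; the description text here is assumed, not copied from the repository:

{
  type: "function",
  function: {
    name: "findToy",
    description: "Find the toy with the ID provided",
    parameters: {
      type: "object",
      properties: {
        id: { type: "string" },
      },
      required: ["id"],
    },
  },
}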

Chat Completion

We will use llama3.2 (3b) for chat completion by calling the Ollama chat endpoint:

type Message = {
  role: "user" | "assistant";
  content: string;
}

type Tool = {
  type: string;
  function: {
    name: string;
    description: string;
    parameters: unknown;
  }
}

const ollamaChat = async (messages: Message[], tools: Tool[]) => {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "llama3.2",
      messages,
      tools,
      stream: false, // return a single response object instead of a stream
    }),
  });
  const data = await response.json();
  return data.message; // contains role, content, and (optionally) tool_calls
}
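As a quick sanity check, assuming Ollama is running locally with llama3.2 pulled, we can call it directly (top-level await works under tsx):

const reply = await ollamaChat(
  [{ role: "user", content: "Find the cat with the ID 42" }],
  ALL_TOOLS,
);
// If the model decides to call a tool, reply.tool_calls holds entries like
// { function: { name: "findCat", arguments: { id: "42" } } }
console.log(reply.content, reply.tool_calls);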

Computing Embeddings

We will use nomic-embed-text to generate vector embeddings by calling the Ollama embed endpoint:

const ollamaEmbed = async (input: string) => {
  const response = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "nomic-embed-text",
      input,
    }),
  });
  const data = await response.json();
  return data.embeddings[0]; // /api/embed returns an array of embeddings, one per input
}
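For example, embedding a short string returns a single vector; nomic-embed-text produces 768-dimensional embeddings:

const vector = await ollamaEmbed("hello world");
console.log(vector.length); // 768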

Embedding Tools

Let's create a function to compute embeddings for all tools and return an array of objects with both the tool schema and the embedding. We will embed the tool name and description.

const computeToolEmbeddings = async (tools: Tool[]) =>
  Promise.all(
    tools.map(async (tool) => {
      const embedding = await ollamaEmbed(`${tool.function.name}: ${tool.function.description}`);
      return {
        tool,
        embedding,
      };
    })
  );

While this in-memory data structure works for the purposes of this example, in a real-world application you would want to store embeddings in a database such as Postgres with pgvector, to avoid re-computing them on every run.
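
As a rough sketch of that approach, assuming a Postgres database with the pgvector extension enabled, a hypothetical tool_embeddings table, and the pg client:

import { Client } from "pg";

// Hypothetical schema:
//   CREATE TABLE tool_embeddings (
//     name text PRIMARY KEY,
//     schema jsonb,
//     embedding vector(768)
//   );
const findSimilarTools = async (client: Client, input: string) => {
  const inputEmbedding = await ollamaEmbed(input);
  // pgvector accepts vectors in '[1,2,3]' text form; <=> is cosine distance,
  // so ascending order puts the most similar tools first.
  const { rows } = await client.query(
    "SELECT schema FROM tool_embeddings ORDER BY embedding <=> $1 LIMIT 5",
    [JSON.stringify(inputEmbedding)],
  );
  return rows.map((row) => row.schema);
};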

Tool Search Implementation

Let's implement a simple semantic search function using the compute-cosine-similarity package.

import similarity from "compute-cosine-similarity";

type EmbeddedTool = {
  tool: Tool;
  embedding: number[];
}

const searchTools = async (input: string, embeddings: EmbeddedTool[]) => {
  const messageEmbedding = await ollamaEmbed(input);
  return embeddings
    .map((embedding) => ({
      tool: embedding.tool,
      similarity: similarity(embedding.embedding, messageEmbedding)
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5);
}

This function:

  1. Embeds the user's input
  2. Compares the user's embedding with all tool embeddings
  3. Returns the top 5 most relevant tools based on the similarity score (higher is more relevant)
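
For reference, cosine similarity is just the dot product of the two vectors divided by the product of their magnitudes; a hand-rolled equivalent of what the package computes looks like this:

const cosineSimilarity = (a: number[], b: number[]) => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};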

Putting It All Together

The last part of our project is a main function that:

  • Embeds all tools using computeToolEmbeddings
  • Prompts the user for input using readline
  • Uses searchTools to compare the user's input with the tool embeddings, logging each attached tool and its similarity score
  • Calls ollamaChat with the user's input and the attached tools

import readline from "node:readline";

const main = async () => {
  console.log("Embedding tools...");
  const embeddings = await computeToolEmbeddings(ALL_TOOLS);
  console.log("Tools embedded.");

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  console.log("Enter your message:");
  let messages: Message[] = []

  rl.on('line', async (input) => {
    const results = await searchTools(input, embeddings);
    results.forEach(({ tool, similarity }) =>
      console.log("Attaching tool:", { tool: tool.function.name, similarity })
    );
    messages.push({ role: "user", content: input })
    // Pass only the tool schemas (not the similarity scores) to the model.
    const response = await ollamaChat(messages, results.map(({ tool }) => tool));
    messages.push({ role: "assistant", content: response.content })
    console.log("Tools called:", response.content, response.tool_calls)
  })
}

main();

We can now run the project with npx tsx index.ts and test it out:

If we use the prompt "find tool with ID 123", our searchTools function will return the 5 most relevant tools. We can see that findTool has the highest similarity score.

Enter your message: find tool with ID 123
Attaching tool: { tool: 'findTool', similarity: 0.825733057876305 }
Attaching tool: { tool: 'findToy', similarity: 0.6924025554970754 }
Attaching tool: { tool: 'findCar', similarity: 0.6859769197248136 }
Attaching tool: { tool: 'findBook', similarity: 0.6778616325470412 }
Attaching tool: { tool: 'findSong', similarity: 0.6710841959852032 }
Tools called: [{ function: { name: 'findTool', arguments: [Object] } } ]

We can also prompt with similar terms, such as "find hammer with ID 123". Even though no tool explicitly matches the term "hammer", our similarity search still returns findTool as the most relevant result, since the embeddings capture that a hammer is a kind of tool.

Enter your message: find hammer with ID 123
Attaching tool: { tool: 'findTool', similarity: 0.5955042990419085 }
Attaching tool: { tool: 'findToy', similarity: 0.5411030864968827 }
Attaching tool: { tool: 'findMovie', similarity: 0.5219368182792513 }
Attaching tool: { tool: 'findSong', similarity: 0.5155495695032712 }
Attaching tool: { tool: 'findCar', similarity: 0.5132604044773412 }
Tools called: [{ function: { name: 'findTool', arguments: [Object] } } ]

Source Code

Source code for this post is available in the accompanying inferablehq/ollama-dynamic-tools GitHub project.
