KrisztiƔn Maurer
KrisztiƔn Maurer

Posted on

Make OpenAI Function Calling Work Better and Cheaper with a Two-Step Function Call šŸš€

I tried OpenAI's function calling feature in a project with many functions. It worked well, even with lots of functions, but the API cost increased significantly. This happens because every function schema (in JSON format) we send along with the main request counts toward input tokens, making each call more expensive. The problem is that we often send far more functions than necessary, even though the AI only needs a few of them to respond to the request.

So, I had an idea to save money and improve performance: what if we only send the details of the functions the AI actually needs? Here's how it works: first, we send a request with our main question or task, including a list of all the functions we could use, but without the detailed schemas for those functions; we give only a brief description of each. The AI then tells us exactly which functions it needs to answer the question or complete the task. After that, we send another request with the detailed schemas for only those needed functions.

This method can significantly lower the cost per message because we only send the details for the functions that are necessary.

Beyond cost, this method also helps the AI work faster and more accurately. When we send too many detailed function schemas, we give the model too much information to handle at once, which can degrade its performance. If we send only the essential schemas, the model stays focused, and we free up room for other useful context without overloading it. For example, when using GPT-3 with many functions, we quickly hit the maximum context the model can consider at one time. By being selective about what we send, we avoid reaching this limit too soon.

Basic tool call example

Let's look at a basic example of how function calling works. We start by sending the AI a question or task along with a list of all function schemas it can use. If the AI needs more information or has to do a specific job to answer our question, it picks one of the functions we've given it to help find the answer.
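As a minimal sketch of this baseline, here is roughly what a Chat Completions request body with tools looks like. The `getWeather` tool is a hypothetical example (not from the linked repo), and the payload is just built as a plain object rather than sent to the API:

```typescript
// Shape of one tool schema in the Chat Completions "tools" array.
type Tool = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
};

// Hypothetical example tool for illustration only.
const getWeatherTool: Tool = {
  type: "function",
  function: {
    name: "getWeather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// Every schema in `tools` is serialized into the request and counts
// toward input tokens on every call.
function buildRequestBody(userMessage: string, tools: Tool[]) {
  return {
    model: "gpt-4o-mini", // assumed model name for the sketch
    messages: [{ role: "user" as const, content: userMessage }],
    tools,
  };
}

const body = buildRequestBody("What's the weather in Budapest?", [getWeatherTool]);
```

With one or two tools this is fine; the cost problem described above appears when the `tools` array carries dozens of schemas on every request.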


When you use tool or function calling, you're essentially giving the AI model a way to 'call out' to an external function. This could be anything from performing a complex mathematical calculation, accessing a database for specific information, running a custom algorithm, or even interacting with web services. The function executes the task and returns the result to the AI, which then incorporates this information into its response.
https://platform.openai.com/docs/guides/function-calling

Two-Step Tool Call example

[Diagram: two-step tool call flow]

The process is simpler than it sounds, and here's a straightforward explanation:

Start with a Single Tool: We begin by using a special tool called the "tool descriptor." This tool takes one parameter, a list named neededTools, which specifies the tools that might be needed. You list all the available tools here.

Requesting Specific Tools: If the AI determines it needs certain tools to complete its task, it requests them through the "tool descriptor" by specifying which tools it needs from the neededTools list.

Providing the Requested Tools: Once the AI requests specific tools, we then supply these requested tools to the AI.

AI Uses the Tools: Now that the AI has the tools it specifically asked for, it can go ahead and process the request, using the tools as needed to come up with a final answer. Occasionally, during this process, the AI might realize it needs an additional tool it didn't request initially. If that happens, the process starts over, and we provide the newly requested tool.

Here's a simple way to look at it with an example: imagine we have 100 tools available, but the AI only needs 2 to answer a question. Instead of sending all 100 tool schemas upfront, we initially send just the "tool descriptor" request. Then, based on the AI's needs, we provide only the 2 necessary tools. Across both requests we send 3 tool JSON schemas (the descriptor plus the two tools) instead of 100. This uses fewer tokens, which is cheaper, and it also boosts performance; having too many schemas in context can actually make the AI less accurate.
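The steps above can be sketched as follows. The tool names, briefs, and schemas in the registry are hypothetical examples (the real registry would come from your application), and the `tool_descriptor` schema here is one plausible way to write it, assuming the standard Chat Completions tools format:

```typescript
type ToolSchema = {
  type: "function";
  function: { name: string; description: string; parameters: object };
};

// Full schemas are what cost tokens; the briefs are one line each.
const registry: Record<string, { brief: string; schema: ToolSchema }> = {
  getWeather: {
    brief: "current weather for a city",
    schema: {
      type: "function",
      function: {
        name: "getWeather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  },
  searchDocs: {
    brief: "search the internal documentation",
    schema: {
      type: "function",
      function: {
        name: "searchDocs",
        description: "Full-text search over internal docs",
        parameters: {
          type: "object",
          properties: { query: { type: "string" } },
          required: ["query"],
        },
      },
    },
  },
};

// Step 1: the only tool we send is the descriptor, which lists a brief
// one-line description of everything available.
function buildDescriptorTool(): ToolSchema {
  const briefs = Object.entries(registry)
    .map(([name, t]) => `${name}: ${t.brief}`)
    .join("\n");
  return {
    type: "function",
    function: {
      name: "tool_descriptor",
      description: `Request the tools you need before using them. Available:\n${briefs}`,
      parameters: {
        type: "object",
        properties: {
          neededTools: {
            type: "array",
            items: { type: "string", enum: Object.keys(registry) },
          },
        },
        required: ["neededTools"],
      },
    },
  };
}

// Step 2: after the model calls tool_descriptor with its neededTools list,
// send a follow-up request containing only those full schemas.
function selectTools(neededTools: string[]): ToolSchema[] {
  return neededTools.map((name) => registry[name].schema);
}

const firstCallTools = [buildDescriptorTool()];      // 1 schema, not 100
const secondCallTools = selectTools(["getWeather"]); // only what was requested
```

The `enum` on `neededTools` is a small extra guard: it nudges the model to pick only names that actually exist in the registry.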

Check out how this method works with this code example: https://github.com/MaurerKrisztian/two-step-llm-tool-call

Thanks for taking the time to read! I hope you found it helpful. If you're interested in seeing how to do this in Python, just let me know in the comments.

Top comments (3)

AdamDCosta

Interesting. Do you have any other thoughts about optimizing prompts/calls to LLMs?

KrisztiƔn Maurer

Hi, another idea is to use embeddings as a way to choose the right tool. This involves comparing the prompt with descriptions of the tools to find the best match, but I haven't tried it yet.
platform.openai.com/docs/guides/em...
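A rough sketch of that embedding idea: embed the prompt and each tool description, then pick the closest tools by cosine similarity. Real vectors would come from the embeddings endpoint; the tiny hand-made vectors below are stand-ins so the ranking logic is self-contained:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical precomputed embeddings of each tool's description.
const toolEmbeddings: Record<string, number[]> = {
  getWeather: [0.9, 0.1, 0.0],
  searchDocs: [0.1, 0.9, 0.1],
};

// Rank tools by similarity to the prompt embedding and keep the top k.
function topTools(promptEmbedding: number[], k: number): string[] {
  return Object.entries(toolEmbeddings)
    .map(([name, vec]) => [name, cosine(promptEmbedding, vec)] as const)
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([name]) => name);
}

// A prompt whose (stand-in) embedding is close to getWeather's.
const best = topTools([0.8, 0.2, 0.0], 1);
```

Unlike the two-step call, this selects tools before the first request, so it avoids the extra round trip, at the price of maintaining an embedding index of tool descriptions.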

KrisztiƔn Maurer

This method needs some testing. I ran into a problem where the OpenAI API tried to call a function just because its name appeared in the tool descriptor. I fixed it with a system message: "You need to request tools using the tool_descriptor tool before you can use them. IMPORTANT: Don't call tools that don't have a JSON description schema." However, it would be great if OpenAI could fine-tune a model for this method. I might try to improve this and fine-tune it into an open-source language model.