What is Function-Calling?
Large language models (LLMs) excel at generating text-based responses, providing static information, and assisting with tasks such as answering queries, drafting documents, summarizing content, and translating text. While these capabilities are highly useful, the ability to interact with external systems significantly enhances the utility of LLMs. Function-calling is a transformative feature that connects LLMs to external systems, APIs, and tools, allowing them to perform actions and retrieve real-time data.
Function-calling works well for straightforward tasks such as web searches or obtaining current weather information. However, integrating function-calling with enterprise systems through an API is significantly more challenging. This article will highlight some of the issues you may encounter when attempting to make these connections work effectively.
The Mechanics of Function-Calling
An LLM operates like a brain without a body, so how does it “invoke functions” without any direct connection to the real world? It does so indirectly. Essentially, the LLM is first asked which functions to invoke, and those functions are then invoked on its behalf. Afterward, the results from these invocations are returned to the LLM, which is asked to formulate a final response. Here is a more detailed explanation of the steps involved, followed by a brief code sketch:
- Function Definition: All functions available to the LLM must be clearly defined in a way the LLM can understand and saved in a repository of available functions.
- Function Identification (RAG): Upon receiving a prompt, the orchestrating platform must determine which of the available functions are relevant and then add the definitions of these functions to the original prompt for the LLM to use in the next step.
- Parameter Mapping: The LLM must understand the context of the prompt, extract the relevant information, and map this information to the function parameters.
- Function Invocation: The LLM will respond with a list of functions to invoke. The orchestrating platform must then invoke these functions and return the results to the LLM for final processing.
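To make these steps concrete, here is a minimal sketch of the full loop using OpenAI’s Python client and its published tool-calling schema. The get_weather() function and its stubbed result are hypothetical, and error handling is omitted:

```python
import json
from openai import OpenAI

client = OpenAI()

# Step 1: function definition, expressed in OpenAI's tool schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

# Hypothetical local implementation the bridge invokes on the LLM's behalf.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "temp_c": 22}  # stubbed result

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Steps 2-3: the model sees the (pre-filtered) definitions and maps the
# prompt's context onto function names and parameter values.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

# Step 4: invoke each requested function, return the results, and ask the
# model to compose the final response.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        if call.function.name == "get_weather":
            args = json.loads(call.function.arguments)
            result = get_weather(**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```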
Steps 2 and 3 are particularly critical for accuracy. In Step 2, there could be hundreds of available functions, making it impractical to append all their definitions to the prompt due to size limits and cost concerns. This step uses the popular RAG (Retrieval-Augmented Generation) approach to find the ideal set of functions to send to the LLM. The issues discussed below affect how well the RAG system identifies the appropriate functions.
Step 3 involves the LLM parsing the prompt and converting it into a function invocation request. Better training improves the LLM’s accuracy in this conversion. However, the challenges outlined below can affect the LLM’s ability to perform this task accurately.
Since both the RAG system and the LLM contribute to accuracy, changing the RAG system, even while keeping the same LLM, could lead to different results.
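Here is a minimal sketch of how Step 2’s retrieval might work, assuming the sentence-transformers library and a small hypothetical function registry. Real systems typically use a vector database, but the principle is the same:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# The "repository of available functions": name -> natural-language description.
registry = {
    "get_weather": "Get the current weather forecast for a city.",
    "schedule_event": "Create a calendar event at a given date and time.",
    "search_flights": "Search available flights between two airports.",
}

def relevant_functions(prompt: str, top_k: int = 2) -> list[str]:
    names = list(registry)
    doc_vecs = model.encode([registry[n] for n in names])
    query_vec = model.encode(prompt)
    scores = util.cos_sim(query_vec, doc_vecs)[0]  # cosine similarity per function
    ranked = sorted(zip(names, scores), key=lambda p: float(p[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Only the top-scoring definitions are appended to the LLM prompt.
print(relevant_functions("Schedule a picnic on a sunny day"))
```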
Function-Calling Actors
In a function-calling system, four main actors play essential roles:
- End-User: The individual who initiates a request via a prompt. They interact with the system using natural language queries and commands.
- API (Application Programming Interface): The gateway to an external system. It exposes many of the external system’s capabilities, which can be invoked through various functions supported by the API.
- LLM (Large Language Model): The AI model that processes the end-user’s natural language input, identifies the necessary actions, and formulates a request to the bridge.
- Bridge: The intermediary that invokes API functions on behalf of the LLM.
Function-Calling vs. Tools
The term “tool” is becoming synonymous with “function-calling,” as both enable LLMs to interact with external systems. OpenAI initially introduced this capability under the name “function-calling,” and other vendors have since adopted the same terminology. Meanwhile, AI agents, an emerging class of systems designed to perform tasks autonomously by interacting with external entities, use the term “tool” for these connections.
Interestingly, in OpenAI’s latest implementation, functions are registered using a JSON field called “tools,” suggesting that this capability might eventually be universally referred to as “tools.”
In summary, while “function-calling” is currently more commonly used in the context of LLMs and “tool” is associated with AI agents, both terms refer to the same underlying capability of enabling interactions with external systems.
Issues Affecting Function-Calling Accuracy
The design quality and documentation of an API significantly influence the RAG system’s ability to identify relevant functions and the LLM’s ability to accurately translate requests into function invocations. Insufficient information can hinder even the most skilled developer from connecting a prompt to the appropriate API functions. Effective function-calling requires API documentation that any developer can understand and use without additional help. While public APIs generally meet this requirement because they aim to be easily accessible, proprietary APIs often lack detailed documentation because of the substantial effort required and the limited need for widespread developer access. Fortunately, AI can assist here: given some sample code, an LLM can infer the API’s usage and generate the necessary documentation. Either way, addressing this deficiency is crucial before the API can be used in a function-calling context.
However, even with fully documented APIs, other challenges remain in achieving accurate results. These challenges arise because APIs are designed for developers and applications, not for end users. End users think in higher-level concepts that don’t always match what the API supplies, while APIs must serve multiple use cases and therefore expose more general abstractions. When a developer builds an application that realizes these end-user concepts on top of the API, they must determine how best to map the higher-level concepts to the lower-level abstractions the API supplies. This is not a trivial task, and the LLM must accomplish all of it on the fly.
Here is a more detailed look at some of the issues the LLM must handle to accomplish this task.
Terminology Mismatch
The terms used by end users often do not align with those used by an API. For instance, marketing may rename a concept (e.g., “Repo” to “Data Source”) in the user interface (UI) and documentation to reduce user confusion, while the API retains the original term to avoid breaking existing clients. If this mapping is not documented, neither the RAG system nor the LLM will be able to recognize that when the end user’s prompt mentions “data source,” it actually refers to “Repo” objects in the API.
A potential solution is to include this mapping in the documentation to assist both the LLM and developers. However, this may not always be feasible, especially if the API concept has different names across various clients.
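One concrete way to document the mapping is to embed the synonym directly in the function’s LLM-facing description, so both embedding-based retrieval and the LLM can match the user’s phrasing. The function below is hypothetical:

```python
# Hypothetical tool definition with the UI term embedded as a synonym.
list_repos = {
    "type": "function",
    "function": {
        "name": "listRepos",
        "description": (
            "List all Repo objects. A Repo is called a 'data source' in the "
            "user interface and product documentation; treat the two terms "
            "as interchangeable."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
}
```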
Developer-Friendly but Inadequate for RAG/LLM
While API documentation might be comprehensive and clear for developers, it can fall short for the RAG system and LLM. Developers often rely on their broader knowledge of the API, understanding concepts that are not explicitly defined within the function documentation to avoid redundancy. However, this implicit knowledge needs to be explicitly included for the RAG system and LLM to make correct connections.
A common issue with today’s LLMs is that the quality of responses can vary based on factors such as phrasing and word order, which should be irrelevant. When an LLM performs better with a specific format that does not align with the documentation guidelines, a conflict arises: should the priority be the LLM’s performance or adherence to documentation standards?
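As a hypothetical illustration of this gap, compare a developer-facing description that leans on implicit context with an LLM-facing one that spells everything out:

```python
# Developer docs can assume shared context from the rest of the API guide.
developer_description = "Returns the entries for the given scope."

# The LLM-facing description must carry that context explicitly.
llm_description = (
    "Returns audit-log entries. 'scope' must be one of 'user', 'team', or "
    "'org'. An entry records who changed what and when. Use this function "
    "whenever the user asks about history, changes, or recent activity."
)
```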
Function Disambiguation
APIs can offer hundreds or even thousands of functions, whereas an application might only need a few. This abundance can make finding the correct function difficult. Properly filtering and selecting the necessary functions for a specific task is crucial to avoid overwhelming the system and to ensure accurate function-calling. One straightforward solution is to publish only the needed functions and not register the rest.
While reducing the list of functions may improve accuracy, issues can still arise if the LLM does not select the correct function. Adding more information to the function descriptions could help the LLM disambiguate more effectively. A last resort would be to rename the functions to be more specific. For example, instead of using a generic function name like getData(), it could be renamed to getCustomerDataById().
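A sketch of both ideas, publishing a small subset and renaming for specificity, might look like this (all names are hypothetical, and call_api() is a stub for the real client):

```python
# In practice the catalog holds hundreds of functions; only two are published.
full_catalog = ["getData", "getMetadata", "search", "export", "audit"]

# LLM-facing name -> underlying API function. Specific names disambiguate.
published = {
    "getCustomerDataById": "getData",
    "searchCustomerOrders": "search",
}

def call_api(name: str, **kwargs):
    # Stand-in for the real API client.
    return {"called": name, "args": kwargs}

def invoke(exposed_name: str, **kwargs):
    api_name = published[exposed_name]  # anything unpublished raises KeyError
    return call_api(api_name, **kwargs)

print(invoke("getCustomerDataById", id=42))
```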
Coordinating Multiple API Function Calls
Some user requests require multiple API function calls to complete. Coordinating these calls accurately can be challenging. For instance, a user request such as “Schedule a picnic on a sunny day” necessitates sequentially calling the checkWeatherForecast() and scheduleEvent() functions. This process involves ensuring that the weather forecast is checked first and, only if it predicts sunny weather, proceeding to schedule the picnic. Properly handling these sequences is essential to fulfill complex user requests efficiently and accurately.
However, some requests are more complex and cannot be handled by a simple sequence of calls but require conditions or loops. For example, consider a request to “Find the best flight deal under $500 for the next three weekends.” This scenario necessitates looping through flights for each weekend, applying conditions to filter out flights exceeding $500, and potentially making additional API calls to retrieve updated prices or alternative routes. Such tasks involve intricate logic that must be dynamically managed, demonstrating the sophistication required for accurate function-calling.
Although LLMs can assist developers in creating code to handle this complex logic with guidance and retries, expecting the LLM to derive this logic on the fly and consistently provide accurate results is unrealistic. The LLM will sometimes make errors, impacting performance.
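When the coordination logic is known in advance, it can be moved into deterministic bridge code instead of being derived by the LLM on each request. Here is a sketch for the picnic example, with both functions as hypothetical stubs:

```python
from datetime import date, timedelta

# Hypothetical stubs standing in for real API calls.
def check_weather_forecast(day: date) -> str:
    return "sunny" if day.weekday() in (5, 6) else "rain"

def schedule_event(title: str, day: date) -> str:
    return f"Scheduled '{title}' on {day.isoformat()}"

# Deterministic orchestration for "Schedule a picnic on a sunny day":
# check the forecast first, and only schedule if it is sunny.
def schedule_picnic(days_ahead: int = 14) -> str:
    today = date.today()
    for offset in range(1, days_ahead + 1):
        day = today + timedelta(days=offset)
        if check_weather_forecast(day) == "sunny":
            return schedule_event("Picnic", day)
    return "No sunny day found in the next two weeks."

print(schedule_picnic())
```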
Handling Additional Logic
Certain requests may require more complex logic beyond simple API calls, adding layers of complexity to the process. For example, if a user asks, “Find me the cheapest flight next month,” the LLM might need to call searchFlights() and then apply additional logic to filter and sort results by price. Normally, this additional logic would be handled by the application using the API. While the LLM might manage this logic without assistance, it may not always do so accurately or might not be able to handle it at all.
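The filtering and sorting here is ordinary application logic that code handles reliably. A sketch, with the flight search stubbed out:

```python
# Hypothetical stub standing in for the API's flight search.
def search_flights(origin: str, dest: str, month: str) -> list[dict]:
    return [
        {"flight": "XY100", "price": 480},
        {"flight": "XY200", "price": 310},
        {"flight": "XY300", "price": 650},
    ]

# "Find me the cheapest flight next month": filter and sort in code,
# rather than asking the LLM to compare prices itself.
flights = search_flights("JFK", "LAX", "2025-07")
cheapest = min(flights, key=lambda f: f["price"])
print(cheapest)  # {'flight': 'XY200', 'price': 310}
```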
Parameter Value Mapping
Translating user-friendly terms into the specific values required by the API can be challenging for LLMs. Some API functions require specific keyword values, and the LLM must convert end-user concepts into these enums. For example, an API might define special strings for ethnicity such as “AA” for African American, “CA” for Caucasian, “HI” for Hispanic, and “AS” for Asian. The LLM must translate the end-user term in the prompt into one of these keywords. Although these conversions are straightforward and predictable with code, LLMs can occasionally err.
In other cases, a parameter may require conversion to a completely different kind of value. For example, an API might require a start and end date, but the user may have provided a start date and a duration. The LLM must then convert the duration into an end date to fulfill the API’s requirements. Some LLMs may have difficulty with more complicated conversions.
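Both kinds of mapping are trivial and fully predictable once expressed as code. A sketch using the codes from the example above and a hypothetical date-range requirement:

```python
from datetime import date, timedelta

# Deterministic keyword mapping (codes from the example above).
ETHNICITY_CODES = {
    "african american": "AA",
    "caucasian": "CA",
    "hispanic": "HI",
    "asian": "AS",
}

def to_ethnicity_code(user_term: str) -> str:
    return ETHNICITY_CODES[user_term.strip().lower()]

# The user gave a start date and a duration, but the (hypothetical) API
# wants a start and end date.
def to_date_range(start: date, duration_days: int) -> tuple[date, date]:
    return start, start + timedelta(days=duration_days)

print(to_ethnicity_code("Hispanic"))        # -> "HI"
print(to_date_range(date(2025, 7, 1), 10))  # -> 2025-07-01 .. 2025-07-11
```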
Parameter Structure Complexity
APIs often feature complex nested structures with a few dozen parameters, many of which may not be necessary for a specific request. This design consolidates multiple use cases into a single function, simplifying development but complicating the parameter mapping process for an LLM. For example, an API function such as bookFlight(details) might require a detailed parameter with at least a couple dozen nested fields. If only the date and destination are needed, the LLM may struggle to identify the correct fields to set. Adding extensive documentation for these fields might further complicate the process, making it harder to locate the necessary parameters.
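To illustrate, here is a hypothetical bookFlight(details) payload in which only two of the many nested fields matter for the request at hand:

```python
# Hypothetical payload: dozens of nested fields, only two needed here.
details = {
    "itinerary": {
        "destination": "LAX",           # needed
        "departureDate": "2025-07-04",  # needed
        "origin": None,
        "returnDate": None,
        "cabin": {"class": None, "seatPreference": None},
    },
    "passenger": {"name": None, "loyaltyId": None,
                  "contact": {"email": None, "phone": None}},
    "payment": {"method": None, "billingAddress": None},
    # ...and many more nested fields in a real API
}
```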
Unassigned Parameters
An API function might support multiple use cases, resulting in extra parameters that are not needed for the current use case. For example, a function may include a parameter for regions, but the current use case is limited to North America. This region information won’t be available in the prompt, making it impossible for the LLM to assign a value to the region parameter.
One solution is to annotate the API documentation to assign a default value to the parameter if it is not provided. This additional information would be included in the API documentation for the LLM but excluded from the general documentation. However, if the same function is used by different use cases requiring different default values, this approach would not be feasible. In such cases, new functions must be defined to support each use case. The bridge would host and invoke these functions, which would then call the target API function with the appropriate hardcoded value.
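Such a bridge-hosted wrapper is simple to write. In this hypothetical sketch, the LLM sees a function with no region parameter at all, and the wrapper hardcodes the value for this use case:

```python
# Hypothetical stand-in for the real API call, which requires a region.
def search_inventory(query: str, region: str) -> list:
    return [f"{query} result ({region})"]

# The function exposed to the LLM: no region parameter; the default is
# fixed in code for the North America use case.
def search_inventory_north_america(query: str) -> list:
    return search_inventory(query, region="NA")

print(search_inventory_north_america("widgets"))
```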
Function Output Format
Just as API inputs are designed for applications rather than end users, so is function output, which therefore suffers from the same issues that affect parameter inputs. For example, an API might return values like “AA” for African American or “CA” for Caucasian. These codes are convenient for applications but require extra work for an LLM. Converting these codes to natural language is straightforward in code but can be unpredictable when done by an LLM. Even if the full list of codes is documented in the API, the LLM can occasionally miss a transformation.
Similarly, complex nested output structures can complicate the LLM’s ability to process and understand the data. Perhaps only a couple of fields need to be used among a few dozen, making it challenging for the LLM to disregard irrelevant fields and find the relevant ones. It may also struggle to disambiguate similar-looking fields.
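Both problems can be removed by normalizing the response in bridge code before the LLM ever sees it. A sketch, with a hypothetical raw response shape:

```python
CODE_TO_LABEL = {"AA": "African American", "CA": "Caucasian",
                 "HI": "Hispanic", "AS": "Asian"}

# The bridge flattens and decodes the raw API response so that no
# transformation is left to the LLM. Field names are hypothetical.
def normalize(raw: dict) -> dict:
    return {
        "name": raw["profile"]["name"],
        "ethnicity": CODE_TO_LABEL.get(raw["profile"]["demographics"]["ethnicity"]),
    }

raw = {"profile": {"name": "Kim", "demographics": {"ethnicity": "AS"},
                   "audit": {"createdBy": "svc", "rev": 17}}}
print(normalize(raw))  # irrelevant fields dropped, codes spelled out
```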
Error Handling
Handling errors gracefully is a crucial aspect of API integration. When an API call fails or returns an unexpected result, the LLM must be able to interpret the error message and return an appropriate response. In applications, code is often written to interpret error codes and act accordingly, such as displaying an alternate screen if a resource is not found. The appropriate response to the end user must be documented for the LLM. This can be managed by adding this information to the API documentation with annotations that exclude it from the published documentation but include it when sent to the LLM.
However, the actual response may vary based on the use case, making it difficult to encode the correct behavior in the documentation. In such cases, creating new functions that map API errors to user-friendly messages tailored to specific use cases may be necessary. This approach ensures that the LLM can provide meaningful and accurate feedback to users, enhancing the overall user experience and reliability of the bridge.
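A sketch of such a use-case-specific wrapper, with ApiError and the stubbed 404 as hypothetical stand-ins:

```python
class ApiError(Exception):
    def __init__(self, code: int):
        self.code = code

def get_order(order_id: str) -> dict:
    raise ApiError(404)  # stand-in for the real API call

# API error codes mapped to user-facing messages for this use case.
USER_MESSAGES = {
    404: "I couldn't find that order. Please double-check the order number.",
    403: "You don't have access to that order.",
}

def get_order_for_llm(order_id: str) -> dict:
    try:
        return {"ok": True, "order": get_order(order_id)}
    except ApiError as e:
        return {"ok": False,
                "message": USER_MESSAGES.get(e.code, "Something went wrong.")}

print(get_order_for_llm("A-123"))
```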
A Solution: User-Aligned Functions
Some of the issues above can be addressed with improved API documentation. However, when documentation alone cannot resolve them, another approach is required: introducing a new layer of functions specifically designed for LLM interaction, free of the API’s design constraints. We call these user-aligned functions (UAFs) because they are aligned with the needs and terminology of end users rather than with the API. In other words, rather than having the LLM bridge the gap between end-user concepts and API concepts, we eliminate the gap altogether. The problem of bridging these two worlds is instead delegated to something much better suited for this kind of work, namely code. This approach involves two key phases:
Phase 1: User-Aligned Function Definition
In this phase, UAFs are designed based on the kind of prompts that will be handled. For example, if the prompt is “Find the nearest coffee shop,” a possible UAF could be findNearestLocation(category, userLocation). This function would be optimized for LLM usage with names, parameters, output formats, and documentation tailored for seamless interaction. This optimization ensures consistency and accuracy without requiring changes to the original API.
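As the LLM would see it, the UAF’s definition might look like this (OpenAI-style schema; the field details are illustrative):

```python
# Sketch of the UAF definition published to the LLM.
find_nearest_location = {
    "type": "function",
    "function": {
        "name": "findNearestLocation",
        "description": "Find the nearest place of a given category "
                       "(e.g., 'coffee shop') to the user's location.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string",
                             "description": "Kind of place, in plain English."},
                "userLocation": {"type": "string",
                                 "description": "Address or 'lat,lon'."},
            },
            "required": ["category", "userLocation"],
        },
    },
}
```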
Phase 2: Implementation
Once the UAFs are defined, the next step is implementation. This involves writing code that implements each UAF by invoking the appropriate operations in the underlying API. AI can potentially generate this code by scanning the UAF and API documentation. It might take a few iterations to perfect, but once completed, the code is entirely predictable, unlike the LLM.
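A sketch of what that implementation might look like, with geocode() and places_search() as hypothetical stand-ins for the underlying API:

```python
def geocode(address: str) -> tuple[float, float]:
    return (48.8566, 2.3522)  # stubbed geocoding call

def places_search(lat: float, lon: float, category: str) -> list[dict]:
    return [{"name": "Cafe Flore", "distance_m": 120}]  # stubbed search call

# The UAF hosted by the bridge: the logic joining the low-level calls
# lives in ordinary, testable code rather than in the LLM.
def findNearestLocation(category: str, userLocation: str) -> dict:
    lat, lon = geocode(userLocation)             # user term -> API coordinates
    results = places_search(lat, lon, category)  # low-level API call
    best = min(results, key=lambda r: r["distance_m"])
    return {"name": best["name"], "distance_meters": best["distance_m"]}

print(findNearestLocation("coffee shop", "Paris, France"))
```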
Conclusion
As LLMs become more powerful, many of the challenges listed above may be mitigated, though they may not completely disappear. Function-calling in LLMs is a powerful feature that facilitates interaction with external systems through APIs. However, achieving accuracy in function-calling is fraught with challenges, from domain mismatches and documentation gaps to terminology differences. Introducing User-Aligned Functions (UAFs) can address these issues effectively by bridging the gap between end-user concepts and API requirements. Addressing these challenges is crucial for leveraging the full potential of LLMs in various applications.