Raja Osama
GPT Models Can Do More Than Just Talk: How to Make Them Brew You Coffee ☕

Intent Classification 🎯

Intent classification is the task of figuring out what the user wants to do from their natural language input. For example, if the user says "Can you make me a coffee?", the intent could be "Make a coffee". This way, we can map the user's input to a specific action or function that we want our GPT model to perform.

To do this, I used OpenAI's Ada embedding models, which I wrote about in a previous article. These models return an embedding of the text: a numerical representation of its meaning. The embedding captures the semantic and syntactic features of the text, such as the words, phrases, and context.

By using a simple formula, we can find the closest match for the input's embedding among a set of predefined categories, which indicates the most likely intent. The formula is based on the Euclidean distance, which measures how far apart two vectors are in a multidimensional space. The smaller the distance, the more similar the vectors are.
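As a quick sketch (this is illustrative code, not from the article's source), the Euclidean distance between two equal-length embedding vectors can be computed like this:

```javascript
// Euclidean distance: sqrt(sum((a[i] - b[i])^2)) over all dimensions.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}
```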


For example, if we have three categories: "Make a coffee", "Make a tea", and "Make a sandwich", and we have an embedding for each category, we can compare the embedding of the user's input with each category embedding and find the one with the smallest distance. If the user says "Can you brew me a coffee?", the embedding of their input will be closer to the embedding of "Make a coffee" than to the other two categories, so we can infer that their intent is "Make a coffee".

One way to find the intent is to categorize intents into actions. For example, if you have an action like Make a coffee, you can define it as a category with a user-provided prompt and a function to execute:

Make a coffee => (User Provided Prompt = "Can you make me a coffee?") => MakeACoffee()

However, to increase the accuracy and flexibility of the system, we don't just define one category for each action, but several different phrasings of the same intent. For example:

Make me a coffee => (User Provided Prompt = "Can you make me a coffee?") => MakeACoffee()
Bring me a coffee => (User Provided Prompt = "Can you bring me a coffee?") => MakeACoffee()
Buy me a coffee => (User Provided Prompt = "Can you buy me a coffee?") => MakeACoffee()

As you can see, all these categories will trigger the same function: MakeACoffee().

Rule-Based System 📜

The rule-based system is like any other chatbot system, but ours is dynamic and uses AI from start to end. The concept is to build a map with intent categories as keys and handler objects as values:

const intents = {
  categories: {
    "Make a coffee": MakeACoffee,
    "Make me a coffee": MakeACoffee,
    "Bring me a coffee": MakeACoffee,
    "Buy me a coffee": MakeACoffee,
  },
};

Each object has an action, which can be either an API call or a local action; a URL to call if it's an API call; and a response function to handle the result of the call. For example:

const MakeACoffee = {
  action: "LOCAL_ACTION",
  response: (response) => "Your coffee is here!",
};

For API calls, a function makes a fetch request to the handler's URL and passes the result to the response function. You can customize the response as much as you like.

Now, whenever you say "Make me a coffee", the result will be "Your coffee is here!".
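Putting the pieces together, a dispatcher might look like the sketch below. The handler shape matches the objects above, but `executeIntent` and the fetch-based API branch are my own illustrative naming, not necessarily the article's actual code:

```javascript
// Hypothetical dispatcher: looks up the matched category and runs its handler.
async function executeIntent(category, intents) {
  const handler = intents.categories[category];
  if (!handler) return "Sorry, I don't know how to do that.";

  if (handler.action === "API_CALL") {
    // Fetch the data first, then hand it to the response function.
    const res = await fetch(handler.url);
    const data = await res.json();
    return handler.response(data);
  }

  // LOCAL_ACTION: no network call needed.
  return handler.response(null);
}
```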

But what if you have follow-up questions or options? How would you handle that?

For example, when you ask the system to MakeACoffee, you might want to ask the user what kind of coffee they want. To handle that, we take a recursive approach.

Each object can also have an ask function to generate a follow-up question and a categories object to define the possible options for the user. For example:

const MakeACoffee = {
  action: "API_CALL",
  url: "http://localhost:3000/coffees",
  ask: [
    (coffees) => {
      return `What kind of coffee would you like: ${coffees.join(", ")}?`;
    },
  ],
  categories: () => ({
    Mocha: MochaCoffee,
    Karak: KarakCoffee,
    Espresso: EspressoCoffee,
  }),
};

Now with this approach, the response will include a list of coffees, a follow-up question, and another object to handle the user's choice.

const MochaCoffee = {
  action: "LOCAL_ACTION",
  response: (response) => "Your mocha coffee is here!",
};
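The recursive flow described above can be sketched as follows. The handler objects mirror the ones defined earlier, while `runIntent`, the `data` helper, and the hardcoded coffee list are assumptions made here for the sake of a self-contained, runnable example (standing in for the real API call):

```javascript
// Sub-intents for each coffee choice.
const MochaCoffee = {
  action: "LOCAL_ACTION",
  response: () => "Your mocha coffee is here!",
};
const EspressoCoffee = {
  action: "LOCAL_ACTION",
  response: () => "Your espresso coffee is here!",
};

const MakeACoffee = {
  action: "LOCAL_ACTION", // would be "API_CALL" with a real coffee API
  // Hardcoded stand-in for the API's list of coffees.
  data: () => ["Mocha", "Espresso"],
  ask: (coffees) => `What kind of coffee would you like: ${coffees.join(", ")}?`,
  categories: () => ({ Mocha: MochaCoffee, Espresso: EspressoCoffee }),
};

// Recursively walk an intent: if it has follow-up categories, ask the
// question and recurse into the user's choice; otherwise return the response.
function runIntent(intent, answers) {
  if (intent.categories) {
    const coffees = intent.data();
    console.log(intent.ask(coffees)); // follow-up question
    const choice = answers.shift();   // simulated user reply
    return runIntent(intent.categories()[choice], answers);
  }
  return intent.response();
}

runIntent(MakeACoffee, ["Mocha"]); // "Your mocha coffee is here!"
```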


And that's it! You have successfully integrated your GPT model with your own data sources and actions. You can now enjoy your coffee while chatting with your AI friend. ☕

Behind the Scenes 🕵️‍♂️

Now let's look at the behind-the-scenes workings of this application. I showed you the configuration, but let's look at the internals.

The second part of this article is here: GPT Models Can Do More Than Just Talk: How to Make Them Brew You Coffee
