Kevin Piacentini

Chat with your CSV: leveraging PandasAI, OpenAI, and AgentLabs

Large language models (LLMs) have seen a remarkable surge in capability, demonstrating proficiency across a multitude of tasks such as text generation, language translation, and question answering.

PandasAI is a Python library that simplifies the utilization of Pandas by leveraging the power of LLMs.

In this article, we will delve into how PandasAI can be used for conducting basic CSV file analysis with natural language user requests.

What are we going to build?

We will use PandasAI in addition to AgentLabs to craft an intuitive and simple user interface without coding it.

Users will be able to ask questions about CSV data and get insights back. The system will generate text responses, and can also produce tables and charts for enhanced data comprehension.

Demo

Watch until the end for the charts ;)

Getting started

First, let's install the dependencies we'll need for this tutorial.

If you're using pip:

pip install pandasai openai tabulate matplotlib agentlabs-sdk

If you're using poetry:

poetry add pandasai openai tabulate matplotlib agentlabs-sdk

Now let's import everything so we don't have to worry about it later:

# AgentLabs SDK: chat agent, incoming messages, and file attachments
from agentlabs.attachment import Attachment
from agentlabs.agent import Agent
from agentlabs.chat import IncomingChatMessage, MessageFormat
from agentlabs.project import Project

# PandasAI: LLM-powered dataframe querying
from pandasai import SmartDataframe
from pandasai.llm import OpenAI
from pandasai.responses.response_parser import ResponseParser

# Use a non-interactive matplotlib backend so charts can be
# rendered to image files without a display server.
import matplotlib
matplotlib.use('Agg')

import os

Don't worry, I'll explain each of these as we go.

Preparing PandasAI

To start using PandasAI, we need to instantiate the LLM we want to use.

In this tutorial we're using OpenAI, but you can find other supported LLMs in their documentation.

llm = OpenAI(api_token=os.environ['OPENAI_API_KEY'])

| Here we assume our OPENAI_API_KEY is stored in an environment variable.
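If you want to fail fast when the key is missing, a small helper like the one below (not part of PandasAI or AgentLabs, just a convenience sketch) can replace the direct os.environ lookup:

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable")
    return value

# Usage: llm = OpenAI(api_token=require_env("OPENAI_API_KEY"))
```

This surfaces a readable error at startup instead of a less obvious failure from the OpenAI client later on.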

Then, we'll instantiate a SmartDataframe. In our case, we'll load the data from a locally stored file.

In another tutorial, I will show you how you can simply handle any user-uploaded file with AgentLabs' built-in file upload.

smart_df = SmartDataframe("./netflix.csv", config={
  "llm": llm,
  "save_charts_path": "/tmp/charts",
  "save_charts": True,
  "verbose": True,
})

As you can see, we pass our LLM as an argument. We also set a few other options that will be helpful later:

  • save_charts_path: indicates where the library will save the chart figures it may generate for us. We use /tmp so the files are eventually cleaned up.
  • save_charts: tells the library to persist the chart figures.
  • verbose: self-explanatory; useful for checking what's going on under the hood.
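One caveat: PandasAI will try to write into save_charts_path, so it's worth making sure the directory exists before the first chart is generated (a defensive step, not necessarily required by every PandasAI version):

```python
import os

# Create the charts directory if it doesn't exist yet;
# exist_ok=True makes this a no-op when it's already there.
os.makedirs("/tmp/charts", exist_ok=True)
```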

Something is missing

At this stage, we've instantiated everything we need to get PandasAI working.

We can start using our SmartDataframe as follows:

result = smart_df.chat("What is the structure of the file?")

But we still have two problems to solve:

  1. We want the input to be controlled by our users
  2. We want to handle multiple kinds of responses (plot images, dataframes, etc.)

Let's solve those two problems now.

Preparing the frontend

We'll start by setting up the user interface with AgentLabs.
It's fairly easy to do:

  • sign in to https://agentlabs.dev
  • create a project
  • create an agent and name it as you wish
  • create a secret key for this agent


Init the AgentLabs project

Now, we'll initialize AgentLabs with the information shown in our dashboard.

alabs = Project(
  project_id="0fc95376-beeb-4532-a28d-8d2321dd4d0d",
  agentlabs_url="https://csv-analyser.app.agentlabs.dev",
  secret=os.environ['AGENTLABS_SECRET']
)

alabs.connect()
alabs.wait()

| Here we store our secret in an environment variable for security reasons. All the above values can be found in your AgentLabs console.

React to users' messages

Your frontend is (almost) ready.

You can now open the URL of your AgentLabs project by clicking "Open my ChatUI".


Our plan to handle users' requests

What we want to do is:

  1. handle every message sent by users and pass it to PandasAI.
  2. parse the PandasAI result and return it to the user:
    • if the result is a plot image, we send it as an attachment.
    • if the result is a DataFrame, we send a markdown table plus a CSV file as an attachment.
    • if the result is a string or anything else, we send it to the user as-is.

To handle users' messages, we'll use the on_chat_message method and pass our handler as an argument.

Here's what the final code looks like.

def handle_task(message: IncomingChatMessage):
  alabs_agent = alabs.agent("d2d5fc3e-d4f7-45c4-b0d2-264c9f6a1a78")

  # Pass the user's message to PandasAI
  result = smart_df.chat(message.text)

  attachments = []

  if isinstance(result, str):
    response = result

  elif result["type"] == "plot":
    # "value" is the path of the chart image saved by PandasAI
    attachments = [Attachment.from_local_file(result["value"])]
    response = "Here's the chart you asked for."

  elif result["type"] == "dataframe":
    # Send the dataframe as a markdown table, plus a downloadable CSV
    df = result["value"]
    response = df.to_markdown()
    csv_path = "/tmp/my-csv.csv"
    df.to_csv(csv_path)
    attachments = [Attachment.from_local_file(csv_path)]

  else:
    response = result["value"]

  alabs_agent.send(
    conversation_id=message.conversation_id,
    text=response,
    format=MessageFormat.MARKDOWN,
    attachments=attachments
  )

alabs.on_chat_message(handle_task)

The code above is pretty easy to understand but let's go over it for a second.

First, we instantiate our AgentLabs agent. It's through this agent that we'll send messages back to the user:

def handle_task(message: IncomingChatMessage):
  alabs_agent = alabs.agent("d2d5fc3e-d4f7-45c4-b0d2-264c9f6a1a78")

Then we pass the content of the message to PandasAI and we get a result.

result = smart_df.chat(message.text)

Depending on the result's "type" property, we define the format of our response and attachments.

For example, if the type is "plot", we assume "value" is the path to the plot image file, so we create an AgentLabs attachment containing it.

  elif result["type"] == "plot":
    attachments = [Attachment.from_local_file(result["value"])]
    response = "Here's the chart you asked for."

We do the same for every type we want to handle.
Then, we send the response to the user using the send() method provided by AgentLabs.

As you can see, we send a text message along with any attachments we prepared.

Attachments are displayed in the chat as images or files and can be downloaded by users.

  alabs_agent.send(
    conversation_id=message.conversation_id,
    text=response,
    format=MessageFormat.MARKDOWN,
    attachments=attachments
  )

We're mostly done!

However, you're probably wondering: "Where do these type and value fields come from in the PandasAI response?"

Well, you're right. Those fields are not available by default.

So currently result = smart_df.chat(message.text) will not return what we expect.

Fortunately, this is a fairly simple problem to solve thanks to Custom Responses.
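Before wiring that up, here's a tiny standalone sketch (independent of PandasAI; the wrap helper and its checks are made up for illustration) of why a uniform {"type": ..., "value": ...} envelope helps: it replaces scattered isinstance checks with one explicit dispatch key.

```python
# Standalone illustration (not PandasAI code): wrapping heterogeneous
# results in a {"type": ..., "value": ...} envelope makes dispatch explicit.
def wrap(value) -> dict:
    if hasattr(value, "to_markdown"):  # duck-typed check for a dataframe
        return {"type": "dataframe", "value": value}
    if isinstance(value, str) and value.endswith(".png"):
        return {"type": "plot", "value": value}
    return {"type": "other", "value": value}

print(wrap("/tmp/charts/fig.png")["type"])  # plot
print(wrap(42)["type"])                     # other
```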

Extending the ResponseParser

To change the output format of result = smart_df.chat(message.text), we can implement our own version of the ResponseParser class provided by PandasAI.

This class has three methods:

  • format_dataframe: triggered when the result is a dataframe.
  • format_plot: triggered when the result is a plot.
  • format_other: triggered when the result is a dictionary, a string, or a number.

In our version, we just want to return the results as follows:

class CustomResponseParser(ResponseParser):
  def __init__(self, context) -> None:
    super().__init__(context)

  def format_other(self, result) -> dict:
    # Strings, numbers, and dicts are passed through as-is.
    return result

  def format_dataframe(self, result: dict) -> dict:
    return {
      "type": "dataframe",
      "value": result["value"],
    }

  def format_plot(self, result: dict) -> dict:
    return {
      "type": "plot",
      "value": result["value"],
    }

Then, we need to tell our SmartDataframe to use our CustomResponseParser, using the response_parser option.

smart_df = SmartDataframe("./netflix.csv", config={
  "llm": llm,
  "save_charts_path": "/tmp/charts",
  "save_charts": True,
  "verbose": True,
  # ADDED THIS LINE:
  "response_parser": CustomResponseParser
})

Et voilà!

Congrats, you've created your very own data assistant capable of parsing and manipulating CSV files!

The full code of this tutorial is available here on Replit.

And again, here's the result in action:

Conclusion

This is obviously experimental. You'll likely run into situations where you need to tweak PandasAI's behavior to fit your exact needs.

However, it's a pretty decent result obtained in no time.

As mentioned, this example doesn't handle file uploads. That can easily be added with AgentLabs' built-in file upload. Let me know if you'd like to learn more about it :)

Thank you!

If you liked this tutorial, feel free to smash the like button and react in the comments below :)
