DEV Community

Romi Mendez
Romi Mendez

Posted on

SQL Assistant: Text-to-SQL Application in Streamlit 🤖

In this article, we will explore the application of, a 🐍 Python library specifically designed for training a model capable of processing natural language questions and generating SQL queries as responses. This implementation will be integrated into a ✨Streamlit application, creating a 🤖chatbot that facilitates posing questions and provides explanations for the returned queries.

🤖 Text-to-SQL &

Text-to-SQL is a tool that utilizes models to translate natural language queries into SQL queries, aiming to make it easy for users to generate SQL queries and interact with databases seamlessly.

The implementation of Text-to-SQL can be achieved through the use of Vanna.AI, an open-source 🐍Python library that allows the training of an RAG model with queries, DDL, and documentation from a database.

Image description

Vanna.AI: Output Possibilities

The following are the possible outputs that can be generated with Vanna methods, facilitating diverse ways of interacting with the database using natural language.

Output Description
📄 SQL Vanna.AI can generate SQL queries from natural language questions. These SQL queries can be used to interact with a database.
📁 DataFrame After generating an SQL query, Vanna.AI can execute it in the database and return the results as a pandas DataFrame.
📊 Charts Vanna.AI can generate plots using Plotly from the results of the SQL query.
❓ Follow-up questions Vanna.AI can generate follow-up questions based on the generated SQL query. These follow-up questions can help users refine their queries or explore the data in more detail.
🔍 Explanations queries Vanna.AI can provide explanations for the generated SQL queries. These explanations can help users understand how their natural language question is interpreted into an SQL query.

Vanna.AI: Features

The following table contains the key features available with Vanna.AI, enriching data interaction capabilities:

Feature Description
🚀 Model Training Train the RAG model on data sources, including Data Definition Language (DDL) statements, documentation, and SQL queries during the training process.
🤖 User Query Handling Users can pose questions in natural language, and Vanna.AI responds by generating SQL queries.
📚 Documentation Extensive documentation, a dedicated website, and a support community on Discord are available to offer comprehensive assistance.
🔌 Database Connections Vanna allows connecting to multiple databases, enabling users not only to retrieve SQL queries but also to execute them by establishing connections to their respective databases.
🤔 AI-Generated Recommendation Questions This framework includes a feature of generating AI-driven questions, offering suggestions for additional queries that the user could explore.

🛠️ Simple Implementation

For this initial implementation, we will leverage an example provided by, using the Chinook database. This database includes tables and data relevant to a music store, encompassing information about artists, albums, tracks, customers, orders, and various aspects associated with the management of an online music store.

Image description

Install vanna-ai

!pip install vanna

Implementation Steps

Follow these steps to implement a straightforward example of text-to-SQL:

  1. API Key Retrieval: To initiate this example, acquire an API key by registering at Utilize your registered email to obtain the key.

  2. Setting API Key and Model: Configure the API key obtained and specify the model to be used, in this instance, the pre-existing "chinook" model

  3. Connecting to the Database: Establish a connection with the database, utilizing an SQLite file available at

  4. Asking a Question: Finally, pose a natural language question to extract specific information from the database. In this step, the result includes the SQL query, the DataFrame from the query execution, and a representative chart. The system also generates follow-up questions based on the dataset.

import vanna as vn

# STEP 01: This is a simple example of how to use the Vanna API
api_key = vn.get_api_key('your_email')

# Set the API key and the model

# STEP 02:  Set the model

# STEP 03:  Connect with the database

# STEP 04:  Ask a question
vn.ask('What are the top 10 artists by sales?')

Enter fullscreen mode Exit fullscreen mode

📃 sql query

       sum(il.quantity) as totalsales
FROM   artist a
    INNER JOIN album al
        ON a.artistid = al.artistid
    INNER JOIN track t
        ON al.albumid = t.albumid
    INNER JOIN invoiceline il
        ON t.trackid = il.trackid
ORDER BY totalsales desc limit 10;
Enter fullscreen mode Exit fullscreen mode

📊 plotly chart
Image description

❓ New Questions

AI-generated follow-up questions:

* Who is the artist with the highest sales?
* What are the total sales for each artist?
* Which genre has the highest sales?
* Can you provide a breakdown of sales by album?
* Which artist has the lowest sales?
* What is the average sales per artist?
* Can you provide a list of the top-selling albums?
* Which genre has the least sales?
* Can you provide a breakdown of sales by country?
* What is the total sales for each genre?
Enter fullscreen mode Exit fullscreen mode

🧪 Model Training

To train your own model follow the following steps:

  1. Log in to your account and create a new model.
  2. Next we will define how to train the model. In our next example we will use ddl (data definition language), documentation and queries..
# Check the models available in the account
Enter fullscreen mode Exit fullscreen mode
['ecommerce-test', 'demo-tpc-h', 'tpc', 'chinook', 'thelook']
Enter fullscreen mode Exit fullscreen mode
# Set the model
Enter fullscreen mode Exit fullscreen mode
# Get the ddl for training the model
# Train the model with the ddl
ddl = """
CREATE TABLE if not exists stage.customers(
    customer_id           INT NOT NULL PRIMARY KEY,
    email_address         VARCHAR(50) NOT NULL,
    name                  VARCHAR(50) NOT NULL,
    business_type_id      INT NOT NULL,
    site_code             VARCHAR(10) NOT NULL,
    archived              BOOLEAN NOT NULL,
    is_key_account        BOOLEAN NOT NULL,
    date_updated          TIMESTAMP NOT NULL,
    date_created          TIMESTAMP NOT NULL,
    job_created_user  varchar(50) default null,
    job_updated_date  TIMESTAMP default null,
    job_updated_user  varchar(50) default null,
    CONSTRAINT fk_business_type_id FOREIGN KEY(business_type_id) REFERENCES stage.business_types (business_type_id)

Enter fullscreen mode Exit fullscreen mode

In my repository, you can find all the scripts, documentation, and queries to train the model and answer questions such as the following

# Ask a question for generating the SQL
question  =  """What is the total count of new clients who registered between October 1, 2020, and 
January 1, 2022, and have made more than 10 purchases, each exceeding $20? Additionally,
 could you provide their email addresses, the number of purchases made, and the date of their
  most recent purchase?"""

Enter fullscreen mode Exit fullscreen mode
SELECT COUNT(*) AS total_count,
       COUNT(o.order_id) AS num_purchases,
       MAX(o.order_date) AS most_recent_purchase_date
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2020-10-01' AND c.registration_date <= '2022-01-01'
  AND o.order_value > 20
GROUP BY c.email_address
HAVING COUNT(o.order_id) > 10;
Enter fullscreen mode Exit fullscreen mode

💬 Streamlit Chatbot

Image description


In this section, we will implement a 🤖chatbot application using text-to-SQL capabilities with ✨Streamlit. This app will be developed through the integration of Vanna.AI and ✨Streamlit, providing a user-friendly interface for entering your username, selecting an avatar, and initiating a 💬chat.

Quick Start

  1. Clone the Repository

    GitHub logo r0mymendez / text-to-sql

    Text-to-sql with vanna-ai and streamlit

    SQL Assistant: Text-to-SQL Application in Streamlit 🤖

    ✨ What is Generative AI?

    Generative AI refers to a category of artificial intelligence systems that have the capability to generate new and original content, such as images, text, or other forms of data.

    🧪 What is LLMs Models?

    Large Language Models (LLMs) are a complex deep learning models trained on extensive datasets to comprehend and generate natural language text. These models utilize a transformer architecture, incorporating an encoder to extract nuanced meanings from input text sequences and a decoder to understand intricate relationships between words and phrases. The training process involves millions of data inputs, foundational for creating millions of hyperparameters that fine-tune the model's behavior. The significant computational demands during training are efficiently handled by Graphics Processing Units (GPUs), enabling accelerated processing and enhancing the model's proficiency in generating contextually rich and meaningful text.

    🤖 Text-to-SQL &

    Text-to-SQL is a…

  2. Add your ddl scripts, documentation and sql queries in src\db</code>
  3. Add your credentials in src.streamlit\secrets.toml
  4. Execute the Application

Detailed instructions on how to run the application and add credentials can be found in the repository's

🤖 Chatbot Preview

The application, crafted with Vanna.AI and ✨Streamlit, and you can see below a video of how it works, remember that all the explanations are in the file of the repository.
Feel free to check it out and give it a star if you find it helpful! ⭐️

Image description

Image description

Image description

📚 References

If you want to learn...

⚠️ Disclaimer: Before proceeding with the use of your own data in the Vanna.AI application, it is recommended to review and validate the data protection policies established by Vanna.AI. Ensure a thorough understanding of how your personal data is handled and protected within the application framework. It is crucial to comply with privacy and security regulations before providing any sensitive information.

Top comments (0)