DEV Community

Cover image for Chat with your SQL database
selvakumar palanisamy
selvakumar palanisamy

Posted on

Chat with your SQL database

Creating complex SQL queries can be difficult when managing databases. Especially for people who are not SQL professionals, this might be challenging. It is obvious that there is a need for an approachable solution that makes creating SQL queries easier.

Existing techniques for creating SQL queries can be time-consuming and frequently necessitate a thorough grasp of the underlying database structure. Certain tools could be useful for creating queries, but they might require additional customisation to fit different databases or ensure security and privacy.

A helpful open-source Python framework called Vanna AI takes a two-step approach to simplifying SQL generation. First, it trains a Retrieval-Augmented Generation (RAG) model on your data, and then it asks questions to generate SQL queries that are specific to your database.

Vanna provides a simple and flexible solution to the prevalent problem of SQL query generating. Vanna makes database querying easier to use and more accessible for everyone.

How Vanna Works

Vanna is a Python module that helps you create precise SQL queries for your database by leveraging LLMs through retrieval augmentation.

Image description

Vanna works in two easy steps - train a RAG "model" on your data, and then ask questions which will return SQL queries that can be set up to automatically run on your database.
vn.train(...)

Train a RAG "model" on your data. These methods add to the reference corpus below.
vn.ask(...)

Ask questions. This will use the reference corpus to generate SQL queries that can be run on your database.
Install and Import Vanna

%pip install vanna
Enter fullscreen mode Exit fullscreen mode

Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from Jinja2>=3.0->flask->vanna) (2.1.5)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->vanna) (1.16.0)
Installing collected packages: kaleido, vanna
Successfully installed kaleido-0.2.1 vanna-0.1.1

import vanna
from vanna.remote import VannaDefault
Enter fullscreen mode Exit fullscreen mode

Log In to Vanna

Vanna provides a function to get an API key. You'll get a code sent to your e-mail. You can save your API key for future usage so that you don't have to log in every time.

api_key = vanna.get_api_key('xxxxxyyyy@gmail.com')

Check your email for the code and enter it here:

Set Model

chinook is a public model that refers to the Chinook sample database

vanna_model_name = 'chinook' # This is the name of the RAG model. This is typically associated with a specific dataset.
vn = VannaDefault(model=vanna_model_name, api_key=api_key)
Enter fullscreen mode Exit fullscreen mode

Connect to the Database

Here we're connecting to a SQLite database but you can connect to any SQL database.

vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')
Enter fullscreen mode Exit fullscreen mode

Ask Questions

Now we're going to use vn.ask to ask questions and it'll generate SQL, run the SQL, show the table, and generate a chart

vn.ask("What are the top 5 artists by sales?")
Enter fullscreen mode Exit fullscreen mode

SELECT a.ArtistId, a.Name, SUM(il.Quantity) AS TotalSales
FROM Artist a
JOIN Album al ON a.ArtistId = al.ArtistId
JOIN Track t ON al.AlbumId = t.AlbumId
JOIN InvoiceLine il ON t.TrackId = il.TrackId
GROUP BY a.ArtistId, a.Name
ORDER BY TotalSales DESC
LIMIT 5;
ArtistId Name TotalSales
0 90 Iron Maiden 140
1 150 U2 107
2 50 Metallica 91
3 22 Led Zeppelin 87
4 113 Os Paralamas Do Sucesso 45

Image description

Launch the User Interface

from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()

Enter fullscreen mode Exit fullscreen mode

How to connect to Athena and redshift

*Athena *

import pandas as pd
from pyathena import connect

conn_details = {...}  # fill this with your connection details
conn_athena = connect(**conn_details)

def run_sql_athena(sql: str) -> pd.DataFrame:
    df = pd.read_sql(sql, conn_athena)
    return df

vn.run_sql = run_sql_athena
Enter fullscreen mode Exit fullscreen mode

Redshift

import pandas as pd
import psycopg2

conn_details = {...}  # fill this with your connection details
conn_redshift = psycopg2.connect(**conn_details)

def run_sql_redshift(sql: str) -> pd.DataFrame:
    df = pd.read_sql_query(sql, conn_redshift)
    return df

vn.run_sql = run_sql_redshift
Enter fullscreen mode Exit fullscreen mode

Most of Vanna’s functionality does not need your specific data, but does need your schema (eg SQL queries and structure — table names, column names)

It’s completely open source and free, and works with Snowflake, Postgres, Redshift, and most other databases.

Reference : vanna.ai.

Top comments (0)