Creating complex SQL queries can be difficult when managing databases. Especially for people who are not SQL professionals, this might be challenging. It is obvious that there is a need for an approachable solution that makes creating SQL queries easier.
Existing techniques for creating SQL queries can be time-consuming and frequently necessitate a thorough grasp of the underlying database structure. Certain tools could be useful for creating queries, but they might require additional customisation to fit different databases or ensure security and privacy.
A helpful open-source Python framework called Vanna AI takes a two-step approach to simplifying SQL generation. First, it trains a Retrieval-Augmented Generation (RAG) model on your data, and then it asks questions to generate SQL queries that are specific to your database.
Vanna provides a simple and flexible solution to the prevalent problem of SQL query generating. Vanna makes database querying easier to use and more accessible for everyone.
How Vanna Works
Vanna is a Python module that helps you create precise SQL queries for your database by leveraging LLMs through retrieval augmentation.
Vanna works in two easy steps - train a RAG "model" on your data, and then ask questions which will return SQL queries that can be set up to automatically run on your database.
vn.train(...)
Train a RAG "model" on your data. These methods add to the reference corpus below.
vn.ask(...)
Ask questions. This will use the reference corpus to generate SQL queries that can be run on your database.
Install and Import Vanna
%pip install vanna
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from Jinja2>=3.0->flask->vanna) (2.1.5)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->vanna) (1.16.0)
Installing collected packages: kaleido, vanna
Successfully installed kaleido-0.2.1 vanna-0.1.1
import vanna
from vanna.remote import VannaDefault
Log In to Vanna
Vanna provides a function to get an API key. You'll get a code sent to your e-mail. You can save your API key for future usage so that you don't have to log in every time.
api_key = vanna.get_api_key('xxxxxyyyy@gmail.com')
Check your email for the code and enter it here:
Set Model
chinook is a public model that refers to the Chinook sample database
vanna_model_name = 'chinook' # This is the name of the RAG model. This is typically associated with a specific dataset.
vn = VannaDefault(model=vanna_model_name, api_key=api_key)
Connect to the Database
Here we're connecting to a SQLite database but you can connect to any SQL database.
vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')
Ask Questions
Now we're going to use vn.ask to ask questions and it'll generate SQL, run the SQL, show the table, and generate a chart
vn.ask("What are the top 5 artists by sales?")
SELECT a.ArtistId, a.Name, SUM(il.Quantity) AS TotalSales
FROM Artist a
JOIN Album al ON a.ArtistId = al.ArtistId
JOIN Track t ON al.AlbumId = t.AlbumId
JOIN InvoiceLine il ON t.TrackId = il.TrackId
GROUP BY a.ArtistId, a.Name
ORDER BY TotalSales DESC
LIMIT 5;
ArtistId Name TotalSales
0 90 Iron Maiden 140
1 150 U2 107
2 50 Metallica 91
3 22 Led Zeppelin 87
4 113 Os Paralamas Do Sucesso 45
Launch the User Interface
from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()
How to connect to Athena and redshift
*Athena *
import pandas as pd
from pyathena import connect
conn_details = {...} # fill this with your connection details
conn_athena = connect(**conn_details)
def run_sql_athena(sql: str) -> pd.DataFrame:
df = pd.read_sql(sql, conn_athena)
return df
vn.run_sql = run_sql_athena
Redshift
import pandas as pd
import psycopg2
conn_details = {...} # fill this with your connection details
conn_redshift = psycopg2.connect(**conn_details)
def run_sql_redshift(sql: str) -> pd.DataFrame:
df = pd.read_sql_query(sql, conn_redshift)
return df
vn.run_sql = run_sql_redshift
Most of Vanna’s functionality does not need your specific data, but does need your schema (eg SQL queries and structure — table names, column names)
It’s completely open source and free, and works with Snowflake, Postgres, Redshift, and most other databases.
Reference : vanna.ai.
Top comments (0)