DEV Community

danieltprice
danieltprice

Posted on

Chat with your holiday song database with Neon Postgres and LangChain

🎄 Get in the festive spirit by chatting with a holiday song database 🎁

Image description

Neon is serverless Postgres built for the cloud. It has a generous Free Tier and innovative features like database branching and scale-to-zero. You can spin up a database in just a few clicks. It’ll give you a URL for your database, and you’re ready to go.

LangChain is an open-source framework for developing applications powered by Large Language Models (LLMs). With LangChain, you can build applications that give answers or take actions in response to a variety of possible inputs or contexts, like prompt instructions, sample inputs or responses, or any other data that the reactions should be based on.

For example, with LangChain, you can connect LLMs with a data source such as Neon Postgres and sequence multiple commands (a series of prompts) that use the data source to achieve a desired result.

In this post, you’ll learn how to create a “Holiday Song” app in Python that lets you ask your database questions about “chart-topping holiday songs” in natural language. The application transforms your question into an SQL query, runs the query, and returns a natural language response based on the query result. This is achieved using Langchain’s SQLDatabaseChain.

The application uses a Kaggle dataset called Top Holiday Songs by Position Over The Years (credit to Throwback Thursday), which is a collection of the most popular holiday songs appearing on the charts from 2011 to 2018.

Prerequisites

You’ll need the following to get started:

  • A Neon account.
  • An OpenAI API key. Instructions for obtaining an OpenAI API key are provided below.
  • python and pip

Grab your OpenAI API key

The application uses an OpenAI model, which requires an OpenAI API key. If you do not have an OpenAI API key, you can follow these steps to create one:

  1. Navigate to https://platform.openai.com/.
  2. Click on your name or icon, located at the top right corner of the page, and select View API Keys.
  3. Click on the Create new secret key button to create a new OpenAI API key.

Create a directory for your application

mkdir holiday_songs_app
cd holiday_songs_app
Enter fullscreen mode Exit fullscreen mode

Set up your environment

Create and activate a Python virtual environment in your application directory by executing the following command:

python -m venv venv
Enter fullscreen mode Exit fullscreen mode

Install dependencies

Install the following libraries using pip.

pip install LangChain LangChain-experimental openai python-environ psycopg2-binary
Enter fullscreen mode Exit fullscreen mode

Create a database in Neon

In Neon, create a database named holiday_songs. You can do this from the Neon Console:

  1. Navigate to the Neon Console.
  2. Select a project.
  3. Select Databases.
  4. Select the branch where you want to create the database.
  5. Click New Database.
  6. Enter holiday_songs as the database name, and select a database owner.
  7. Click Create.

Image description

Get your database connection string

Grab the connection string for your holiday_songs database. You can copy it from the Connection Details widget on the Neon Dashboard.

Image description

It will look something like this:

postgresql://daniel:************@ep-snowy-lake-123456.us-east-2.aws.neon.tech/holiday_songs?sslmode=require
Enter fullscreen mode Exit fullscreen mode

Set up your environment variables

Create a .env file in your application directory and configure variables for your OpenAI API key and your database connection string:

OPENAI_API_KEY=[your_openai_api_key]
DATABASE_URL="postgresql://daniel:************@ep-snowy-lake-123456.us-east-2.aws.neon.tech/holiday_songs?sslmode=require"
Enter fullscreen mode Exit fullscreen mode

Create a table and insert data

Create a file named db.py in your application directory and add the following code to connect to your database, create a songs table, and load the songs table with data from a holiday_songs.csv file hosted on GitHub.

import os
import psycopg2
import environ
import requests
import io

# 1. Setup and read the .env file
env = environ.Env()
environ.Env.read_env()

# 2. Establish a connection to the PostgreSQL database
connection_string = env('DATABASE_URL')
conn = psycopg2.connect(connection_string)
cursor = conn.cursor()

# 3. Create the songs table if it doesn't already exist
table_creation_sql = """
CREATE TABLE IF NOT EXISTS public.songs (
    id INT,
    year INT,
    position INT,
    song TEXT,
    artist TEXT,
    chart_date DATE
);
"""
cursor.execute(table_creation_sql)

# 4. Download the holiday_songs.csv file from GitHub
url = "https://github.com/neondatabase/postgres-sample-dbs/raw/main/holiday_songs.csv"
response = requests.get(url)
response.raise_for_status()

# Use StringIO to convert text data into file-like object so it can be read into the database
csv_file = io.StringIO(response.text)

# 5. Load the data from the holiday_songs.csv into the songs table
copy_command = '''
COPY public.songs (id, year, position, song, artist, chart_date)
FROM STDIN WITH (FORMAT CSV, HEADER true, DELIMITER ',');
'''
cursor.copy_expert(copy_command, csv_file)

# 6. Commit any changes and close the connection
conn.commit()
conn.close()
Enter fullscreen mode Exit fullscreen mode

Run the db.py script

Run the db.py script using the following command:

python db.py
Enter fullscreen mode Exit fullscreen mode

You can verify that the data was loaded by viewing the data in the Neon console. Select Tables from the sidebar, and navigate to the holiday_songs database.

Image description

Set up the SQL database chain

Create a file named app.py in your application directory and add the following code.

import os
import environ
import psycopg2
from urllib.parse import urlparse
from langchain.utilities import SQLDatabase
from langchain.llms import OpenAI
from langchain_experimental.sql import SQLDatabaseChain
from langchain.prompts import PromptTemplate

# Set up and read the .env file
env = environ.Env()
environ.Env.read_env()

# Extract connection details from the .env file
connection_string = env('DATABASE_URL')
parsed_uri = urlparse(connection_string)
username = parsed_uri.username
password = parsed_uri.password
host = parsed_uri.hostname
port = parsed_uri.port or 5432
database = parsed_uri.path[1:]  # remove leading '/'

# Setup database
db = SQLDatabase.from_uri(
    f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}"
)

# Setup llm
llm = OpenAI(temperature=0, openai_api_key=os.environ["OPENAI_API_KEY"])

# Define table_info and few_shot_examples
table_info = """public.songs (
    id  integer,
    year integer,
    position integer,
    song text,
    artist text,
    chart_date date
)"""

few_shot_examples = """
- Question: "Who topped the charts in 2018?"
  SQLQuery: SELECT artist, song FROM songs WHERE year = 2018 AND position = 1;

- Question: "Which artist has the most songs that have made it to the top 20 holiday songs list?"
  SQLQuery: SELECT artist, COUNT(*) AS song_count FROM holiday_songs GROUP BY artist ORDER BY song_count DESC LIMIT 1;

- Question: "What is the highest chart position that a holiday song has achieved each year?"
  SQLQuery: SELECT year, MIN(position) AS highest_position FROM holiday_songs GROUP BY year;

- Question: "How many unique songs have charted in the top 20 over the years?"
  SQLQuery: SELECT COUNT(DISTINCT song) FROM holiday_songs;

- Question: "Which song has appeared most frequently in the top 20 holiday songs list?"
  SQLQuery: SELECT song, COUNT(*) AS appearance_count FROM holiday_songs GROUP BY song ORDER BY appearance_count DESC LIMIT 1;

- Question: "In which year did we have the most number of unique artists in the top 20 holiday songs list?"
  SQLQuery: SELECT year, COUNT(DISTINCT artist) AS unique_artists FROM holiday_songs GROUP BY year ORDER BY unique_artists DESC LIMIT 1;
"""

# Define Custom Prompt
TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

{table_info}

Some examples of SQL queries that correspond to questions are:

{few_shot_examples}

Question: {input}"""

CUSTOM_PROMPT = PromptTemplate(
    input_variables=["input", "few_shot_examples", "table_info", "dialect"], template=TEMPLATE
)

# Setup the database chain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

def get_prompt():
    print("Type 'exit' to quit")
    while True:
        prompt = input("Ask a question or type exit to quit: ")

        if prompt.lower() == 'exit':
            print('Exiting...')
            break
        else:
            try:
                question = CUSTOM_PROMPT.format(
                    input=prompt,
                    few_shot_examples=few_shot_examples,
                    table_info=table_info,
                    dialect="PostgreSQL"
                )
                print(db_chain.run(question))
            except Exception as e:
                print(e)

get_prompt()

Enter fullscreen mode Exit fullscreen mode

Run the application

Run the application using the following command.

python app.py
Enter fullscreen mode Exit fullscreen mode

When prompted, ask your holiday_songs database a question like, "Which topped the charts in 2016?" or "Who had the most number 1’s?".

Image description

And that's it! I hope you enjoy chatting with your holiday song database. Have a great holiday and best wishes in the New Year! 🔔

References

  1. Neon Serverless Postgres
  2. LangChain
  3. SQLDatabaseChain
  4. Kaggle: Top Holiday Songs by Position Over The Years

Top comments (0)