Dendi Handian

Posted on Sep 1, 2021 • Edited on Jan 30

SQL Query into Pandas DataFrame - Part 2

#sql #python #datascience #sqlite

Continuing from the previous part, this post moves on to intermediate SQL queries and their Pandas DataFrame equivalents.

The Playground Database

We will use the same SQLite database as in the previous post, but this time we will work with the invoices table and a CSV export of that table.

Preparing the DataFrame

import pandas as pd

invoices_df = pd.read_csv("invoices.csv")
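
If the CSV file is not at hand, the table can also be read straight from SQLite with pd.read_sql_query. The sketch below is only an illustration: it builds an in-memory database with a made-up, three-column invoices table so that it runs on its own. With the real playground database you would pass its file path to sqlite3.connect instead; that path is not given here, so it is left out.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the playground database.
# The table layout and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (InvoiceId INTEGER, CustomerId INTEGER, Total REAL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, 2, 1.98), (2, 4, 3.96)],
)
conn.commit()

# Load the whole table into a DataFrame, just like reading the CSV.
invoices_from_db = pd.read_sql_query("SELECT * FROM invoices", conn)
conn.close()
```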

Aggregation Queries into Pandas DataFrame

We will cover some aggregate functions and GROUP BY in both SQL and pandas.

Basic Aggregation

SUM():

SELECT SUM(Total)
FROM invoices

invoices_df['Total'].sum()

COUNT():

SELECT COUNT(BillingState)
FROM invoices

invoices_df['BillingState'].count()
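
One detail worth knowing: COUNT(column) in SQL skips NULLs, and Series.count() skips missing values in the same way, so the two agree. COUNT(*) counts every row, which corresponds to len() of the DataFrame. A minimal sketch with a small made-up frame (the values are illustrative, not taken from the real invoices table):

```python
import pandas as pd

# Hypothetical sample rows; None stands in for a SQL NULL.
sample_df = pd.DataFrame({
    "BillingState": ["CA", None, "NY", None],
    "Total": [1.98, 3.96, 5.94, 8.91],
})

# Like SELECT COUNT(BillingState): missing values are not counted.
non_null_count = sample_df["BillingState"].count()

# Like SELECT COUNT(*): every row is counted.
row_count = len(sample_df)
```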

AVG():

SELECT AVG(Total)
FROM invoices

invoices_df['Total'].mean()

MAX():

SELECT MAX(Total)
FROM invoices

invoices_df['Total'].max()

MIN():

SELECT MIN(Total)
FROM invoices

invoices_df['Total'].min()
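
The aggregates above can also be computed in one call with Series.agg, much like listing several aggregate functions in a single SELECT. This is a sketch on a small made-up frame, not the real invoices data:

```python
import pandas as pd

# Hypothetical sample data standing in for the Total column.
sample_df = pd.DataFrame({"Total": [1.0, 2.0, 3.0, 6.0]})

# Roughly: SELECT SUM(Total), AVG(Total), MIN(Total), MAX(Total), COUNT(Total)
#          FROM invoices
summary = sample_df["Total"].agg(["sum", "mean", "min", "max", "count"])
```

The result is a Series indexed by the function names, so a single value can be picked out with, for example, summary["mean"].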

GROUP BY

SELECT 
    CustomerId,
    SUM(Total) AS Total
FROM invoices
GROUP BY CustomerId

## grouping with all numeric columns aggregated
## (numeric_only=True skips text columns such as BillingState)
invoices_df.groupby(['CustomerId']).sum(numeric_only=True)

## the same as the sql result
invoices_df.groupby('CustomerId', as_index=False)['Total'].sum()
