DEV Community

Dendi Handian
Dendi Handian

Posted on • Updated on

SQL Query into Pandas DataFrame - Part 2

Continuing the last part, we are going deep into intermediate SQL translated into Pandas DataFrame.

The Playground Database

We will be using the same SQLite database from the previous post, but now we will use the invoice table and the csv file of the table.

Preparing the DataFrame

import pandas as pd

invoices_df = pd.read_csv("invoices.csv")
Enter fullscreen mode Exit fullscreen mode

Aggregation Queries into Pandas DataFrame

We will cover some aggregation function and groupby function in both sql and pandas.

Basic Aggregation

SUM:

SELECT SUM(Total)
FROM invoices
Enter fullscreen mode Exit fullscreen mode
invoices_df['Total'].sum()
Enter fullscreen mode Exit fullscreen mode

COUNT():

SELECT COUNT(BillingState)
FROM invoices
Enter fullscreen mode Exit fullscreen mode
invoices_df['BillingState'].count()
Enter fullscreen mode Exit fullscreen mode

AVG():

SELECT AVG(Total)
FROM invoices
Enter fullscreen mode Exit fullscreen mode
invoices_df['Total'].mean()
Enter fullscreen mode Exit fullscreen mode

MAX():

SELECT MAX(Total)
FROM invoices
Enter fullscreen mode Exit fullscreen mode
invoices_df['Total'].max()
Enter fullscreen mode Exit fullscreen mode

MIN():

SELECT MIN(Total)
FROM invoices
Enter fullscreen mode Exit fullscreen mode
invoices_df['Total'].min()
Enter fullscreen mode Exit fullscreen mode

GROUP BY

SELECT 
    CustomerId,
    SUM(Total) AS Total
FROM invoices
GROUP BY CustomerId
Enter fullscreen mode Exit fullscreen mode
## grouping with all (number) columns aggregated
invoices_df.groupby(['CustomerId']).sum()

## the same as the sql result
invoices_df.groupby(['CustomerId']).sum().reset_index()[['CustomerId', 'Total']]
Enter fullscreen mode Exit fullscreen mode

Next Part

Top comments (0)