DEV Community

Cover image for Python Database Connectivity and SQL Basics for EDA πŸπŸ“Š
Anand
Anand

Posted on

Python Database Connectivity and SQL Basics for EDA πŸπŸ“Š

Performing Basic Data Analysis with Python and SQL

In today's data-driven world, being able to extract insights from data is a valuable skill. In this article, we'll explore how to perform basic data analysis using Python and SQL. We'll connect to a SQLite database, execute SQL queries, and visualize the results using Python libraries. πŸ“Š

Database Schema:

The database contains information about sales transactions made by a company. Here's what each field represents:

  1. TransactionID (INTEGER, PRIMARY KEY):
    • This field uniquely identifies each transaction. It's an integer value and serves as the primary key for the table.
  2. ProductID (INTEGER):
    • This field represents the unique identifier for each product sold in the transaction. It's also an integer value.
  3. ProductName (TEXT):
    • This field stores the name or description of the product sold. It's a text field allowing for variable-length strings.
  4. Quantity (INTEGER):
    • This field indicates the quantity of the product sold in the transaction. It's an integer value.
  5. PricePerUnit (REAL):
    • This field represents the price per unit of the product sold. It's a real number (floating-point value) to accommodate decimal prices.
  6. SalesDate (TEXT):
    • This field stores the date of the sales transaction. It's represented as text and typically follows a specific date format.

Explanation:

The provided schema simulates a basic sales database, commonly used in retail or e-commerce settings. Each transaction (identified by TransactionID) involves the sale of one or more products. For each product sold, the database records the ProductID, ProductName, Quantity sold, PricePerUnit, and the date of the transaction (SalesDate).

Purpose:
This database schema and the accompanying data were created for the purpose of practicing database management and performing data analysis tasks,it is not real-world data.But it provides a simplified representation of sales transactions, making it suitable for learning SQL queries, data manipulation, and analysis.

Note:
It's important to note that in a real-world scenario, databases and data structures may be more complex, and additional considerations such as data integrity, normalization, and indexing would be taken into account. However, for learning and practice purposes, this schema serves as a good starting point.

Setting Up the Environment

First, let's ensure we have the necessary tools installed. We'll need Python, SQLite, and the matplotlib library for data visualization. πŸ’»

pip install matplotlib
pip install sqlite3

Enter fullscreen mode Exit fullscreen mode

Connecting to the Database

We'll start by creating a SQLite database and populating it with some sample data. We'll then connect to this database using Python's sqlite3 library. πŸ”Œ

import sqlite3
import matplotlib.pyplot as plt

# Connect to the database (in-memory database for this example)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Enter fullscreen mode Exit fullscreen mode

Creating the Database Schema and Inserting Data

We'll define a schema for our database containing information about sales transactions. The schema includes fields such as TransactionID, ProductID, ProductName, Quantity, PricePerUnit, and SalesDate. πŸ’Ό

# Creating a table
cursor.execute('''CREATE TABLE IF NOT EXISTS sales (
                TransactionID INTEGER PRIMARY KEY,
                ProductID INTEGER,
                ProductName TEXT,
                Quantity INTEGER,
                PricePerUnit REAL,
                SalesDate TEXT
                )''')

# Inserting sample data
sales_data = [
    (1, 101, 'Product A', 10, 20.0, '2024-01-15'),
    (2, 102, 'Product B', 15, 25.0, '2024-01-20'),
    (3, 101, 'Product A', 8, 20.0, '2024-02-05'),
    (4, 103, 'Product C', 12, 30.0, '2024-02-10'),
    (5, 102, 'Product B', 20, 25.0, '2024-03-02'),
    (6, 104, 'Product D', 5, 35.0, '2024-03-10')
]
cursor.executemany('INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)', sales_data)
Enter fullscreen mode Exit fullscreen mode

Performing Data Analysis Tasks

Now that our database is set up, let's perform some basic data analysis tasks using SQL queries. πŸ“Š

Task 1: Total Sales Revenue for Each Product

We'll calculate the total sales revenue for each product by multiplying the Quantity with PricePerUnit and summing it up. πŸ’°

query1 = '''SELECT ProductID, ProductName, SUM(Quantity * PricePerUnit) AS TotalRevenue
            FROM sales
            GROUP BY ProductID'''
cursor.execute(query1)
result = cursor.fetchall()
Enter fullscreen mode Exit fullscreen mode

Visualizing the Results

We can visualize the results using Python libraries such as matplotlib. πŸ“Š

# Plotting pie chart for Task 1
labels = [row[1] for row in result] 
revenues = [row[2] for row in result]  
plt.pie(revenues, labels=labels, autopct='%1.1f%%')
plt.title('β†’Task 1: Total sales revenue distribution')
plt.show()
Enter fullscreen mode Exit fullscreen mode

pie chart for eda

Task 2: Top 3 Best-Selling Products

We'll find the top 3 best-selling products based on the total quantity sold. πŸ†

query2 = '''SELECT ProductID, ProductName, SUM(Quantity) AS TotalQuantity
            FROM sales
            GROUP BY ProductID
            ORDER BY TotalQuantity DESC
            LIMIT 3'''
cursor.execute(query2)
result = cursor.fetchall()
Enter fullscreen mode Exit fullscreen mode

output

| ProductID | ProductName | TotalQuantity |
|-----------|-------------|---------------|
| 102       | Product B   | 35            |
| 101       | Product A   | 18            |
| 103       | Product C   | 12            |

Enter fullscreen mode Exit fullscreen mode

Task 3: Average Price Per Unit for All Products

We'll calculate the average price per unit for all products. πŸ“ˆ

query3 = '''SELECT AVG(PricePerUnit) AS AveragePricePerUnit
            FROM sales'''
cursor.execute(query3)
result = cursor.fetchall()
Enter fullscreen mode Exit fullscreen mode

output

| AveragePricePerUnit |
|----------------------|
| 26.666666666666668   |
Enter fullscreen mode Exit fullscreen mode

Conclusion

In this article, we've demonstrated how to connect to a SQLite database using Python, execute SQL queries, and perform basic data analysis tasks. By leveraging Python libraries such as sqlite3 and matplotlib, we can efficiently analyze data and gain valuable insights. πŸš€


LinkedIn GitHub SoloLearn

Top comments (0)