INTRODUCTION
Dear readers, in this article I share the findings of a data analysis project that I recently completed during my HNG internship. HNG is a fast-paced program that helps developers and other people in tech practice skills in their own domain. For this task, we were given sample datasets from Kaggle and asked to analyze them. The main aim of the project was to produce a detailed data analysis report. Without wasting more time, let's dive into the details.
PREREQUISITES AND PREPARATION
Install necessary tools
You can use Excel, SQL, Python or any other tool you prefer. In my case, I used Python, together with the Pandas library for data manipulation and Matplotlib for visualization.
Extract your data
I extracted my data from Kaggle using Pandas. The following is the link:
https://www.kaggle.com/datasets/kyanyoga/sample-sales-data
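As a rough sketch of the loading step, the snippet below reads a CSV into a DataFrame with Pandas. The file name `sales_data_sample.csv`, the `latin1` encoding, the helper name `load_sales` and the uppercase column names are assumptions for illustration, and the three rows are made up so the snippet runs without the real download.

```python
import io
import pandas as pd

# Tiny made-up stand-in for the downloaded CSV so the snippet runs on its own.
SAMPLE_CSV = """ORDERNUMBER,QUANTITYORDERED,PRICEEACH,SALES,ORDERDATE,STATUS,PRODUCTLINE
10107,30,95.70,2871.00,2/24/2003 0:00,Shipped,Motorcycles
10121,34,81.35,2765.90,5/7/2003 0:00,Shipped,Motorcycles
10134,41,94.74,3884.34,7/1/2003 0:00,Cancelled,Classic Cars
"""


def load_sales(source, encoding="latin1"):
    """Read the sales CSV from a path or a binary file-like object.

    The encoding default is an assumption; adjust it if your copy of the
    file is saved differently.
    """
    return pd.read_csv(source, encoding=encoding)


# With the real file this would be: df = load_sales("sales_data_sample.csv")
df = load_sales(io.BytesIO(SAMPLE_CSV.encode("latin1")))
print(df.head())
```

If reading your copy of the file raises a `UnicodeDecodeError` with the default UTF-8 decoding, passing an explicit `encoding` as above is the usual thing to try.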
Perform your data cleaning and analysis
After extracting your data, study it, then start working on it.
KEY VARIABLES AND DATATYPES
Numeric (Integers and Floats):
I. Sales
II. Price Each
III. Quantity Ordered
IV. Order Number
Categorical:
I. Country
II. City
III. Customer Name
IV. Product Line
V. Deal Size
VI. Status
Date:
I. Order Date
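One way to confirm these data types is to inspect `df.dtypes` and convert the date column explicitly. This is a minimal sketch on made-up rows: the uppercase column names and the `month/day/year hour:minute` date format are assumptions, so check them against your own copy of the data.

```python
import pandas as pd

# Made-up rows; column names are assumed to follow the dataset's uppercase style.
df = pd.DataFrame({
    "ORDERNUMBER": [10107, 10121],
    "QUANTITYORDERED": [30, 34],
    "PRICEEACH": [95.70, 81.35],
    "SALES": [2871.00, 2765.90],
    "ORDERDATE": ["2/24/2003 0:00", "5/7/2003 0:00"],
    "STATUS": ["Shipped", "Shipped"],
})

# Dates read from a CSV start out as plain strings, not datetimes.
print(df.dtypes)

# Convert the order date so it can later be grouped by month or year.
df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"], format="%m/%d/%Y %H:%M")
months = df["ORDERDATE"].dt.month.tolist()
print(months)
```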
INITIAL INSIGHTS
- The dataset has 2823 rows and 25 columns
This comes from the .shape attribute of your DataFrame. For example, if your DataFrame is named df, you can get it as follows:
print(df.shape)
- There are 7 products in the dataset, that is, 7 unique names in the PRODUCTLINE column
- The dataset has null values in 3 columns, namely State, Address Line 2 and Postal Code
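The unique-product count and the null check above can be reproduced along these lines. The rows are made up, and every column name other than PRODUCTLINE is an assumption, so the printed numbers will not match the real dataset.

```python
import pandas as pd

# Made-up rows with a few missing values, for illustration only.
df = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Trains", "Classic Cars", "Ships"],
    "STATE": ["NY", None, "CA", None],
    "POSTALCODE": ["10022", "51100", None, "75508"],
    "SALES": [2871.0, 2765.9, 3884.34, 3746.7],
})

# Number of distinct product names.
n_products = df["PRODUCTLINE"].nunique()

# Null count per column, keeping only the columns that have any nulls.
null_counts = df.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0]

print(n_products)
print(cols_with_nulls)
```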
SALES PERFORMANCE BY PRODUCT
Among the 7 products in the dataset, Classic Cars have the highest total sales at 3.27 million, while Trains have the lowest at 201 thousand. Classic Cars also have the highest number of orders at 28,547, while Trains have the fewest at 2,395.
Below is a pie chart showing the sales of each product as a percentage:
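The percentages behind a chart like this can be computed with a group-by. The sketch below uses made-up figures and assumes the sales column is named SALES; the derived column SALES_PCT is a name introduced here for illustration.

```python
import pandas as pd

# Made-up figures; the real totals are the ones quoted in the text.
df = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Trains", "Classic Cars", "Ships"],
    "SALES": [3000.0, 1000.0, 5000.0, 1000.0],
})

# Total sales per product line, largest first.
by_product = (
    df.groupby("PRODUCTLINE")["SALES"]
    .sum()
    .sort_values(ascending=False)
    .to_frame()
)

# Each product line's share of overall sales, in percent.
by_product["SALES_PCT"] = (by_product["SALES"] / by_product["SALES"].sum() * 100).round(1)
print(by_product)
```

Passing the SALES column and the index labels to Matplotlib's `plt.pie` is one way to turn this table into the chart.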
SALES PERFORMANCE BY TERRITORY
EMEA has the highest number of sales, while Japan has the lowest.
SALES PERFORMANCE BY MONTH
November has the highest sales at 2.1 million, while June has the lowest at 454,756.78.
Below is a figure that shows this data:
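A monthly breakdown like this can be sketched by grouping on the month of the order date. The rows are made up and the column names ORDERDATE and SALES are assumptions. Note that this pools the same calendar month across all years, so November 2003 and November 2004 are added together.

```python
import pandas as pd

# Made-up rows spanning two years.
df = pd.DataFrame({
    "ORDERDATE": ["11/5/2003 0:00", "11/20/2004 0:00", "6/3/2003 0:00", "1/15/2004 0:00"],
    "SALES": [4000.0, 3500.0, 900.0, 1600.0],
})
df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"], format="%m/%d/%Y %H:%M")

# Total sales per calendar month (1 = January, ..., 12 = December).
monthly = df.groupby(df["ORDERDATE"].dt.month)["SALES"].sum()
monthly.index.name = "MONTH"

best_month = monthly.idxmax()
worst_month = monthly.idxmin()
print(monthly)
print(best_month, worst_month)
```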
PERFORMANCE BY STATUS
2617 products were successfully shipped
60 products that were initially ordered were cancelled
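Counts per status can be obtained with `value_counts`. The sketch below uses made-up rows and assumes the status column is named STATUS; the "On Hold" label is only there to show a third category.

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "STATUS": ["Shipped", "Shipped", "Cancelled", "Shipped", "On Hold"],
})

# Number of rows per status, most frequent first.
status_counts = df["STATUS"].value_counts()
print(status_counts)

# .get avoids a KeyError if a status never appears in the data.
shipped = status_counts.get("Shipped", 0)
cancelled = status_counts.get("Cancelled", 0)
print(shipped, cancelled)
```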
CONCLUSION
Initial observation of the sales dataset reveals how sales are distributed across key dimensions such as year, month, territory and product. More insights to be found include:
a. Customers with the highest number of sales
b. The countries with the least and the highest number of sales
c. Customers that have the highest number of cancelled orders
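These follow-up questions could be approached roughly as below. This is a sketch under assumptions rather than the analysis in the repository: the customer names and figures are made up, and the column names CUSTOMERNAME, COUNTRY, STATUS and SALES are assumed.

```python
import pandas as pd

# Made-up customers and figures, for illustration only.
df = pd.DataFrame({
    "CUSTOMERNAME": ["Alpha Ltd", "Beta Inc", "Alpha Ltd", "Gamma Co", "Beta Inc"],
    "COUNTRY": ["USA", "France", "USA", "Spain", "France"],
    "STATUS": ["Shipped", "Cancelled", "Shipped", "Cancelled", "Cancelled"],
    "SALES": [5000.0, 1200.0, 3000.0, 800.0, 700.0],
})

# a. Customers ranked by total sales, largest first.
top_customers = df.groupby("CUSTOMERNAME")["SALES"].sum().sort_values(ascending=False)

# b. Countries ranked by total sales, smallest first.
sales_by_country = df.groupby("COUNTRY")["SALES"].sum().sort_values()

# c. Number of cancelled rows per customer.
cancelled_rows = df[df["STATUS"] == "Cancelled"]
cancelled_by_customer = cancelled_rows["CUSTOMERNAME"].value_counts()

print(top_customers.head())
print(sales_by_country)
print(cancelled_by_customer)
```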
To view a detailed analysis of this project, check my repository on GitHub below:
https://github.com/Doreen970/HNG_data
To learn more about the HNG internship, visit the following links:
https://hng.tech/internship
https://hng.tech/hire