Kaira Kelvin.

Posted on Jan 3, 2024 • Edited on Sep 18, 2024

Pandas notes.

Once you know how to read a CSV file from local storage into memory, reading data from other sources is a breeze.

pd.concat()

It allows us to join two or more data frames along either rows or columns.
Sometimes the input data frames have generic indexes that overlap, like row numbers in a spreadsheet. For that case pd.concat() has an optional parameter called ignore_index, which discards the original labels and renumbers the result from 0.
By specifying axis=1 in the concat statement, we override the default behavior (axis=0, which stacks rows) and join the frames along the columns.

pd.concat([df1, df2, ...], axis=1)

pd.concat([df1, df2, ...], ignore_index=True)
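To make the difference between these options concrete, here is a minimal sketch. The two small frames and their column names are made up for illustration and are not part of the original notes.

```python
import pandas as pd

# Hypothetical sample frames; both carry the default 0..1 index.
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
df2 = pd.DataFrame({"name": ["Cy", "Dee"], "score": [70, 95]})

# Default (axis=0): rows are stacked and the original indexes are kept,
# so the labels 0 and 1 each appear twice.
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

# axis=1: the frames are placed side by side, aligned on the index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(side_by_side.shape)         # (2, 4)
```

Note that with axis=1 the result here has two columns called name and two called score, because concat does not rename overlapping columns.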

joined_df = left_df.merge(right_df)

Before creating line plots with Matplotlib, first set up the environment, which includes installing Matplotlib.
To install Matplotlib, use pip, the package installer for Python (pip install matplotlib).

A. Plotting a line graph in pandas.

This is one of the most used visualizations. Line plots are excellent at tracking the evolution of a variable over time.

Here is a sample of code for plotting a line graph in pandas:

import matplotlib.pyplot as plt

# Count publications per year, ordered by year (df is an existing DataFrame)
publications_per_year = df['year of  publication'].value_counts().sort_index()
publications_per_year.plot(kind='line', marker='o', linestyle='-', color='b', figsize=(10, 6))
plt.xlabel('year of publication')
plt.ylabel('number of publications')
plt.title('A line graph showing number of publications against year')
plt.grid(True)
plt.show()

Essential Elements of Line Graph plotting.

1. Color
In the sample above, color='b' draws the line in blue. You can use different colors when plotting different line graphs.
To create a line plot directly with Matplotlib, use the plt.plot() function.
By default, plt.plot() draws a blue line; pass the color parameter to change it.

plt.plot(dates, closing_price, color='red')

2. Alpha
The alpha parameter controls the transparency of the color. It ranges from 0 (completely transparent) to 1 (completely opaque), so alpha=0.5 makes the line half transparent.
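As a rough illustration of alpha, the sketch below draws a thick, half-transparent line over an opaque one. The price lists are invented sample data, and opening_price is a name introduced only for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, not taken from the article.
days = [1, 2, 3, 4, 5]
closing_price = [10, 12, 11, 14, 13]
opening_price = [11, 11, 12, 13, 14]

fig, ax = plt.subplots()
# alpha=1.0 (the default) is fully opaque.
(solid_line,) = ax.plot(days, closing_price, color="red", alpha=1.0)
# alpha=0.5 lets whatever lies underneath show through the thick blue line.
(faded_line,) = ax.plot(days, opening_price, color="blue", alpha=0.5, linewidth=6)
# When running as a script, call plt.show() here to display the figure.
```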

3. Line width
Change the line width by passing a linewidth parameter to the plt.plot() function.
The linewidth parameter takes a floating-point value representing the line's width in points.
plt.plot(dates, closing_price, linewidth=3)

4. Marker
The marker parameter in Matplotlib determines the style of marker used to highlight data points on the line.
Specifically, marker='o' specifies that a circular marker will be used.

Matplotlib offers many more line styles and markers. You can combine them to achieve the desired visual effect in your plots.
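A few common combinations can be sketched as follows. The style codes are standard Matplotlib ones, while the data and labels are placeholders chosen for this example.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]

# Each tuple is (linestyle, marker, label).
styles = [
    ("-", "o", "solid line, circle markers"),
    ("--", "s", "dashed line, square markers"),
    (":", "^", "dotted line, triangle markers"),
    ("-.", "x", "dash-dot line, x markers"),
]

fig, ax = plt.subplots(figsize=(8, 5))
for offset, (linestyle, marker, label) in enumerate(styles):
    # Shift each line upward so the four styles do not overlap.
    y = [value + offset for value in x]
    ax.plot(x, y, linestyle=linestyle, marker=marker, label=label)
ax.legend()
# When running as a script, call plt.show() here to display the figure.
```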

5. Grid lines
We can also add grid lines to our plot to make it more readable. We can achieve this by using the plt.grid() function, which takes a boolean value representing whether the grid should be shown.
plt.grid(True)

B. Bar Plots.

A bar chart compares values across multiple categories. It consists of rectangles whose lengths are proportional to the value of each category. Bar charts are prevalent since they are easy to read.
Making bar plots instead of line plots is as simple as passing kind='bar' (for vertical bars) or kind='barh' (for horizontal bars).
Stacked bar plots are created from a DataFrame by passing stacked=True.

df.plot(kind='barh', stacked=True, alpha=0.5)
A useful recipe for bar plots is to visualize a Series’s value frequency using value_counts:

s.value_counts().plot(kind='bar')
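Here is a small self-contained sketch of both recipes. The survey answers and the quarterly figures are invented, and the column names are assumptions made for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical survey answers.
s = pd.Series(["tea", "coffee", "tea", "juice", "coffee", "tea"])
counts = s.value_counts()     # frequency of each distinct value, largest first
ax = counts.plot(kind="bar")  # one vertical bar per category

# Hypothetical quarterly figures for a stacked horizontal bar plot.
df = pd.DataFrame(
    {"online": [3, 5, 2], "in_store": [4, 1, 6]},
    index=["Q1", "Q2", "Q3"],
)
fig, ax2 = plt.subplots()
df.plot(kind="barh", stacked=True, alpha=0.5, ax=ax2)
# When running as a script, call plt.show() here to display both figures.
```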

C. Histograms and Density Plots.

A histogram, with which you may be well-acquainted, is a kind of bar plot that gives a discretized display of value frequency.
The data points are split into discrete, evenly spaced bins, and the number of data points in each bin is plotted.
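To see the binning step itself, the sketch below uses np.histogram on a made-up Series and then draws the same histogram with pandas. The numbers are sample data chosen so the bins are easy to check by hand.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical measurements.
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# np.histogram performs the binning that a histogram plot draws:
# 4 evenly spaced bins between the minimum (1) and the maximum (9).
counts, edges = np.histogram(values, bins=4)
print(edges.tolist())   # [1.0, 3.0, 5.0, 7.0, 9.0]
print(counts.tolist())  # [3, 5, 1, 1]

# The same bins drawn as bars.
ax = values.plot(kind="hist", bins=4)
# When running as a script, call plt.show() here to display the figure.
```

Each bin includes its left edge but not its right edge, except the last bin, which includes both.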

D. Plotting a Pie Chart.

A pie chart is a circular statistical graphic that is divided into slices (also called wedges) to illustrate numerical proportions. The whole circle represents 100% of the given data, and the area of each wedge is proportional to its share.
Syntax:

matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None, autopct=None, shadow=False)

For example:

plt.pie(chart, labels=chart.index, autopct='%1.1f%%', startangle=90)

The Anatomy of a Great Pie Chart.

1. Explode

Maybe you want one of the wedges to stand out? The explode parameter allows you to do that. If it is specified and not None, it must be an array with one value for each wedge.

For example, explode=[0.2, 0, 0, 0] pulls the first wedge away from the center of the pie by 0.2 of the radius.
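A minimal sketch of explode in action, using invented sizes and labels:

```python
import matplotlib.pyplot as plt

# Hypothetical data: four categories and their sizes.
sizes = [40, 30, 20, 10]
labels = ["A", "B", "C", "D"]
explode = [0.2, 0, 0, 0]  # one offset per wedge; only the first is pulled out

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    sizes, labels=labels, explode=explode, autopct="%1.1f%%", startangle=90
)
# When running as a script, call plt.show() here to display the figure.
```

The list passed to explode must be as long as the data; otherwise Matplotlib raises a ValueError.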

2. Shadow
Add a shadow to the pie chart by setting the shadow parameter to True (shadow=True).

3. Legend
To add a list of explanations for each wedge, use the plt.legend() function. You can add a title to the legend with the title parameter:
plt.legend(title="Vict Sex")

4. Autopct
It is used to format and display the percentage labels on each wedge of a pie chart. It allows you to automatically calculate and format the percentage values based on the sizes of the wedges.

'%1.1f%%' displays the percentage with one digit after the decimal point, followed by the percent sign, for example 25.5%. The leading 1 is only a minimum field width.
'%.2f%%' displays the percentage with two digits after the decimal point, for example 43.56% or 47.99%.
'%.0f%%' displays the percentage rounded to a whole number, with no digits after the decimal point, for example 44%.
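These format strings are ordinary Python % formatting, so they can be tried without drawing a chart. The sketch below also shows that autopct accepts a function; label_with_count and its total argument are hypothetical names made up for this example.

```python
# Matplotlib applies a string autopct as `fmt % pct`, where pct is the
# wedge's share of the total, in percent.
pct = 43.5678

print("%1.1f%%" % pct)  # 43.6%
print("%.2f%%" % pct)   # 43.57%
print("%.0f%%" % pct)   # 44%


def label_with_count(pct, total=200):
    """Return a label such as '25.0% (50)'.

    `total` is a hypothetical sum of the plotted values.
    """
    count = round(pct * total / 100)
    return f"{pct:.1f}% ({count})"


print(label_with_count(25.0))  # 25.0% (50)
```

Passing autopct=label_with_count to plt.pie() would then print both the percentage and the underlying count on each wedge.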

The percentage labels (autotexts) returned by pie() can be restyled with plt.setp():

plt.setp(autotexts, size=10, weight="bold")

fig, ax = plt.subplots(figsize=(6, 8))
explode = [0.0, 0.0, 0.1]
wp = {'linewidth': 1, 'linestyle': '-', 'edgecolor': 'black'}
colors = ("orange", "cyan", "indigo")
chart = Dinosaurs['diet'].value_counts()

The following creates a pie chart with labels, autopct formatting, wedge properties, and explode:

wedges, texts, autotexts = ax.pie(chart, labels=chart.index, autopct='%1.1f%%', startangle=140,
                                  colors=colors, explode=explode, wedgeprops=wp)
ax.legend(wedges, chart.index.tolist(),
          title="types of dinosaurs diet",
          bbox_to_anchor=(0.1, 0.5),
          loc="best")
plt.setp(autotexts, size=12, weight="bold")

The bbox_to_anchor parameter takes a tuple of two values (x, y) in a normalized coordinate system. For ax.legend() and plt.legend() these are axes coordinates by default; fig.legend() uses figure coordinates.
The normalized coordinate system ranges from 0 to 1:

(0, 0) corresponds to the bottom-left corner.
(1, 1) corresponds to the top-right corner.
(0.5, 0.5) corresponds to the center.

The loc parameter then selects which part of the legend box is placed at that point.
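As a sketch of how loc and bbox_to_anchor work together: ax.legend() interprets the anchor relative to the axes by default, and loc names the corner of the legend box that is pinned there. The data and diet labels below are placeholders, not the article's dataset.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Hypothetical shares for three categories.
wedges, texts, autotexts = ax.pie([50, 30, 20], autopct="%1.1f%%")

# The legend's upper-left corner is pinned to (1.0, 1.0), the top-right
# corner of the axes, so the legend sits just outside the pie.
legend = ax.legend(
    wedges,
    ["herbivore", "carnivore", "omnivore"],
    title="Diet",
    loc="upper left",
    bbox_to_anchor=(1.0, 1.0),
)
# When running as a script, call plt.show() here to display the figure.
```

Changing loc to "center" with bbox_to_anchor=(0.5, 0.5) would instead center the legend on the middle of the axes.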

Below is a chart showing the position of legend in a figure.

Magic commands start with either % or %%. The command we need to display plots inline is %matplotlib inline; with this magic in place, all plots created in code cells are automatically displayed inline.
In newer versions of Jupyter notebooks, %matplotlib inline is not strictly necessary, since plots are often displayed automatically.

Question.
The score of a team in 5 IPL matches is available to you. Write a program to create a pie chart from this data, showing the last match's performance as an exploded wedge.

import matplotlib.pyplot as plt

scores = [12, 13, 27, 30, 45]
labels = [f'Match {i + 1}' for i in range(len(scores))]
explode = [0.0, 0.0, 0.0, 0.0, 0.1]  # pull out only the last match
colors = ['brown', 'cyan', 'indigo', 'violet', 'yellow']
plt.figure(figsize=(8, 8))
plt.title("Fedha Stars Goals in Five Matches")
plt.pie(scores, labels=None, autopct='%1.1f%%', startangle=90, colors=colors, explode=explode)
plt.legend(labels, loc='best')
plt.show()

The Box-plot.

A box plot is a method for graphically depicting groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data.
By default, each whisker reaches to the farthest data point that lies within 1.5*IQR (IQR = Q3 - Q1) of the box edge. Outlier points are those past the end of the whiskers.

In addition, the box plot allows one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, midrange, and trimean. Box plots can be drawn either horizontally or vertically.
A box plot displays the dataset based on the five-number summary: the minimum, the first quartile, the sample median, the third quartile, and the maximum.

Minimum (Q0 or 0th percentile): the lowest data point in the data set, excluding any outliers.
Maximum (Q4 or 100th percentile): the highest data point in the data set, excluding any outliers.
Median (Q2 or 50th percentile): the middle value in the data set, where 50% of the data is less than the median and 50% is higher.
First quartile (Q1 or 25th percentile): also known as the lower quartile qn(0.25), it is the median of the lower half of the dataset.
Third quartile (Q3 or 75th percentile): also known as the upper quartile qn(0.75), it is the median of the upper half of the dataset.
Interquartile range (IQR): the distance between the upper and lower quartiles, IQR = Q3 - Q1 = qn(0.75) - qn(0.25).

Outliers: any values above the "maximum" or below the "minimum".
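The quantities above can be computed directly with pandas. This is a minimal sketch on invented data; note that Series.quantile interpolates linearly by default, so its quartiles can differ slightly from the "median of each half" definition.

```python
import pandas as pd

# Hypothetical sample with one unusually large value.
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1 = data.quantile(0.25)
median = data.quantile(0.50)
q3 = data.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box edges are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)  # 3.25 5.5 7.75 4.5
print(outliers.tolist())    # [30]
```

Calling data.plot(kind="box") on the same Series draws the corresponding box plot, with 30 shown as a separate outlier point.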

DEV Community
