A box plot, also known as a box and whisker plot, is a graphical representation of a dataset that shows the distribution of values in the data. It is a useful tool for visualizing the spread and skewness of a dataset, as well as identifying outliers.
The box plot can be used to compare the distribution of multiple datasets by creating a box plot for each dataset and placing them side by side. It is also possible to overlay box plots on top of each other to compare the distributions more closely.
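As a rough illustration of the side-by-side comparison described above, here is a minimal sketch using matplotlib directly. The three samples are synthetic and the names `datasets`, `A`, `B`, and `C` are made up for the example; they are not part of this tutorial's data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: three synthetic samples standing in for three datasets.
rng = np.random.default_rng(seed=0)
datasets = {
    "A": rng.normal(loc=10, scale=2, size=200),
    "B": rng.normal(loc=12, scale=4, size=200),
    "C": rng.exponential(scale=5, size=200),
}

fig, ax = plt.subplots()
# Passing a list of arrays draws one box per array, side by side on a shared y-axis.
result = ax.boxplot(list(datasets.values()))
# Boxes are placed at x = 1, 2, 3 by default, so label those positions.
ax.set_xticks(range(1, len(datasets) + 1))
ax.set_xticklabels(list(datasets.keys()))
ax.set_ylabel("value")
plt.show()
```

Because all boxes share one y-axis, differences in median, spread, and skew can be read off directly.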
A box plot is made up of the following parts:
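- The top line (the end of the upper whisker) is the maximum value, not counting outliers.
- The bottom line (the end of the lower whisker) is the minimum value, not counting outliers.
- The centre line is the median.
- The top of the box is the 75th percentile value.
- The bottom of the box is the 25th percentile value.
- The circles outside the whiskers are called 'outliers'. By default, seaborn marks a point as an outlier when it lies more than 1.5 times the interquartile range beyond the box.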
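To make the parts listed above concrete, the sketch below computes them by hand with NumPy, assuming the common 1.5 times IQR rule for outliers. The helper `box_plot_stats` and the sample values in `fares` are hypothetical and only serve the illustration.

```python
import numpy as np

def box_plot_stats(values, whisker=1.5):
    """Return the numbers a box plot draws, using the 1.5 * IQR rule."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1  # interquartile range: the height of the box
    # Points beyond these fences are drawn as outliers.
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = data[(data >= lower_fence) & (data <= upper_fence)]
    outliers = data[(data < lower_fence) | (data > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        # Whiskers end at the most extreme points that are still inside the fences.
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }

# Hypothetical sample values, with one unusually large entry.
fares = [5, 7, 8, 9, 10, 11, 12, 13, 15, 60]
stats = box_plot_stats(fares)
print(stats)
```

In this sample the value 60 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 15.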
Let's see how to create one with Python.
- Start by importing necessary packages.
- We will use seaborn to create the plot.
    import seaborn as sns
    import matplotlib.pyplot as plt
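- Let's use taxis, one of the example datasets that seaborn can load by name, and set the style of the graph to white grid. Note that load_dataset downloads the data the first time, so it needs an internet connection.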
    sns.set(style="whitegrid")
    df = sns.load_dataset("taxis")
- Now define the columns for the x-axis and y-axis, and a list of the cities (here, pickup boroughs) you want to create box plots for.
    x = "pickup_borough"
    y = "total"
    cities = ["Queens"]
- Create the plot with the sns.boxplot() function and provide df as data. Set x as x and y as y, and pass the cities list as order so the boxes are drawn in that order. Then use the plt.show() function to show the graph.
    ax = sns.boxplot(data=df, x=x, y=y, order=cities)
    plt.show()
Here is the complete code:

    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set(style="whitegrid")
    df = sns.load_dataset("taxis")

    x = "pickup_borough"
    y = "total"
    cities = ["Queens"]

    ax = sns.boxplot(data=df, x=x, y=y, order=cities)
    plt.show()
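The order argument does two things: it picks which categories are drawn and fixes their left-to-right order, which is why the plot above shows only Queens. The sketch below shows this with more than one borough. To keep it runnable without a download, it uses a small made-up DataFrame in place of the taxis data; the values and the `boroughs` list are assumptions for illustration only.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the taxis data, so the sketch runs offline.
df = pd.DataFrame({
    "pickup_borough": ["Queens", "Queens", "Queens",
                       "Manhattan", "Manhattan", "Manhattan",
                       "Brooklyn", "Brooklyn", "Brooklyn"],
    "total": [30.0, 42.5, 55.0, 12.0, 15.5, 18.0, 20.0, 24.0, 31.0],
})

# Only the boroughs named here are drawn, left to right in this order.
# Brooklyn is present in the data but left out of the plot.
boroughs = ["Manhattan", "Queens"]
ax = sns.boxplot(data=df, x="pickup_borough", y="total", order=boroughs)
plt.show()
```

With the real dataset, the same call with a longer list in place of cities should give one box per borough listed.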
I hope this tutorial has helped you understand the basics of box plots. If you have any questions, leave them in the comments below and I will be more than happy to answer them.