A box plot, also known as a box and whisker plot, is a graphical representation of a dataset that shows the distribution of values in the data. It is a useful tool for visualizing the spread and skewness of a dataset, as well as identifying outliers.
The box plot can be used to compare the distribution of multiple datasets by creating a box plot for each dataset and placing them side by side. It is also possible to overlay box plots on top of each other to compare the distributions more closely.

Here is how to read a box plot:
 The top whisker ends at the largest value that is not an outlier (the maximum, if there are no outliers).
 The bottom whisker ends at the smallest value that is not an outlier (the minimum, if there are no outliers).
 The centre line is the median.
 The top of the box is the 75th percentile value.
 The bottom of the box is the 25th percentile value.
 The individual points drawn beyond the whiskers are called 'outliers'.
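To make these parts concrete, here is a minimal sketch in plain Python that computes them for a small list of numbers. It assumes the common 1.5 × IQR rule, which is the default seaborn uses to decide where the whiskers end and which points count as outliers. The function name box_plot_stats and the sample values are made up for illustration.

```python
from statistics import quantiles


def box_plot_stats(values, whis=1.5):
    """Return the numbers a box plot draws, using the whis * IQR whisker rule."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between the data points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme values that are still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


fares = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up sample values
stats = box_plot_stats(fares)
print(stats)
```

For this sample the box runs from 4.25 to 8.75 with the median at 6.5, the whiskers end at 2 and 10, and 30 is the only outlier.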

Let's see how to create one with Python.
 Start by importing necessary packages.
 We will use seaborn to create the plot.
import seaborn as sns
import matplotlib.pyplot as plt
 Let's set the style of the graph to white grid and load taxis, one of the example datasets that seaborn provides.
sns.set(style="whitegrid")
df = sns.load_dataset("taxis")
 Now define the columns for the x-axis and y-axis, and a list of the boroughs you want to create box plots for.
x = "pickup_borough"
y = "total"
cities = ["Queens"]
 Create the plot with the sns.boxplot() function: pass df as data, set x to x and y to y, and use order to draw the boxes in the order of the cities list. Then call the plt.show() function to show the graph.
ax = sns.boxplot(data=df, x=x, y=y, order=cities)
plt.show()
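To compare several distributions side by side, as mentioned at the start, it should be enough to put more than one name in the order list. The sketch below assumes a tiny hand-made DataFrame with the same column names as the taxis dataset, so it runs without downloading anything. The fare values are invented for illustration and are not real taxi data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")

# Made-up sample that mimics two columns of the taxis dataset
df = pd.DataFrame({
    "pickup_borough": ["Queens"] * 5 + ["Manhattan"] * 5,
    "total": [12.0, 18.5, 25.0, 31.2, 60.0, 8.0, 9.5, 11.0, 14.3, 40.0],
})

x = "pickup_borough"
y = "total"
cities = ["Queens", "Manhattan"]  # one box per entry, drawn in this order

# Same call as before; only the order list has grown
ax = sns.boxplot(data=df, x=x, y=y, order=cities)
```

Call plt.show() afterwards to display the figure. With the real taxis data, the same cities list would draw one box for each listed borough.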
Final
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")
df = sns.load_dataset("taxis")
x = "pickup_borough"
y = "total"
cities = ["Queens"]
ax = sns.boxplot(data=df, x=x, y=y, order=cities)
plt.show()
Result
(Figure: box plot of total for trips picked up in Queens.)
I hope this tutorial has helped you understand the basics of box plots. If you have any questions, leave them in the comments below and I will be more than happy to answer them.