DEV Community

Azad Kshitij

Data Science: What is a box plot?

A box plot, also known as a box and whisker plot, is a graphical representation of a dataset that shows the distribution of values in the data. It is a useful tool for visualizing the spread and skewness of a dataset, as well as identifying outliers.

The box plot can be used to compare the distribution of multiple datasets by creating a box plot for each dataset and placing them side by side. It is also possible to overlay box plots on top of each other to compare the distributions more closely.

Box plot image

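  • A box plot is built from a handful of summary values:

    • The top line (the upper whisker) is the largest value that is not an outlier. It equals the maximum only when no outliers lie above the box.
    • The bottom line (the lower whisker) is the smallest value that is not an outlier. It equals the minimum only when no outliers lie below the box.
    • The centre line inside the box is the median.
    • The top of the box is the 75th percentile.
    • The bottom of the box is the 25th percentile.
    • The circles beyond the whiskers are called 'outliers'. By default, seaborn treats a point as an outlier when it lies more than 1.5 times the interquartile range (the height of the box) beyond the box.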
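
To make these parts concrete, here is a minimal sketch that computes them by hand with Python's standard library. The helper name box_plot_stats and the sample values are made up for illustration and are not part of seaborn. It assumes the 1.5 × interquartile range rule for the whiskers, which is the default in seaborn and matplotlib.

```python
from statistics import quantiles


def box_plot_stats(values, whis=1.5):
    """Return the numbers a box plot draws for a list of values.

    Hypothetical helper for illustration; not part of seaborn.
    """
    data = sorted(values)
    # "inclusive" treats the smallest and largest values as the
    # 0th and 100th percentiles and interpolates in between.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme values still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up values
stats = box_plot_stats(sample)
print(stats)
```

For this sample the box spans 4 to 8 with the median at 6, the whiskers stop at 2 and 9, and 30 is reported as an outlier, so the top whisker is not the maximum of the data.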
  • Let's see how to create one with Python.

    • Start by importing the necessary packages.
    • We will use seaborn to create the plot and matplotlib to display it.
import seaborn as sns
import matplotlib.pyplot as plt
  • Let's use taxis, one of the example datasets available through seaborn, and set the graph style to whitegrid. Note that load_dataset() downloads the data on first use, so it needs an internet connection.
sns.set(style="whitegrid")

df = sns.load_dataset("taxis")
  • Now define the columns for the x-axis and y-axis, and a list of the boroughs you want to create box plots for (the code stores this list in a variable named cities).
x = "pickup_borough"
y = "total"
cities = ["Queens"]
  • Create the plot with the sns.boxplot() function: pass df as data, x as x, and y as y, and use order=cities to choose which boxes are drawn and in what order. Then call plt.show() to display the graph.
ax = sns.boxplot(data=df, x=x, y=y, order=cities)

plt.show()

Final

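import seaborn as sns
import matplotlib.pyplot as plt

# Use a white background with grid lines
sns.set(style="whitegrid")

# Load the example taxis dataset
df = sns.load_dataset("taxis")

# Columns to plot, and the boroughs to include
x = "pickup_borough"
y = "total"
cities = ["Queens"]

# Draw one box per borough listed in cities
ax = sns.boxplot(data=df, x=x, y=y, order=cities)

plt.show()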

Result

Box plot created with Python image
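
The side-by-side comparison mentioned at the start works the same way: list more than one category in order and seaborn draws one box for each. The sketch below is only an illustration under a few assumptions. It uses a small made-up table (the fares DataFrame and its values are invented, not taken from the taxis dataset) so that it runs without downloading anything.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up fares for illustration only, not real taxis data.
fares = pd.DataFrame({
    "borough": ["Queens"] * 6 + ["Bronx"] * 6,
    "total": [12.5, 18.0, 22.3, 35.0, 41.2, 58.9,
              9.8, 14.1, 17.6, 21.0, 26.4, 33.3],
})

# One box is drawn for each entry in this list, left to right.
boroughs = ["Queens", "Bronx"]

ax = sns.boxplot(data=fares, x="borough", y="total", order=boroughs)

plt.show()
```

With the real dataset, the equivalent change would be to add more borough names to the cities list in the code above.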

I hope this tutorial has helped you understand the basics of box plots. If you have any questions, leave them in the comments below and I will be more than happy to answer them.
