Zaynul Abedin Miah

Seaborn: Statistical data visualization

Seaborn is a Python data visualization library built on top of matplotlib. It offers a high-level interface for creating attractive and informative statistical graphics, ships with predefined styles and color palettes that make statistical charts look more appealing, and integrates closely with the pandas library's data structures.
Seaborn's goal is to make visualization a central part of exploring and understanding data.

Different categories of plot in Seaborn

1. Distribution Plots

Let's discuss some plots that allow us to visualize the distribution of a data set. These plots are:

distplot
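The distplot shows the distribution of a univariate set of observations. Note that distplot has been deprecated since seaborn 0.11 in favor of histplot (axes-level) and displot (figure-level), so the example uses histplot.
Example:

import seaborn as sns
%matplotlib inline
tips = sns.load_dataset('tips')
# histplot replaces the deprecated distplot; kde=False draws only the histogram
sns.histplot(tips['total_bill'],kde=False,bins=30)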

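To make concrete what bins=30 means, here is a minimal sketch that bins synthetic data with NumPy directly. The `bills` array is made up for illustration and only stands in for tips['total_bill']; the sketch assumes equal-width bins spanning the data range, which is what np.histogram does by default.

```python
import numpy as np

# Synthetic stand-in for tips['total_bill'] (illustrative values, not the real dataset)
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)

# Split the data range into 30 equal-width bins and count the observations in each
counts, edges = np.histogram(bills, bins=30)

print(len(edges))    # 31 edges delimit 30 bins
print(counts.sum())  # 244: every observation falls in exactly one bin
```

The bar heights of a count histogram correspond to values like `counts`, and the bar boundaries to `edges`.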
jointplot
jointplot() combines a bivariate plot of two variables with the univariate distribution of each variable in the margins. The kind parameter selects how the joint relationship is drawn:

  • “scatter”
  • “reg”
  • “resid”
  • “kde”
  • “hex”

Example:

sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg')


pairplot
pairplot plots the pairwise relationships between all numerical columns of a DataFrame. It supports a hue argument for coloring the points by a categorical column.

sns.pairplot(tips,hue='sex',palette='coolwarm')

rugplot
Rugplots are a very simple idea: they draw a small dash mark for every observation in a univariate distribution. These marks are the building blocks of a KDE plot, which places a smooth kernel on each one:
Example:

sns.rugplot(tips['total_bill'])

kdeplot
A kernel density estimate (KDE) plot visualizes the estimated probability density of a continuous variable, showing how the density changes across the variable's range. Several samples can also be drawn on a single graph, which makes them easy to compare.

Example:

sns.kdeplot(tips['tip'])
sns.rugplot(tips['tip'])
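The link between the rug marks and the KDE curve can be sketched by hand: place one Gaussian bump on every observation and average them. This is only an illustration under stated assumptions. The helper name `gaussian_kde_sketch`, the sample values, and the bandwidth of 0.8 are all made up here; seaborn chooses its bandwidth automatically.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    # Distance from every grid point to every sample, in units of the bandwidth
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

samples = np.array([1.0, 1.5, 2.0, 3.5, 5.0])  # the positions of the "rug" marks
grid = np.linspace(-5.0, 11.0, 1601)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.8)

# A density should integrate to roughly 1 over a wide enough range
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve.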


2. Categorical Data Plots

Now let's discuss using seaborn to plot categorical data! There are a few main plot types for this:

factorplot
factorplot is the most general form of a categorical plot: it draws the plot onto a FacetGrid. It was renamed to catplot in seaborn 0.9 and the old name is deprecated, so the example calls sns.catplot(). It takes a kind parameter to select the plot type:
Example:

sns.catplot(x='sex',y='total_bill',data=tips,kind='bar')

barplot
barplot and countplot are closely related plots that show aggregate data for each level of a categorical feature. The barplot is the general form: it aggregates a numerical variable within each category using some function (the estimator), which defaults to the mean. The example passes np.std to plot the standard deviation instead:
Example:

import numpy as np
sns.barplot(x='sex',y='total_bill',data=tips,estimator=np.std)
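The aggregation behind a bar plot can be sketched with a plain pandas groupby. The small table below is made up for illustration and only mimics the shape of the tips dataset; each resulting number corresponds to one bar height.

```python
import pandas as pd

# Tiny made-up table standing in for the tips dataset
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [20.0, 15.0, 30.0, 25.0, 40.0, 20.0],
})

# One number per category: this is what each bar represents
means = df.groupby("sex")["total_bill"].mean()
stds = df.groupby("sex")["total_bill"].std()  # pandas' sample standard deviation (ddof=1)

print(means)  # Female 20.0, Male 30.0
print(stds)   # Female 5.0, Male 10.0
```

Note that np.std defaults to the population form (ddof=0) while pandas' std uses the sample form (ddof=1), so the two can give slightly different numbers on the same data.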

countplot
countplot is essentially a barplot whose estimator counts the number of occurrences in each category, which is why only the x value is passed:

sns.countplot(x='sex',data=tips)
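What a count plot displays can be reproduced in a couple of lines. This sketch uses a made-up array in place of tips['sex'] and simply counts how often each category occurs.

```python
import numpy as np

# Made-up categorical column standing in for tips['sex']
sex = np.array(["Male", "Female", "Male", "Male", "Female", "Male"])

# Count how many rows fall into each category
categories, counts = np.unique(sex, return_counts=True)
for category, count in zip(categories, counts):
    print(category, count)  # Female 2, then Male 4
```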

boxplot
A box plot, also called a box-and-whisker plot, shows the distribution of quantitative data in a way that makes it easy to compare different variables or different levels of a categorical variable. The box shows the quartiles of the data set, and the whiskers extend to show the rest of the distribution, except for points that are classified as "outliers" by a rule based on the interquartile range.
Example:

sns.boxplot(x="day", y="total_bill", hue="smoker",data=tips, palette="coolwarm")

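The quartile and outlier rule described above can be worked through by hand. This sketch uses a made-up array and assumes the common convention of fences at 1.5 times the interquartile range, which is the default whisker setting in matplotlib and seaborn.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the edges of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme points still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3)             # 3.25 5.5 7.75
print(whisker_low, whisker_high)  # 1.0 9.0
print(outliers)                   # [50.]
```

Here the value 50 lies above the upper fence of 14.5, so it would be drawn as an individual outlier point rather than covered by the whisker.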
violinplot
A violin plot plays a similar role to a box plot: it shows the distribution of quantitative data across the levels of a categorical variable. Instead of a box, however, it draws a kernel density estimate of the data, mirrored on each side, so it also shows the estimated probability density of the data at different values.

Example:

sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',split=True,palette='Set1')

stripplot
A stripplot draws a scatter plot in which one variable is categorical. It can be drawn on its own, but it is also a good complement to a boxplot or violinplot when you want to show every observation along with a summary of the underlying distribution.
Example:

# dodge=True (called split in older seaborn releases) separates the hue levels
sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1',dodge=True)
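The idea behind jitter=True can be sketched independently of seaborn: each point is nudged by a small random amount along the categorical axis only, so identical values stop hiding behind each other. The data and the jitter width of 0.1 are made up for illustration and are not seaborn's actual defaults.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up observations: a category index (0 = Thur, 1 = Fri) and a numeric value
category_index = np.array([0, 0, 0, 1, 1, 1])
values = np.array([10.0, 10.0, 12.5, 8.0, 8.0, 8.0])

# Shift each point by a small random amount along the categorical axis only
jitter_width = 0.1  # illustrative choice
offsets = rng.uniform(-jitter_width, jitter_width, size=len(values))
x_positions = category_index + offsets

print(x_positions)  # close to 0 or 1, but no longer identical
print(values)       # the numeric axis is left untouched
```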

swarmplot
The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).
Example:

# dodge=True (called split in older seaborn releases) separates the hue levels
sns.swarmplot(x="day", y="total_bill",hue='sex',data=tips, palette="Set1", dodge=True)


3. Matrix Plots

Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data. Let's begin by exploring seaborn's heatmap.

Heatmap
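Heatmaps display matrix values using colors. Each cell is colored according to its value through a colormap; with the magma colormap used in the example, brighter colors represent higher values and darker colors represent lower ones. The data must already be in matrix form, which is why the example first pivots the flights dataset so that months become rows and years become columns. sns.heatmap() then draws the heatmap.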

Example:

flights = sns.load_dataset('flights')
pvflights = flights.pivot_table(values='passengers',index='month',columns='year')
sns.heatmap(pvflights,cmap='magma',linecolor='white',linewidths=1)
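The pivot step is what turns long-format records into the matrix a heatmap needs. Here is a minimal sketch on a small illustrative table shaped like the flights dataset; the frame `long_df` and its four rows are assumptions made for the example.

```python
import pandas as pd

# Small illustrative long-format table in the same shape as the flights dataset
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Rows become months, columns become years, cells hold passenger counts
matrix = long_df.pivot_table(values="passengers", index="month", columns="year")
print(matrix)
```

In this sketch the months are plain strings, so they sort alphabetically; in the real dataset the month column is categorical and keeps calendar order.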

clustermap
The clustermap uses hierarchical clustering to reorder the rows and columns of the heatmap so that similar ones sit next to each other.
Example:

sns.clustermap(pvflights,cmap='coolwarm',standard_scale=1)

Notice that the years and months are no longer in order; they are instead grouped by similarity in value (passenger count). This lets us start to infer things from the plot, such as that August and July are alike (which makes sense, since they are both summer travel months).
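The reordering can be illustrated with SciPy's hierarchical clustering routines on a made-up matrix. This is a sketch, not seaborn's internal code: it assumes average linkage with Euclidean distance (which, to my understanding, mirrors clustermap's default method and metric) and clusters rows only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Made-up matrix: rows 0 and 2 are similar, rows 1 and 3 are similar
matrix = np.array([
    [1.0, 1.0, 2.0],
    [10.0, 10.0, 9.0],
    [1.1, 1.2, 2.1],
    [10.2, 9.9, 9.3],
])

# Build the hierarchy over rows, then read off the leaf order of the dendrogram
row_linkage = linkage(matrix, method="average", metric="euclidean")
row_order = leaves_list(row_linkage)

# Similar rows are now adjacent, just as similar months end up adjacent in the plot
reordered = matrix[row_order]
print(row_order)
print(reordered)
```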


4. Regression Plots

The lmplot() function makes it easy to draw regression plots in seaborn. You can think of lmplot() as a linear model plot: it draws a scatter plot and overlays a linear regression fit on top of it.
Working with markers:
Example:

# http://matplotlib.org/api/markers_api.html
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm', markers=['o','v'],scatter_kws={'s':100})

Using a grid with aspect and height:
Example:

# height (called size in older seaborn releases) sets the height of each facet in inches
sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm',aspect=0.6,height=8)
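The straight line that a regression plot overlays on the scatter can be approximated with an ordinary least-squares fit. The sketch below uses np.polyfit on synthetic data; the relationship between bill and tip (15% plus a fixed amount and some noise) is made up for illustration and is not derived from the tips dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic bills and tips with a made-up linear relationship plus noise
total_bill = rng.uniform(5.0, 50.0, size=200)
tip = 0.15 * total_bill + 1.0 + rng.normal(0.0, 0.5, size=200)

# Ordinary least-squares line: tip is approximated by slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)

print(slope, intercept)  # close to the 0.15 and 1.0 used to generate the data
```

Fitting one such line per subset of the data is, in spirit, what the hue and col arguments do when they split the plot by sex and by day.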
