In this tutorial, we will learn how to combine two line charts into one using seaborn and Python. The combined chart has a common x-axis and two different y-axes. Suppose you have two line charts, A and B. When we merge them into a single chart, both lines share the x-axis, while the y-axis of line chart A sits on the left and the y-axis of line chart B sits on the right, or vice versa.
Let us combine two line charts using seaborn in Python.
We kick things off by importing the necessary libraries for our tutorial. Next, we will import our dataset. You can find the dataset at this link. Here is a download link for the same. The dataset is by the World Bank on Brazil's Environment Indicators.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter magic: render charts inside the notebook
%matplotlib inline

df = pd.read_csv('environment_bra.csv')
df.head()
As you can see, the first row does not contain any useful data. Instead, it describes what each column contains, so it can be removed. Let us drop the first row and then check the length of the resulting dataframe.
# Drop the descriptive first row (index 0)
df.drop(index=0, inplace=True)
df.head()
len(df)
And the output is -
Great! Next, let us check for any missing values in the dataframe.
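One common way to run this check is to count the null entries per column with isnull().sum(). The sketch below uses a small made-up dataframe (the values are invented and only the column names follow this tutorial), with one missing entry so you can see what a non-zero count looks like. On the real dataframe you would call the same method on df.

```python
import pandas as pd

# Made-up stand-in data; only the column names follow the tutorial.
sample = pd.DataFrame({
    "Year": ["1990", "1991", "1992"],
    "Value": ["10.0", None, "12.0"],
    "Indicator Name": ["Example indicator"] * 3,
})

# Count missing entries per column; all zeros means nothing is missing.
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```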
Good! There are no missing values in the dataframe. Now, let us get some information on the dataframe using the info() function. The Year and the Value columns are numerical but have their datatype as "object". This may cause some problems for us while doing the visualizations. To be on the safer side, let us convert the Year column to int and the Value column to float.
df['Value'] = df['Value'].astype(float)
df['Year'] = df['Year'].astype('int64')
df.info()
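As a side note, when a file has a descriptive line directly under the header, that line can also be skipped at load time, and pandas then infers the numeric types on its own. The sketch below is only an illustration: the CSV text is made up, and its layout is an assumption rather than a copy of the real file.

```python
import io
import pandas as pd

# Made-up CSV text: a header line, a descriptive line, then the data.
csv_text = (
    "Year,Value,Indicator Name\n"
    "year description,value description,indicator description\n"
    "1990,10.5,Example indicator\n"
    "1991,11.0,Example indicator\n"
)

# skiprows=[1] skips the second line of the file (line 0 is the header).
clean = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(clean.dtypes)
```

With the descriptive line gone before parsing, Year and Value are read as numbers, so no separate astype() step is needed in this made-up case.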
Fantastic! Next, let us see all the unique values in the Indicator Name column. The Indicator Name column contains the name of the environment indicators for which we have the data.
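A minimal sketch of how the unique values can be listed, using the unique() method of a pandas column. The dataframe here is a tiny made-up stand-in with invented values. On the real data the same call would be made on df['Indicator Name'].

```python
import pandas as pd

# Made-up stand-in data with one repeated indicator name.
sample = pd.DataFrame({
    "Indicator Name": [
        "Forest area (sq. km)",
        "Agricultural land (% of land area)",
        "Forest area (sq. km)",
    ],
    "Value": [1.0, 2.0, 3.0],
})

# unique() returns each distinct name once, in order of first appearance.
indicator_names = sample["Indicator Name"].unique()
for name in indicator_names:
    print(name)
```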
Wow! That's a lot of information packed in one dataset.
Let us use line charts to plot the information for the following indicators - 'Agricultural land (% of land area)' and 'Forest area (sq. km)'.
First, let's start with the 'Agricultural land (% of land area)' indicator. We will create a new dataframe that only has the details of this indicator.
agri_land = df[df['Indicator Name'] == 'Agricultural land (% of land area)']
agri_land.head()
Great! Now, let us plot a line chart for this using seaborn to see the trend of the increase/decrease in agricultural land cover in Brazil from 1961 to 2016.
First, we define the figure size. I have defined it as (12, 6). Feel free to use a different figure size. Remember, the format for figure size is (width, height), measured in inches.
Next, we will call the lineplot() function from seaborn and specify the x and y axis values and the dataframe. We then use sns.despine() to get rid of the top and the right hand side border that comes with the chart. Lastly, we use plt.ylabel() to specify the label of the y-axis and plt.title() to specify the title of the line chart.
fig, ax = plt.subplots(figsize=(12,6))
lineplot = sns.lineplot(x=agri_land['Year'], y=agri_land['Value'], data=agri_land)
sns.despine()  # remove the top and right borders
plt.ylabel('% of land area')
plt.title('Agricultural land cover trend in Brazil', pad=20);
After running this code, you will get the below line chart visualization.
Similarly, let us draw a line chart for the 'Forest area (sq. km)' indicator.
forest_land = df[df['Indicator Name'] == 'Forest area (sq. km)']
forest_land.reset_index(inplace=True)

fig, ax = plt.subplots(figsize=(12,6))
lineplot = sns.lineplot(x=forest_land['Year'], y=forest_land['Value'], data=forest_land)
sns.despine()
plt.ylabel('sq. km')  # this indicator is measured in sq. km, not % of land area
plt.title('Forest land area trend in Brazil', pad=20);
After running this code, you will get the below line chart.
Now let us combine the above plots and try to draw some conclusions from the resulting combination line chart.
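First, we will use the same code that we used to plot the line chart for the agricultural land cover indicator. We will just add in two more parameters called label and legend to the seaborn lineplot() function. The label parameter names the line, and legend=False stops seaborn from drawing a separate legend on each axis, so that we can draw a single legend for the whole figure at the end.
We will do the same for the Forest Cover chart as well. However, before we write the code for the forest cover line chart, we need one more line of code that will help combine these two charts. That line of code is -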
ax2 = ax.twinx()
twinx() is a method of the Axes object in the matplotlib library. It creates a second Axes that shares the x-axis with the original one but has its own independent y-axis. This new y-axis will be on the right side of the chart.
So, in our visualization, the left side y-axis is for the Agricultural land cover while the right side y-axis is for the forest cover line chart.
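To see twinx() in isolation, here is a minimal sketch that uses plain matplotlib and made-up numbers on two very different scales. The series names and values are invented for illustration and are not taken from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

years = [2000, 2001, 2002, 2003]
series_a = [10.0, 11.0, 12.0, 13.0]          # made-up values on a small scale
series_b = [9000.0, 8500.0, 8000.0, 7500.0]  # made-up values on a large scale

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(years, series_a, label="Series A")
ax.set_ylabel("Series A (left axis)")

# Second Axes: same x-axis, independent y-axis drawn on the right.
ax2 = ax.twinx()
ax2.plot(years, series_b, color="r", label="Series B")
ax2.set_ylabel("Series B (right axis)")

# One legend for the whole figure, collecting the labels from both axes.
fig.legend()
```

Because each line gets its own y-axis scale, both trends stay readable even though their values differ by orders of magnitude.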
# Line Chart For Agricultural Land Cover (left y-axis)
fig, ax = plt.subplots(figsize=(12,6))
lineplot = sns.lineplot(x=agri_land['Year'], y=agri_land['Value'], data=agri_land, label='Agricultural land cover', legend=False)
sns.despine()
plt.ylabel('% of land area')
plt.title('Agricultural and Forest land trend in Brazil', pad=20);

# Line Chart For Forest Cover (right y-axis)
ax2 = ax.twinx()
lineplot2 = sns.lineplot(x=forest_land['Year'], y=forest_land['Value'], ax=ax2, color="r", label='Forest Cover', legend=False)
sns.despine(right=False)  # keep the right border, which now carries the second y-axis
plt.ylabel('sq. km')  # forest area is measured in sq. km
ax.figure.legend();
With the increase in agricultural land, there was a decrease in the forest covered land. However, this visualization isn't good! As you can see, the forest cover indicator does not have data before the 1990s. We need to change this visualization so we see the trends from 1990 to 2016.
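Instead of hard-coding the starting year, the overlapping span of two indicators can also be computed from the data. The sketch below shows one way to do that with two small made-up dataframes. The names left and right and all the values are assumptions for illustration only.

```python
import pandas as pd

# Made-up data: the first series starts earlier than the second.
left = pd.DataFrame({"Year": [1988, 1989, 1990, 1991, 1992],
                     "Value": [1.0, 2.0, 3.0, 4.0, 5.0]})
right = pd.DataFrame({"Year": [1990, 1991, 1992],
                      "Value": [30.0, 20.0, 10.0]})

# The common span runs from the later start to the earlier end.
start = max(left["Year"].min(), right["Year"].min())
end = min(left["Year"].max(), right["Year"].max())

# between() is inclusive on both ends by default.
left_overlap = left[left["Year"].between(start, end)]
right_overlap = right[right["Year"].between(start, end)]
print(start, end, len(left_overlap), len(right_overlap))
```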
This time, however, let us write a function that lets you display a single line chart or a combined line chart - according to the parameters you pass.
Let us write the below quick_line_plot() function. It takes in 5 arguments: df1, title and y_label1, which are the compulsory parameters, and df2 and y_label2, which are optional.
def quick_line_plot(df1, title, y_label1, df2=None, y_label2=None):
    """
    df1: Dataframe 1
    title: Title of the chart
    y_label1: Y axis label for the plot of dataframe 1
    df2: Dataframe 2 (optional)
    y_label2: Y axis label for the plot of dataframe 2 (optional)
    """
    df1 = df1.sort_values(by='Year')
    year_list = df1.Year.unique()
    year_max = year_list[-1]
    year_min = year_list[0]  # first element, not the whole array
    # One tick every 2 years; year_max + 1 so the last year can get a tick
    x_tick_list = list(range(year_min, year_max + 1, 2))
    # Use the indicator name (a single string) as the legend label
    Label1 = df1['Indicator Name'].iloc[0]

    fig, ax = plt.subplots(figsize=(12,6))
    lineplot = sns.lineplot(x=df1['Year'], y=df1['Value'], data=df1, label=Label1, legend=False)
    lineplot.set(xlim=(year_min-1, year_max+1))
    plt.xticks(x_tick_list, rotation=45)  # Rotate the x-axis labels
    sns.despine()
    plt.ylabel(y_label1)
    plt.title(title, pad=20)

    if df2 is not None:
        # Second y-axis on the right, sharing the x-axis
        ax2 = ax.twinx()
        Label2 = df2['Indicator Name'].iloc[0]
        lineplot2 = sns.lineplot(x=df2['Year'], y=df2['Value'], ax=ax2, color="r", label=Label2, legend=False)
        sns.despine(right=False)
        plt.ylabel(y_label2)

    ax.figure.legend()
We sort the first dataframe by the values in the Year column. Next, we get the maximum and minimum year values and create a list of years using the range function with the step value as 2. This will help set our x-axis scale of the graph as 1 unit = 2 years.
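One detail worth keeping in mind when building a tick list this way: Python's range() excludes its stop value, so whether the final year gets a tick depends on the stop that is passed. A short sketch, using 1990 and 2016 as example bounds:

```python
year_min, year_max = 1990, 2016  # example bounds for illustration

ticks_exclusive = list(range(year_min, year_max, 2))      # stops before year_max
ticks_inclusive = list(range(year_min, year_max + 1, 2))  # can reach year_max

print(ticks_exclusive[-1])  # 2014
print(ticks_inclusive[-1])  # 2016
```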
Then we plot the line chart for the first dataframe.
Now, if and only if the second dataframe has been passed into the arguments of the function, we will proceed with turning the above line chart into a combination line chart.
Let us see this function in action. Let us plot the 'Agricultural land (% of land area)' indicator line chart. You'll see that writing this function helped us avoid repeating the code.
quick_line_plot(agri_land, 'Agricultural land cover trend in Brazil', '% of land area')
Similarly, let us use the quick_line_plot() function and plot a line chart for the 'Forest area (sq. km)' indicator.
quick_line_plot(forest_land, 'Forest area trend in Brazil', 'sq. km')
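Finally, let us make the correct combo line chart of Agricultural land and Forest cover using the quick_line_plot() function. This time we pass only the agricultural land rows from 1990 onwards, so that both lines cover the same years.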
quick_line_plot(agri_land[agri_land.Year >= 1990], 'Agricultural and Forest land trend in Brazil', '% of land area', forest_land, 'sq. km')
Much better! As you can see from this visualization, forest cover in Brazil declined over the same period in which agricultural land increased.
After looking up this topic on Google, I found that this was indeed true. Here are some links I found about the topic -
Here is a link to the code for the above tutorial.
Until next time! Have a good day! :)