Data visualization is a powerful way of presenting data in a readily accessible and understandable form.
Several tools are used for data visualization, including Tableau, Power BI, and D3.js. Additionally, Python offers several plotting and graphing libraries, such as Plotly, Matplotlib, and Seaborn.
In this article, we will focus on Matplotlib.
Matplotlib is a plotting library for creating static, animated, and interactive visualizations.
The library is popular because:
- it is fast, powerful, and efficient
- it is open source
- it works well across operating systems.
To use Matplotlib:
The pyplot module is used for plotting, so we import it as:
import matplotlib.pyplot as plt
Figure: The area that contains all the plots, titles, and other elements.
Axes: The region of the figure where the data is drawn, bounded by the x-axis and the y-axis.
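The relationship between the two can be sketched with plt.subplots(), a pyplot function not covered above that returns a Figure and an Axes together. This is only an illustrative sketch, and the variable names fig and ax are a common convention rather than a requirement.

```python
import matplotlib.pyplot as plt

# Create one Figure that contains a single Axes.
fig, ax = plt.subplots()

# The Axes knows which Figure it belongs to,
# and the Figure keeps a list of the Axes it contains.
print(ax.figure is fig)
print(len(fig.axes))
```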
In this section we will look at some common methods used for plotting.
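A minimal sketch of a basic line plot is given here, assuming two small hypothetical arrays named x and y as the data. The values are made up for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data, chosen only for illustration.
x = np.array([0, 2, 4, 6])
y = np.array([0, 10, 20, 30])

# plot() returns a list of the line objects it created.
lines = plt.plot(x, y)
plt.show()
```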
Let us examine the above code closely.
First we import NumPy to work with NumPy arrays, and import pyplot for plotting.
In its basic form, the plot() function takes two arguments: the first specifies the values on the x-axis (horizontal) and the second specifies the values on the y-axis (vertical).
The plot() function draws the points on the diagram and, by default, connects them with a line from point to point.
The show() function displays all open figures.
It starts an event loop that looks for all currently active figure objects, and opens one or more interactive windows that display the figures.
For a more descriptive figure, we can include a title using the title() function.
To label the x-axis we use xlabel(), and to label the y-axis we use ylabel().
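As a small sketch of these three functions together, assuming made-up quarterly figures and label text chosen purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: the numbers and label text are placeholders.
quarters = np.array([1, 2, 3, 4])
sales = np.array([10, 20, 25, 40])

plt.plot(quarters, sales)
plt.title("Sales per Quarter")  # text shown above the plot
plt.xlabel("Quarter")           # label for the horizontal axis
plt.ylabel("Sales")             # label for the vertical axis
plt.show()
```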
So far we have a basic descriptive plot, but for more effective data visualization we can add some formatting to increase the appeal of our plots.
Let us explore some pyplot formatting.
- Consecutive Plots
You can plot consecutive plots as follows:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Assignment_Solutions/master/Standard%20Metropolitan%20Areas%20Data%20-%20train_data%20-%20data.csv")

# First plot: physicians against income
plt.plot(data.physicians, data.income, color='cyan')
plt.show()

# Second plot: work force against income
plt.plot(data.work_force, data.income, color='red')
plt.show()
However, we can make this presentation more informative by drawing the two plots on the same figure, which makes it easier to compare the two data series.
The legend() function is used to add a legend that describes elements in the graph.
The plot() function takes additional parameters that let us describe how the graphs should appear.
We use color to specify the line color and label to provide the text that the legend() function displays.
Other parameters include marker, which specifies how the data points are marked, and linestyle, which sets the style of the line.
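A sketch of two lines drawn on the same figure with these parameters might look like the following. It assumes small made-up arrays instead of the CSV data, and the names series_a and series_b are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for two columns of a dataset.
x = np.arange(1, 6)
series_a = x * 2
series_b = x ** 2

# Calling plot() twice before show() draws both lines on one figure.
plt.plot(x, series_a, color="cyan", label="series A", marker="o", linestyle="--")
plt.plot(x, series_b, color="red", label="series B", marker="s", linestyle="-")

plt.legend()  # builds the legend from the label of each line
plt.show()
```

Because both plot() calls happen before show(), the lines share the same axes, and legend() picks up one entry per label.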
Sometimes you might need to resize the figure. The figsize argument of figure() sets the size as a (width, height) pair measured in inches, not pixels.
plt.figure(figsize=(width_inches, height_inches))
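For instance, a figure 8 inches wide and 4 inches tall could be sketched as follows. The sizes and data are arbitrary choices for illustration; the resulting size in pixels depends on the figure's dpi setting.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; the values are arbitrary.
fig = plt.figure(figsize=(8, 4))

# Hypothetical data, only so the figure has something to show.
plt.plot([1, 2, 3], [2, 4, 8])

print(fig.get_size_inches())  # the size that was requested, in inches
plt.show()
```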
We can also draw multiple plots in one figure using the subplot() function.
Its arguments specify the number of rows, the number of columns, and the index of the current plot, respectively. The index starts at 1.
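A minimal sketch of two side-by-side plots, assuming a layout of one row and two columns and using sine and cosine curves as placeholder data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 50 evenly spaced points between 0 and 10.
x = np.linspace(0, 10, 50)

plt.subplot(1, 2, 1)  # 1 row, 2 columns, first plot (left)
plt.plot(x, np.sin(x))
plt.title("sin(x)")

plt.subplot(1, 2, 2)  # 1 row, 2 columns, second plot (right)
plt.plot(x, np.cos(x))
plt.title("cos(x)")

plt.show()
```

Each call to subplot() makes that position the current plot, so the plot() and title() calls that follow apply to it.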
You're all set with the basics of Matplotlib. We will explore Matplotlib further in this post.