Python is great for data exploration and data analysis and it’s all thanks to the support of amazing libraries like numpy, pandas, matplotlib, and many others. During our data exploration and data analysis phase it’s very important to understand the data we are dealing with, and for that visual representations of our data can be extremely important.
It is common for us to work on these projects using Jupyter notebooks because they are fast, simple, and let us interact and play with our data. However, there are limitations to what we can do. Normally when we work with charts we use libraries like matplotlib or seaborn, but those libraries render static images of our charts and graphs. Details get lost in a static image, so we end up fine-tuning our charts just to explore a section of our data. Wouldn’t it be great if we could just interact with our charts by zooming in, or by adding contextual information to our data points with hover interactions? Here is where Plotly can help us.
Plotly is a Python library that makes interactive, publication-quality graphs like line plots, scatter plots, area plots, bar charts, error bars, box plots, histograms, heatmaps, subplots, and much more.
But we’ve talked enough, let’s start building some charts…
Installing Dependencies
Before we build anything, let’s install dependencies. I like to use pipenv, but the same applies to anaconda or other package managers.
Here is the list of dependencies we need
- jupyter: Web application that allows you to create and share documents that contain live code, equations…. you know it!
- pandas: Very powerful library for data analysis in general and we will use it in our project to handle our data
- numpy: Scientific computing for Python, used in our project for math and generating random numbers
- seaborn: Statistical data visualization based on matplotlib, we will be using it to load some sample data that comes with the library
- cufflinks: Connects plotly with pandas by adding the iplot method to DataFrames and Series
- plotly: Interactive charting library
Here are the commands to install them:
pipenv install jupyter
pipenv install plotly cufflinks pandas seaborn numpy
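Once the install finishes, a quick check from Python can confirm that every package is visible to the interpreter the notebook will use. The following is a minimal sketch that uses only the standard library (Python 3.8 or newer); the helper name installed_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed in the step above.
REQUIRED = ["jupyter", "pandas", "numpy", "seaborn", "cufflinks", "plotly"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so missing ones are easy to spot.
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running it through pipenv run python (or in the first notebook cell) checks the same environment the notebook uses.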
Getting Started
To get started we need to start our jupyter notebook and create a new document:
pipenv run jupyter notebook
Once we are there we can start adding some code. Since this article is not a tutorial on Jupyter Notebooks, I’ll just focus on the code and not on how to use the document.
Let’s start importing the libraries:
import pandas as pd
import numpy as np
import seaborn as sns
import cufflinks as cf
Plotly with the help of other libraries can render the plots in different contexts, for example on a jupyter notebook, online at the plotly dashboard, etc. By default, the library works with the offline mode, which is what we want. However, we also need to tell cufflinks that we will be using the offline mode for the charts. This setting can be done programmatically by adding the following cell to our notebook:
cf.go_offline()
Now we are ready to get some data and start plotting.
Generating Random Data
I don’t want to focus so much on how to load or retrieve data, so we will simply generate random data for the charts. In a new cell we can use pandas and numpy to build a table with 300 rows and 3 columns:
df = pd.DataFrame(np.random.randn(300, 3), columns = ["X", "Y", "Z"])
Awesome, using numpy we can generate our random numbers and we can load them into a pandas DataFrame object. Let’s see what our data looks like:
df.head()
and we get:
X Y Z
0 0.176117 1.221648 1.201206
1 1.931615 -2.303667 1.914741
2 1.213322 -0.434855 -0.639277
3 0.763220 0.118211 -0.838034
4 0.245442 0.697897 1.169540
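Because the data is random, the numbers in your notebook will differ from the ones shown here. If you want the same values on every run, one option is to seed the generator. This is a small sketch using numpy’s default_rng; the seed value 42 is an arbitrary choice and not part of the original example.

```python
import numpy as np
import pandas as pd

# A fixed seed (42 is arbitrary) makes the "random" data repeatable.
rng = np.random.default_rng(42)

# 300 rows and 3 columns drawn from a standard normal distribution.
df = pd.DataFrame(rng.standard_normal((300, 3)), columns=["X", "Y", "Z"])

print(df.shape)       # (300, 3)
print(df.describe())  # per-column summary statistics
```

The rest of the article works the same with either version of df.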
That’s great! Time to plot some charts.
Our first plots
A convenient way to plot DataFrames is by using the method iplot available on Series and DataFrames, courtesy of cufflinks. Let’s start with all the defaults:
df.iplot()
At first glance, it looks like any other chart, but if you hover with your mouse over the chart you will start seeing some magic. A toolbar appears on hover at the top right of the chart that lets you zoom, pan, and more. The chart also allows you to zoom in by drawing an area on it, and shows a tooltip on each data point with additional information like the value.
Our chart above is certainly better than a static chart; however, it is still not great. Let’s try to render the same data as a scatter plot.
df.iplot(mode = "markers")
Not terrible, but not great: the dots are too big. Let’s resize them:
df.iplot(mode = "markers", size = 5)
Much better! Next, let’s try something different.
Bar Charts
Let’s forget our randomly generated dataset for a minute, and let’s load a popular dataset from the seaborn library to render some other chart types.
titanic = sns.load_dataset("titanic")
titanic.head()
The dataset we will be working on is called “titanic”, and contains information about what happened to the people who were traveling on the Titanic that tragic day.
One special variable in this dataset is the survived variable, which contains a binary flag: 0 for those who died, and 1 for those who survived the accident. Let’s build a bar chart to see how many men and women survived:
titanic.iplot(kind = "bar", x = "sex", y = "survived")
The trend can be easily seen. However, if you just share this chart it’s impossible to know what it is about, as it has no title or axis labels. So let’s fix that:
titanic.iplot(kind = "bar", x = "sex", y = "survived", title = "Survivors", xTitle = "Sex", yTitle = "Number of survived")
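The calls above hand the raw passenger rows to the chart. If you want the exact numbers behind the bars, one option is to aggregate with pandas first and plot the summary. The sketch below assumes a sex column and a 0/1 survived column, as described above; the function name survivors_by_sex and the five-row sample are made up for illustration and are not real Titanic data.

```python
import pandas as pd


def survivors_by_sex(passengers: pd.DataFrame) -> pd.DataFrame:
    """Count passengers and survivors per sex.

    Expects a "sex" column and a 0/1 "survived" column.
    """
    summary = passengers.groupby("sex")["survived"].agg(
        survivors="sum", passengers="count"
    )
    summary["survival_rate"] = summary["survivors"] / summary["passengers"]
    return summary


# Tiny made-up sample, only to show the shape of the result.
sample = pd.DataFrame(
    {
        "sex": ["male", "female", "female", "male", "female"],
        "survived": [0, 1, 1, 1, 0],
    }
)
print(survivors_by_sex(sample))

# With the real data and cufflinks loaded as above:
# survivors_by_sex(titanic)["survivors"].iplot(kind="bar", title="Survivors")
```

Plotting the summary gives one bar per group, and the same table can be reused to chart the survival rate instead of the raw count.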
That’s now much better!
But what if we want to draw a horizontal bar plot? Easy enough:
titanic.iplot(kind = "barh", x = "sex", y = "survived")
Great! Let’s explore some more functionality
Themes
Our charts are so far looking great, but perhaps we want to use a different color scheme for our charts. Luckily enough, we have a set of themes that we can use to render our plots. Let’s list them and switch to another one.
Listing themes:
cf.getThemes()
It should output something as follows:
['ggplot', 'pearl', 'solar', 'space', 'white', 'polar', 'henanigans']
We can switch the theme for all future charts by simply adding:
cf.set_config_file(theme="solar")
And now if we render our bar chart again we get something like:
titanic.iplot(kind = "bar", x = "sex", y = "survived")
Dark mode, one of my favorites, but please check them out and let me know which one is your favorite.
Surface Charts
So far we rendered amazing 2d charts, but plotly also supports 3d charts. Let’s build some 3d charts to have some fun. The next plot that we will make is the 3D surface plot, and for that we need to create some data using pandas, as you can see in the following:
df = pd.DataFrame({"A": [100, 200, 300, 200, 100], "B": [100, 200, 300, 200, 100], "C": [100, 200, 300, 200, 100]})
df.head()
You should get something like:
A B C
0 100 100 100
1 200 200 200
2 300 300 300
3 200 200 200
4 100 100 100
Now let’s throw this on a 3d chart using the “surface” kind.
df.iplot(kind = "surface")
Looks amazing and colorful! Let’s change the color scale to make it more visually appealing:
df.iplot(kind = "surface", colorscale = "rdylbu")
Beautiful! But that’s not it, have you tried interacting with the chart in your notebook? You can even rotate it!
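The five-row table makes a fairly simple surface. As far as I can tell, the “surface” kind treats each cell of the DataFrame as a height on a grid, so a larger grid gives a richer shape. Here is a sketch that builds one with numpy; the ripple function, the grid size, and the variable names are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# Grid coordinates: 50 points per axis between -5 and 5 (arbitrary choices).
axis = np.linspace(-5, 5, 50)
x, y = np.meshgrid(axis, axis)

# Height of the surface at each grid point: a ripple around the origin.
z = np.sin(np.sqrt(x**2 + y**2))

# Rows and columns of the frame are labeled with the grid coordinates.
surface = pd.DataFrame(z, index=axis, columns=axis)
print(surface.shape)  # (50, 50)

# In the notebook, with cufflinks loaded as above:
# surface.iplot(kind="surface", colorscale="rdylbu")
```

Rotating this one in the notebook shows off the 3d interaction much better than the small example.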
Conclusion
Plotly is a great charting alternative for your data exploration and analysis. As seen, it provides interactive charts that can help you better identify your outliers and get a better understanding of your data by navigating through it. I probably won’t use plotly for every single dataset, but it’s a very interesting library that we should know about.
Thanks for reading!