DEV Community

Cover image for Animating 60 years of US energy data with 10 lines of Python
Chris Greening
Chris Greening

Posted on • Updated on

Animating 60 years of US energy data with 10 lines of Python

In this step-by-step tutorial, we're going to cover how to animate a meaningful visualization from a real world dataset with almost 700,000 data points.

If you would rather clone and follow this tutorial as a Jupyter Notebook, you can do so at this link


Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

favicon christophergreening.com

Otherwise, we'll be animating ~60 years of United States energy consumption data in only 10 lines of Python!

To prove it to you, here is the code from start to finish:

import pandas as pd 
import plotly.express as px 

# Importing the dataset
df = pd.read_csv("https://www.eia.gov/state/seds/sep_use/total/csv/use_all_btu.csv")

# Filter out aggregate United States and Washington DC rows
df = df[~df.isin(["US", "DC"])]

# Filter rows that contain total consumption per energy source
total_df = df[df["MSN"].str.match("[A-Z]{2}TC[A-Z]")]

# Melt from wide to long format 
total_df = total_df.melt(id_vars=["Data_Status", "State", "MSN"], var_name="Year", value_name="BTU")

# Sum energy consumption per state per year 
summed_df = total_df.groupby(["State", "Year"], as_index=False).sum()

# Animate the bar plot
fig = px.bar(summed_df, x="State", y="BTU", animation_frame="Year", range_y=(0, summed_df["BTU"].max()), color="State", title="United States total energy consumption (BTU)")
fig.update_xaxes(categoryorder="total ascending")
fig.show()
Enter fullscreen mode Exit fullscreen mode

The .csv we will be working with is publicly available on the US Energy Information Administration government website.

Prerequisite packages:

And now... time to code!

Importing the dataset

Importing our dataset is relatively straightforward; we're able to pass the URL of a .csv to pandas.read_csv and it will import it into a pandas.DataFrame.

df = pd.read_csv("https://www.eia.gov/state/seds/sep_use/total/csv/use_all_btu.csv")
Enter fullscreen mode Exit fullscreen mode

Alt Text

Removing aggregate US and Washington DC rows

In this case, our dataset contains rows for both aggregate United States (US) and Washington DC (DC) data.

Since we are only concerned with energy consumption per state, we're going to filter the US and DC rows out.

df = df[~df.isin(["US", "DC"])]
Enter fullscreen mode Exit fullscreen mode

Understanding the MSN column

Do NOT worry if you don't fully understand this next section! It has no bearing on the code, we're just going to develop some important domain knowledge before moving forward.

Looking through documentation provided on the website, we can see that the "MSN" column stands for Mnemonic Series Names. These are codes that contain information on the type of energy source, the sector, and the unit.

Alt Text

Understanding MSN by example

In the context of our data, row 0's MSN value is ABICB.

Let's break it down:

  • AB: Aviation gasoline blending components
  • IC: Industrial sector
  • B: British thermal units (BTU)

Thus, that row corresponds to Aviation gasoline blending components consumed by the industrial sector in british thermal units (BTU).

When the 3rd and 4th characters of an MSN are TC, this means the row corresponds to the total energy consumption across all sectors (i.e. residential, industrial, etc.) for that resource.

Looking back at the Aviation gasoline blending components, an MSN of ABTCB would thus be

  • AB: Aviation gasoline blending components
  • TC: Total of all sectors
  • B: British thermal units (BTU)

Filtering rows that contain total consumption per energy source

To get all rows that have TC as the 3rd and 4th characters (and thus the total), we can use a regular expression with the Series.str.match method.

The pattern we will use is "[A-Z]{2}TC[A-Z]".

Let's break it down:

  • [A-Z]{2}: The 1st and 2nd letter can be any uppercase letters
  • TC: The 3rd and 4th letters must be TC
  • [A-Z]: The 5th character can be any uppercase letter
total_df = df[df["MSN"].str.match("[A-Z]{2}TC[A-Z]")]
Enter fullscreen mode Exit fullscreen mode

Alt Text

Melting years into a Year column

Now that we have the rows we want to visualize, we're going to unpivot our DataFrame from wide to long format. This will massage the year columns into rows as two new columns: "Year" and "BTU".

This allows us to select and filter our data much easier now as well as pass it through to plotly.express which is expecting a long format DataFrame.

total_df = total_df.melt(id_vars=["Data_Status", "State", "MSN"], var_name="Year", value_name="BTU")
Enter fullscreen mode Exit fullscreen mode

Alt Text

Summing all energy sources together

Now that we have the total consumption per energy source, state, and year, we can sum them all together grouped by each state and each year.

This will leave us with the total energy consumption per state per year which is what we're looking to visualize.

summed_df = total_df.groupby(["State", "Year"], as_index=False).sum()
Enter fullscreen mode Exit fullscreen mode

Alt Text

Creating the animated bar plot

Our data is now ready to be visualized! plotly.express.bar has an animation_frame argument which allows us to pass what column to animate our data on.

For this example, we will be animating one frame per "Year" of data.

fig = px.bar(
        summed_df, 
        x="State", 
        y="BTU", 
        color="State",
        animation_frame="Year", 
        title="United States total energy consumption (BTU)",
        range_y=(
            0, 
            summed_df["BTU"].max()
        ), 
).update_xaxes(categoryorder="total ascending")

fig.show()
Enter fullscreen mode Exit fullscreen mode

Alt Text

Additional exercises

Great work! Now that we have this data, there is a ton of other insights you can find. I recommend messing around with different MSN codes, groupings, etc. and see what you can learn!

If you have any questions, feel free to message me or reach out in the comments below! 👇

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

favicon christophergreening.com

Top comments (4)

Collapse
 
juliecodestack profile image
JulieS

Hi Chris! I use matplotlib and seaborn for data visulization. Recently I read introduction of plotly. It can animate the figure. Thank you for your example and introduction. I'd like to learn plotly and may discuss the use of plotly with you later haha.

Collapse
 
chrisgreening profile image
Chris Greening

Anytime! Plotly is a lot of fun to use - it's been a while since I've used it but I had a lot of fun building data dashboards with it back a year or two ago

Collapse
 
shwetabh1 profile image
Shwetabh Shekhar

This is great. Thanks for sharing!

Collapse
 
chrisgreening profile image
Chris Greening

Hey, thanks so much for checking it out! 😄