Visualizing Donald Trump's Instagram data with Python

Chris Greening
Freelance Python developer | Probably programming right now | Coding, hiking, and rollerblading

In my most recent blog post, I discussed how I scraped 10,000 data points from Donald Trump's Instagram page using my open-source Python library, instascrape.

Since then, I've released a Jupyter Notebook tutorial that shows and explains the code step by step and adds some new analyses.

The tutorial uses pandas, scikit-learn, and matplotlib for all analyses and visualizations.

Trump's Instagram dataset

I dumped all 10,000+ data points into a .csv so that you can load them right into a pandas.DataFrame and start exploring the data however you like!

There are a little over 200 rows, one for each of Trump's 200+ most recent Instagram posts.
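As a rough sketch of that loading step: the helper name `load_posts` and the column names `upload_date`, `likes`, and `comments` are assumptions made for illustration, not the confirmed schema of the published .csv. The in-memory sample uses made-up placeholder values so the snippet runs on its own.

```python
import io

import pandas as pd


def load_posts(source):
    """Read scraped posts into a DataFrame (hypothetical helper)."""
    # `source` can be a file path or any file-like object.
    # The "upload_date" column name is an assumption about the .csv schema.
    df = pd.read_csv(source, parse_dates=["upload_date"])
    # Sort chronologically so time-based plots read left to right.
    return df.sort_values("upload_date").reset_index(drop=True)


# Tiny made-up sample standing in for the real .csv (placeholder values).
sample_csv = io.StringIO(
    "upload_date,likes,comments\n"
    "2020-11-05 12:00:00,300,30\n"
    "2020-10-01 12:00:00,100,10\n"
)

posts = load_posts(sample_csv)
print(posts.shape)
print(posts.dtypes)
```

With the real file, you would pass its path to `load_posts` instead of the `StringIO` object.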

Featured visualizations

Below are the featured visualizations that I walk through in detail in the Jupyter Notebook.

Likes per Post

This plot visualizes upload date vs. likes for each of Donald Trump's 200+ most recent posts. I used a polynomial regression to fit and visualize the underlying trend in the scatter plot.

I also included a vertical line to represent Election Day (November 3rd) to emphasize what post frequency looks like on the campaign trail vs. afterwards.
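Here is one way such a plot could be put together. This is a sketch rather than the notebook's code: it swaps in `numpy.polyfit` for the fit, the degree of 3 is an arbitrary choice, the column names `upload_date` and `likes` are assumed, and the data is a made-up placeholder.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up placeholder data; swap in the DataFrame loaded from the .csv.
posts = pd.DataFrame({
    "upload_date": pd.date_range("2020-09-01", periods=12, freq="W"),
    "likes": [210, 250, 240, 300, 340, 390, 450, 520, 480, 400, 350, 330],
})

# Polynomial fitting needs numeric x values, so convert dates to days elapsed.
days = (posts["upload_date"] - posts["upload_date"].min()).dt.days.to_numpy()
likes = posts["likes"].to_numpy()

# Fit a degree-3 polynomial (the degree is an assumption for this sketch).
coefficients = np.polyfit(days, likes, deg=3)
trend = np.polyval(coefficients, days)

fig, ax = plt.subplots()
ax.scatter(posts["upload_date"], likes, label="Likes per post")
ax.plot(posts["upload_date"], trend, color="red", label="Polynomial fit")
# Vertical marker for Election Day (November 3rd, 2020).
ax.axvline(pd.Timestamp("2020-11-03"), color="black", linestyle="--",
           label="Election Day")
ax.set_xlabel("Upload date")
ax.set_ylabel("Likes")
ax.legend()
```

Outside a notebook, call `plt.show()` or `fig.savefig(...)` to view the result. Replacing `likes` with a comments column would give the comments-per-post version of the same plot.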

Comments per Post

This visualization is the only one that is not explicitly included in the notebook; it is left as an exercise for the reader.

The implementation is identical to the likes per post visualization, except this time it uses the number of comments per post.

Likes vs. Comments per Post

This plot compares the number of comments vs. the number of likes as a stacked bar plot with a logarithmic y-scale.

I chose a logarithmic y-scale because each post has far fewer comments than likes, and the log scale makes both equally visible. Otherwise, the comments would be a thin blue band along the bottom of the plot that wouldn't tell us anything useful!
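A minimal sketch of a stacked bar plot with a logarithmic y-scale follows. The `likes` and `comments` column names are assumed and the counts are made-up placeholders, so treat it as an illustration of the technique rather than the notebook's exact code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder counts; the real values come from the .csv.
posts = pd.DataFrame({
    "likes": [250_000, 410_000, 180_000, 320_000],
    "comments": [4_000, 9_500, 2_200, 6_100],
})

fig, ax = plt.subplots()
x = range(len(posts))
# Draw comments first, then stack likes on top of them.
ax.bar(x, posts["comments"], label="Comments")
ax.bar(x, posts["likes"], bottom=posts["comments"], label="Likes")
# The log scale stretches the small comment counts so they stay readable.
ax.set_yscale("log")
ax.set_xlabel("Post")
ax.set_ylabel("Count (log scale)")
ax.legend()
```

Commenting out the `set_yscale("log")` line shows the problem described above: on a linear axis, the comment bars shrink to a sliver.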

Views per Video

Quickly filtering our dataset, we're able to get all posts that are videos and examine how many views each one got.
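That filter can be a one-line boolean mask in pandas. In the sketch below, the column names `is_video` and `video_view_count` are assumptions about the dataset, and the rows are made-up placeholders.

```python
import pandas as pd

# Made-up placeholder rows; column names are assumed, not confirmed.
posts = pd.DataFrame({
    "is_video": [True, False, True, False],
    "video_view_count": [1_200_000, None, 850_000, None],
    "likes": [300_000, 150_000, 220_000, 90_000],
})

# Keep only the rows flagged as videos.
videos = posts[posts["is_video"]]

# Order by view count so the most-watched video comes first.
videos = videos.sort_values("video_view_count", ascending=False)
print(videos[["video_view_count", "likes"]])
```

If the flag is stored as text in the .csv rather than as a boolean, it would need converting before it can be used as a mask.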

Views vs. Likes per Video

Similar to the likes vs. comments per post plot, this stacked bar plot compares the number of likes to the number of views each video got.

I once again chose a logarithmic y-scale to make the significantly smaller like counts more visible.

Hashtag Frequency

This bar plot shows how many times Trump has used each unique hashtag. We can see that he draws on a fairly small set and has tagged #Repost, #MAGA, and #VOTE the most (in that order).
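Counting hashtags could look something like the sketch below. It assumes the hashtags have to be pulled out of caption text with a regular expression, which may not match how the dataset stores them. The captions are made-up placeholders, so the counts say nothing about the real data.

```python
import re
from collections import Counter

# Made-up placeholder captions; the real text would come from the dataset.
captions = [
    "placeholder caption one #MAGA #VOTE",
    "#Repost placeholder caption two #MAGA",
    "placeholder caption three #Repost",
    "placeholder caption with no hashtags",
]

# Find every "#word" token in each caption and tally the occurrences.
hashtag_counts = Counter(
    tag for caption in captions for tag in re.findall(r"#\w+", caption)
)

# most_common() returns (hashtag, count) pairs, highest count first.
for tag, count in hashtag_counts.most_common():
    print(tag, count)
```

The resulting pairs can be passed straight to a matplotlib bar plot. The same counting approach would work for location tags.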

Location Tag Frequency

Similar to the hashtag frequency analysis, this bar plot shows the location tags that Trump has used the most.

Conclusion

I hope this notebook can be a useful resource that shows not only how you can analyze instascrape data but also how you can build some interesting visualizations in your day-to-day data science work!

If you liked this post, check out some of my other posts and drop the official repository a star ⭐!

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that instascrape does well:

  • Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
  • Scrapes HTML, BeautifulSoup, and JSON
  • Download content to your computer as png, jpg, mp4, and mp3
  • Dynamically retrieve HTML embed code for posts
  • Expressive and consistent API for concise and elegant code
  • Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
  • Lightweight; no boilerplate or configurations necessary
  • The only hard dependencies are Requests and…



