In my most recent blog post, I discussed how I scraped 10,000 data points from Donald Trump's Instagram page using my open source Python library instascrape.
Since then, I've released a Jupyter Notebook tutorial that walks through and explains the code step-by-step, along with some new analyses.
I dumped all 10,000+ data points into a .csv file so that you can load them straight into a pandas.DataFrame and start exploring the data however you like!
There are a little over 200 rows, one per post, covering Trump's 200+ most recent Instagram posts.
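If you want to start from the CSV, a minimal pandas sketch of loading it might look like this. The column name `upload_date` and the helper `load_posts` are hypothetical (check the file's actual header), and the two-row in-memory sample with made-up numbers stands in for the real file, whose path isn't given here.

```python
import io

import pandas as pd

# Hypothetical column name: check the real CSV header before relying on it.
DATE_COLUMN = "upload_date"


def load_posts(source) -> pd.DataFrame:
    """Read the posts CSV (a path or file-like object) into a DataFrame.

    Upload dates are parsed as datetimes and rows are sorted oldest-first
    so that time-series plots read left to right.
    """
    df = pd.read_csv(source, parse_dates=[DATE_COLUMN])
    return df.sort_values(DATE_COLUMN).reset_index(drop=True)


# Tiny in-memory stand-in for the real file (made-up numbers, for illustration only).
sample = io.StringIO(
    "upload_date,likes,comments\n"
    "2020-11-05,250000,12000\n"
    "2020-11-01,300000,15000\n"
)
posts = load_posts(sample)
print(posts)
```

To load the real data, pass the CSV's path to `load_posts` instead of `sample`.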
Below are the featured visualizations, which I walk through in detail in the Jupyter Notebook.
This plot shows upload date vs. likes for each of Donald Trump's 200+ most recent posts. I used a polynomial regression to fit and visualize the underlying trend in the scatter plot.
I also added a vertical line at Election Day (November 3, 2020) to highlight how his posting frequency on the campaign trail compares with afterwards.
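A minimal sketch of this kind of plot, assuming hypothetical `upload_date` and `likes` columns, uses NumPy's `polyfit` for the trend line and Matplotlib's `axvline` for Election Day. The polynomial degree isn't stated above, so the default of 3 here is an assumption, and the helper names are illustrative rather than the notebook's actual code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ELECTION_DAY = pd.Timestamp("2020-11-03")


def polynomial_trend(dates: pd.Series, values: pd.Series, degree: int = 3) -> np.ndarray:
    """Fit a polynomial of the given degree to values over time.

    Dates are converted to days since the earliest post, which keeps the fit
    numerically well-conditioned compared with raw nanosecond timestamps.
    """
    days = ((dates - dates.min()).dt.total_seconds() / 86_400).to_numpy()
    coeffs = np.polyfit(days, values.to_numpy(dtype=float), deg=degree)
    return np.polyval(coeffs, days)


def plot_likes(df: pd.DataFrame, degree: int = 3):
    """Scatter likes per post, overlay the polynomial trend, and mark Election Day."""
    df = df.sort_values("upload_date")  # the trend line must be drawn in date order
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.scatter(df["upload_date"], df["likes"], s=12, label="Likes per post")
    ax.plot(
        df["upload_date"],
        polynomial_trend(df["upload_date"], df["likes"], degree),
        color="red",
        label=f"Degree-{degree} polynomial trend",
    )
    ax.axvline(ELECTION_DAY, color="gray", linestyle="--", label="Election Day")
    ax.set_xlabel("Upload date")
    ax.set_ylabel("Likes")
    ax.legend()
    return fig, ax
```

The degree is the main knob: higher degrees follow the scatter more closely but can swing wildly at the edges of the date range.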
The comments-per-post visualization is the only one not explicitly included in the notebook; instead, it's left as an exercise for the reader.
Its implementation is identical to the likes-per-post visualization, except that it plots the number of comments on each post.
This plot compares the number of comments to the number of likes on each post as a stacked bar plot with a logarithmic y-scale.
I chose a logarithmic y-scale because posts get significantly fewer comments than likes, and the log scale keeps both visible. Otherwise, the comments would be a thin blue band along the bottom of the plot that wouldn't tell us anything useful!
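A stacked bar plot on a log axis can be sketched with Matplotlib as below, assuming hypothetical `likes` and `comments` columns. Drawing the smaller series first makes it the bottom band (blue, Matplotlib's default first color). The `plot_stacked_log` helper is illustrative, not the notebook's code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_stacked_log(df: pd.DataFrame, bottom_col: str, top_col: str):
    """Draw one stacked bar per row: bottom_col first, top_col stacked on top."""
    positions = range(len(df))
    bottom = df[bottom_col].to_numpy()
    fig, ax = plt.subplots(figsize=(12, 5))
    # Intended for the smaller series; it gets the default first color (blue).
    ax.bar(positions, bottom, label=bottom_col)
    ax.bar(positions, df[top_col].to_numpy(), bottom=bottom, label=top_col)
    ax.set_yscale("log")  # log axis keeps the much smaller bottom series visible
    ax.set_xlabel("Post")
    ax.set_ylabel("Count (log scale)")
    ax.legend()
    return fig, ax
```

Calling `plot_stacked_log(df, "comments", "likes")` produces the comparison, and the helper works for any two count columns. One caveat of stacking on a log axis: the top segment's visual height reflects the ratio of the stacked total to the bottom value, not its raw count.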
A quick filter on our dataset gives us every post that is a video, so we can examine how many views each one got.
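In pandas, the filter might look like the sketch below. It assumes the CSV has a boolean `is_video` column and a `video_view_count` column; both names, like the `video_views` helper, are hypothetical, so verify them against the actual header.

```python
import pandas as pd


def video_views(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only video posts, with the columns needed for the views plots."""
    videos = df[df["is_video"]]  # boolean mask: True rows are videos
    return videos[["upload_date", "likes", "video_view_count"]].reset_index(drop=True)
```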
As with the likes vs. comments plot, this stacked bar plot compares the number of likes to the number of views on each video.
I once again chose a logarithmic y-scale to make the much smaller like counts more visible.
This bar plot shows how many times Trump used each unique hashtag. His selection is fairly small, and his most-used tags are #Repost, #MAGA, and #VOTE (in that order).
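Counting hashtags takes one extra step because each post can carry several tags. A minimal sketch, assuming a hypothetical `hashtags` column where each cell holds a list of tags (or that list's string form after a CSV round-trip), flattens the lists with `explode` and tallies them with `value_counts`. The helper names are illustrative.

```python
import ast

import pandas as pd


def _as_list(cell):
    """Turn "['MAGA', 'VOTE']" (a list saved as text in a CSV) back into a list."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


def hashtag_counts(hashtags: pd.Series) -> pd.Series:
    """Count how many times each hashtag appears across all posts, most-used first."""
    return (
        hashtags.dropna()
        .map(_as_list)
        .explode()       # one row per (post, hashtag) pair; empty lists become NaN
        .value_counts()  # NaN is dropped by default
    )
```

Calling `.plot.bar()` on the returned Series, for example `hashtag_counts(df["hashtags"]).plot.bar()`, draws the bar plot.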
Similar to the hashtag frequency analysis, this bar plot shows the location tags Trump has used most.
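Since each post has at most one location tag, this count is simpler than the hashtag one: a single `value_counts` call, which skips posts with no location. The `location` column name and the `top_locations` helper are hypothetical.

```python
import pandas as pd


def top_locations(locations: pd.Series, n: int = 10) -> pd.Series:
    """Return the n most-used location tags; posts without a location are skipped."""
    return locations.value_counts(dropna=True).head(n)
```

As with the hashtags, `top_locations(df["location"]).plot.bar()` turns the result into a bar plot.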
I hope this notebook is a useful resource that shows not only how to analyze instascrape data but also how to build some interesting visualizations for your day-to-day data science work!
If you liked this post, check out some of my other posts and give the official repository a star ⭐!
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block in the data scientist's toolchain and can be seamlessly integrated and extended with industry-standard tools for web scraping, data science, and analysis.
Here are a few of the things that instascrape does well:
- Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
- Scrapes HTML, BeautifulSoup, and JSON
- Download content to your computer as png, jpg, mp4, and mp3
- Dynamically retrieve HTML embed code for posts
- Expressive and consistent API for concise and elegant code
- Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
- Lightweight; no boilerplate or configurations necessary
- The only hard dependencies are Requests and…