In this post, I'm going to discuss how I used my open source Instagram scraper to scrape 25,000 data points from Joe Biden's Instagram page.
Combining selenium and instascrape, I wrote a quick script that automatically scrolled through Joe Biden's Instagram page and scraped the first 500 posts, giving us almost 25,000 data points to explore (500 posts × 49 data points per post = 24,500) 🙌.
Let's see what his likes per post look like with a little matplotlib and scikit-learn magic 😏
As expected, we can see steady growth and then a massive spike upwards as election day approached.
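For anyone who wants to reproduce a chart like this, here is a minimal sketch of one way to do it. It is not the exact script behind the plot: the post doesn't say which scikit-learn model was used, so this assumes a plain LinearRegression trend line, and the function name plot_likes_trend and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression


def plot_likes_trend(upload_dates, likes):
    """Scatter likes per post and overlay a fitted linear trend line.

    Hypothetical helper; expects upload_dates sorted oldest to newest.
    """
    # scikit-learn expects a 2D feature array, so convert dates to day offsets
    first = min(upload_dates)
    days = np.array(
        [(d - first).total_seconds() / 86400 for d in upload_dates]
    ).reshape(-1, 1)
    model = LinearRegression().fit(days, likes)

    fig, ax = plt.subplots()
    ax.scatter(upload_dates, likes, s=10, label="likes per post")
    ax.plot(upload_dates, model.predict(days), color="red", label="linear trend")
    ax.set_xlabel("upload date")
    ax.set_ylabel("likes")
    ax.legend()
    return fig, model


# Made-up stand-in data: one post per day with steadily growing likes
dates = [datetime(2020, 8, 1) + timedelta(days=i) for i in range(30)]
likes = [100_000 + 5_000 * i for i in range(30)]
fig, model = plot_likes_trend(dates, likes)
```

With real data you would pass in the upload dates and like counts from the scraped posts instead of the made-up lists. A straight line won't capture a late spike very well, so a different model may suit this data better.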
Let's take a look at comments per post now for the heck of it:
There are a ton of different things we can do now that the data is available, and it's really up to you what you do with it. Using the to_dict instance method, I can build a pandas.DataFrame from all of our data for easy analysis in a clean, expressive format. With an expression like the following, we can get every post where Joe Biden used a hashtag:
dataframe[dataframe.hashtags.str.len() != 0]
or say we wanted every post where Joe got more than 1,000,000 likes:
dataframe[dataframe["likes"] > 1000000]
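To make the two filters above concrete, here is a small self-contained sketch. The list post_dicts is a made-up stand-in for what you would get by calling to_dict on each scraped post, and the field values are invented; only the likes and hashtags column names come from the expressions above.

```python
import pandas as pd

# Hypothetical stand-ins for the dicts returned by each post's to_dict();
# real posts carry many more fields (49 per post in this dataset).
post_dicts = [
    {"shortcode": "post_a", "likes": 500_000, "hashtags": []},
    {"shortcode": "post_b", "likes": 1_500_000, "hashtags": ["vote"]},
    {"shortcode": "post_c", "likes": 2_000_000, "hashtags": ["vote", "election"]},
]

# One row per post, one column per field
dataframe = pd.DataFrame(post_dicts)

# .str.len() also works on list-valued columns, returning each list's length
with_hashtags = dataframe[dataframe.hashtags.str.len() != 0]

# Boolean mask on a numeric column
over_a_million = dataframe[dataframe["likes"] > 1_000_000]
```

Both filters return new DataFrames, so they can be chained or combined with & and | to ask narrower questions, such as hashtagged posts that also passed a million likes.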
...so what are you waiting for? Get out there and start exploring Instagram data programmatically!
If you're interested in reading more about instascrape, check out some of my other posts:
Exploratory data analysis of Instagram using instascrape and Python
Chris Greening ・ Oct 22 '20
Downloading recent Instagram photos using instascrape and Python
Chris Greening ・ Oct 26 '20
or better yet, come to the official repo and drop it a star and contribute ❤️
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
Note: This module is no longer actively maintained.
DISCLAIMER:
Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.