In this post, I'm going to discuss how I used my open-source Instagram scraper to scrape almost 25,000 data points from Joe Biden's Instagram page.
Combining selenium and instascrape, I wrote a quick script that automatically scrolled Joe Biden's Instagram page and scraped the first 500 posts. At 49 data points per post, that gives us 24,500 data points to explore 🙌.
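The script itself isn't reproduced in this post, but the scrolling half of the idea can be sketched in a few lines. This is a minimal sketch, not the exact code I ran: `scroll_until_loaded`, `max_scrolls`, and `pause` are hypothetical names made up for illustration, and `driver` is assumed to be a Selenium WebDriver (any object with an `execute_script` method will do).

```python
import time


def scroll_until_loaded(driver, max_scrolls=50, pause=2.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is assumed to be a Selenium WebDriver; only its
    `execute_script` method is used. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        # Jump to the current bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more posts
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, so assume we reached the end
        last_height = new_height
    return scrolls
```

With a real browser you would pass in something like a `selenium.webdriver.Chrome()` instance after navigating to the profile page. Capping the loop with `max_scrolls` keeps a very long feed from scrolling forever.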
As expected, we can see steady growth and then a massive spike upwards as election day approached.
Let's take a look at comments per post now for the heck of it:
There are a ton of different things we can do now that the data is available, and it's really up to you what you do with it. Using the to_dict instance method, I can build a pandas.DataFrame from all of our data for easy analysis in a clean, expressive format. With an expression like the following, we can get every post where Joe Biden used a hashtag:
dataframe[dataframe.hashtags.str.len() != 0]
Or say we wanted every post where Joe got more than 1,000,000 likes:
dataframe[dataframe["likes"] > 1000000]
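To make the two filters above concrete, here is a small self-contained sketch. The records are a made-up stand-in for the dictionaries that to_dict returns: the values are invented for illustration (not real scraped data), and `shortcode` and `comments` are assumed column names. Only `likes` and `hashtags` come from the snippets above.

```python
import pandas as pd

# Hypothetical stand-in for [post.to_dict() for post in scraped_posts].
# These values are invented for illustration; they are NOT real scraped data.
records = [
    {"shortcode": "post_a", "likes": 1_250_000, "comments": 18_000, "hashtags": ["vote"]},
    {"shortcode": "post_b", "likes": 310_000, "comments": 4_200, "hashtags": []},
    {"shortcode": "post_c", "likes": 2_400_000, "comments": 51_000, "hashtags": []},
]

dataframe = pd.DataFrame(records)

# Posts with at least one hashtag; .str.len() also measures list values.
with_hashtags = dataframe[dataframe["hashtags"].str.len() != 0]

# Posts with more than 1,000,000 likes.
over_a_million = dataframe[dataframe["likes"] > 1_000_000]

# Both conditions at once; each comparison needs its own parentheses.
both = dataframe[(dataframe["hashtags"].str.len() != 0) & (dataframe["likes"] > 1_000_000)]

print(both[["shortcode", "likes", "hashtags"]])
```

Combining masks with `&` (and `|` for "or") lets you stack as many conditions as you like without leaving pandas.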
...so what are you waiting for? Get out there and start exploring Instagram data programmatically!
If you're interested in reading more about instascrape, check out some of my other posts:
or better yet, come to the official repo and drop it a star and contribute ❤️
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
What is it?
instascrape is a lightweight Python package that provides expressive and flexible tools for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Here are a few of the things that instascrape does well:
- Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
- Scrapes HTML, BeautifulSoup, and JSON
- Download content to your computer as png, jpg, mp4, and mp3
- Dynamically retrieve HTML embed code for posts
- Expressive and consistent API for concise and elegant code
- Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
- Lightweight; no boilerplate or configurations necessary
- The only hard dependencies are Requests and Beautiful…