Chris Greening

The Instagram Hashtag scraper

So far in this series, I have presented instascrape's Profile and Post scrapers and discussed the data points they collect. In this post, we'll look at what the Hashtag scraper collects.

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit


What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that instascrape does well:

  • Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
  • Scrapes HTML, BeautifulSoup, and JSON
  • Download content to your computer as png, jpg, mp4, and mp3
  • Dynamically retrieve HTML embed code for posts
  • Expressive and consistent API for concise and elegant code
  • Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
  • Lightweight; no boilerplate or configurations necessary
  • The only hard dependencies are Requests and…

The Hashtag scraper scrapes 22 data points associated with an Instagram hashtag.

Instance attribute names have been chosen to be semantic and easy to understand.

The data points

The best way to learn is by example, so we'll take a look at the data scraped from Instagram for the #google hashtag.

All instascrape scrapers have a to_dict method that returns all of the scraped data as a dictionary, so we can see everything in one shot.

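from instascrape import Hashtag

# Create the scraper for #google, load its data, then return everything as a dict.
# In a Jupyter notebook the last expression in a cell is displayed automatically;
# in a script, wrap it in print() to see the output.
google_hashtag = Hashtag('google')
google_hashtag.scrape()
google_hashtag.to_dict()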
{'csrf_token': 'jfndsjklfhdasjklfhsdjklfasdhnfkjlsda',
 'viewer': None,
 'viewer_id': None,
 'country_code': 'US',
 'language_code': 'en',
 'locale': 'en_US',
 'device_id': '12345678-1234-1234-1234-123456789012',
 'browser_push_pub_key': '1245643253543556555564',
 'key_id': '87',
 'public_key': 'alskdfnkl123213123ALSKDNfjklsdfasdfndsalfasdlfkh',
 'version': '9',
 'is_dev': False,
 'rollout_hash': 'b10813bd9030',
 'bundle_variant': 'es6',
 'frontend_dev': 'c1f',
 'id': '17843843635029645',
 'name': 'google',
 'allow_following': False,
 'is_following': False,
 'is_top_media_only': False,
 'profile_pic_url': '',
 'amount_of_posts': 9350019}
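Many of the keys in that dictionary look like they describe the browsing session rather than the hashtag itself. As a minimal sketch, the snippet below separates the two groups and prints the hashtag fields as JSON. The grouping in `HASHTAG_KEYS` and the helper `split_hashtag_data` are my own assumptions based on reading the output above, not something instascrape provides, and the sample dictionary is a trimmed copy of that output so the example runs without a network call.

```python
import json

# Keys that appear to describe the hashtag itself. This grouping is an
# assumption made for illustration; instascrape does not label fields this way.
HASHTAG_KEYS = (
    "id",
    "name",
    "allow_following",
    "is_following",
    "is_top_media_only",
    "profile_pic_url",
    "amount_of_posts",
)


def split_hashtag_data(data):
    """Return (hashtag_fields, session_fields) from a to_dict()-style dictionary."""
    hashtag_fields = {key: data[key] for key in HASHTAG_KEYS if key in data}
    session_fields = {
        key: value for key, value in data.items() if key not in HASHTAG_KEYS
    }
    return hashtag_fields, session_fields


# Trimmed copy of the scraped output shown above (hypothetical stand-in for
# the result of google_hashtag.to_dict()).
scraped = {
    "csrf_token": "jfndsjklfhdasjklfhsdjklfasdhnfkjlsda",
    "country_code": "US",
    "language_code": "en",
    "id": "17843843635029645",
    "name": "google",
    "allow_following": False,
    "is_following": False,
    "is_top_media_only": False,
    "profile_pic_url": "",
    "amount_of_posts": 9350019,
}

hashtag_fields, session_fields = split_hashtag_data(scraped)
print(json.dumps(hashtag_fields, indent=2))
```

With a real scrape, you would pass the dictionary returned by `to_dict` to the helper instead of the hard-coded sample.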

If you're interested in seeing instascrape in action, check out some of my other posts that explore practical examples.

In the next part of the series, we will be exploring what attributes are provided by the Location scraper.

Discussion (1)

villival

Unable to execute the code... I don't know where the output is being stored. I am using Jupyter Notebook.