Chris Greening

Posted on Dec 28, 2020 • Edited on Dec 31, 2020

The Instagram Post scraper

#python #showdev #contributorswanted #datascience

In this series, I'm walking through the data that each of instascrape's scrapers makes available. In this post, we'll take a quick peek at the data points scraped from an Instagram post when using the Post scraper.

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

…

The Post scraper scrapes 51 data points associated with an Instagram post.

Instance attribute names have been chosen to be semantic and easy to understand.

The data points

The best way to learn is by example, so we'll take a look at the data scraped from an individual post by @google.

All instascrape scrapers have a to_dict method that returns all data as a dictionary, so we can see everything in one shot.

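from instascrape import Post

# Point the scraper at a single post URL, then fetch and parse its data
google_post = Post('https://www.instagram.com/p/CGiWeQjl4DI/')
google_post.scrape()

# Return every scraped data point as one dictionary
google_post.to_dict()
>>>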
{'csrf_token': 'UPIA1ETx3inJYcyZXJloxvfBcxbpZOHu',
 'viewer': None,
 'viewer_id': None,
 'country_code': 'US',
 'language_code': 'en',
 'locale': 'en_US',
 'device_id': '12345678-1234-1234-1234-123456789012',
 'browser_push_pub_key': 'BIBn3E_rWTci8Xn6P9Xj3btShT85Wdtne0LtwNUyRQ5XjFNkuTq9j4MPAVLvAFhXrUU1A9UxyxBA7YIOjqDIDHI',
 'key_id': '108',
 'public_key': 'ae57c2c94c4afb24eeda9cd759faeeb034086d0ebe60e90123baed5e38c3304f',
 'version': '10',
 'is_dev': False,
 'rollout_hash': 'b10813bd9030',
 'bundle_variant': 'es6',
 'frontend_dev': 'prod',
 'id': '2423598385863295176',
 'shortcode': 'CGiWeQjl4DI',
 'height': 1350,
 'width': 1080,
 'gating_info': None,
 'fact_check_overall_rating': None,
 'fact_check_information': None,
 'sensitivity_friction_info': None,
 'media_overlay_info': None,
 'media_preview': 'ACEq3DTDWKNRIUlcZLbj7jgH3961kk3qGGOf6dabdhWuOxSYqpHdeYdwO5D0PTjpnHtVsEEZHI9RVJisJiiloouI5/yAPWrlpHvjdT0ByPUE+n5VB15NXLN8Fl7Yz+VZN6GqRUjjMR2nIzwPf/61NVGhfb1BBwfb3HrV3yguZmOS33R2VfX6mgrvQ4AJx3/PH0qouys9f8yGru5DuPt+Roqvu9lopXLshS9OibYGdjhdpHuSemKio9PwqRl2OTfahjkYBH6/5/lUbXIhXHU449/enaodsIA4G4DArHh+brz9aaEyx9ob2op+0egop3FY/9k=',
 'display_url': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-15/e35/p1080x1080/122016367_1013895575740222_4221948305990255791_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=110&_nc_ohc=FUfhUeFsllQAX_X5TZr&tp=1&oh=9ccf953089eed8e561f7b47d38eb8357&oe=6014039B',
 'accessibility_caption': 'Photo by Google on October 19, 2020. Image may contain: outdoor.',
 'is_video': False,
 'tracking_token': 'eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiNGNkMjVjN2QxMzY3NDk0OWE5Y2VmNzA0ZWQ0YTc1NmYyNDIzNTk4Mzg1ODYzMjk1MTc2In0sInNpZ25hdHVyZSI6IiJ9',
 'tagged_users': [],
 'caption': 'Working from home can be ruff, but having #Dooglers around helps. 🐾 Tap the link in our bio to learn how Googlers are keeping our dog-friendly company culture going, even outside of the paw-ffice.',
 'caption_is_edited': False,
 'has_ranked_comments': False,
 'comments': 527,
 'comments_disabled': False,
 'commenting_disabled_for_viewer': False,
 'timestamp': 1603135463,
 'likes': 147553,
 'location': nan,
 'viewer_has_liked': False,
 'viewer_has_saved': False,
 'viewer_has_saved_to_collection': False,
 'viewer_in_photo_of_you': False,
 'viewer_can_reshare': True,
 'video_url': nan,
 'has_audio': nan,
 'video_view_count': nan,
 'username': 'google',
 'full_name': 'Google',
 'upload_date': datetime.datetime(2020, 10, 19, 15, 24, 23),
 'hashtags': ['Dooglers']}
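
Because this is a photo post, several fields come back as None or nan (the video fields, for example), and the timestamp is a raw Unix epoch value. Here is a minimal sketch of how you might tidy the dictionary before analysis. It works on a hand-copied subset of the output above rather than a live scrape, and is_missing and drop_missing are hypothetical helpers written for this example, not part of instascrape.

```python
import math
from datetime import datetime, timezone

# Hand-copied subset of the to_dict() output shown above (not a live scrape)
post_data = {
    'shortcode': 'CGiWeQjl4DI',
    'is_video': False,
    'gating_info': None,
    'comments': 527,
    'timestamp': 1603135463,
    'likes': 147553,
    'location': float('nan'),
    'video_url': float('nan'),
    'username': 'google',
    'hashtags': ['Dooglers'],
}


def is_missing(value):
    """Return True for the None and NaN placeholders seen in the output."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(data):
    """Hypothetical helper: keep only the fields that carry a value."""
    return {key: value for key, value in data.items() if not is_missing(value)}


clean = drop_missing(post_data)

# Interpret the raw epoch timestamp explicitly as UTC
uploaded_utc = datetime.fromtimestamp(clean['timestamp'], tz=timezone.utc)

print(sorted(clean))
print(uploaded_utc.isoformat())  # 2020-10-19T19:24:23+00:00
```

Note that False values such as is_video are kept, since they are real data rather than placeholders. The UTC time here is four hours ahead of the upload_date in the output above, which suggests that value was rendered in the local time zone of the machine that ran the scrape. That is an inference from the numbers, not documented behavior.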