In this series, I'm showing the data available from each of the scrapers provided by instascrape. For this post, we're going to take a quick peek at which data points are scraped from an Instagram post when using the Post scraper.
chris-greening/instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Key features
Here are a few of the things that instascrape does well:
- Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
- Scrapes HTML, BeautifulSoup, and JSON
- Download content to your computer as png, jpg, mp4, and mp3
- Dynamically retrieve HTML embed code for posts
- Expressive and consistent API for concise and elegant code
- Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
- Lightweight; no boilerplate or configurations necessary
- The only hard dependencies are Requests and…
The Post scraper scrapes 51 data points associated with an Instagram post.
Instance attribute names have been chosen to be semantic and easy to understand.
The data points
The best way to learn is by example, so we'll take a look at the data scraped from an individual post on @google's Instagram page.
All instascrape scrapers have a to_dict method that returns all of the scraped data as a dictionary, so we can see everything in one shot.
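from instascrape import Post
# Point the scraper at a single post URL
google_post = Post('https://www.instagram.com/p/CGiWeQjl4DI/')
# Request the post and load its data onto the instance
google_post.scrape()
# Return every scraped data point as one dictionary
google_post.to_dict()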
>>>
{'csrf_token': 'UPIA1ETx3inJYcyZXJloxvfBcxbpZOHu',
'viewer': None,
'viewer_id': None,
'country_code': 'US',
'language_code': 'en',
'locale': 'en_US',
'device_id': '12345678-1234-1234-1234-123456789012',
'browser_push_pub_key': 'BIBn3E_rWTci8Xn6P9Xj3btShT85Wdtne0LtwNUyRQ5XjFNkuTq9j4MPAVLvAFhXrUU1A9UxyxBA7YIOjqDIDHI',
'key_id': '108',
'public_key': 'ae57c2c94c4afb24eeda9cd759faeeb034086d0ebe60e90123baed5e38c3304f',
'version': '10',
'is_dev': False,
'rollout_hash': 'b10813bd9030',
'bundle_variant': 'es6',
'frontend_dev': 'prod',
'id': '2423598385863295176',
'shortcode': 'CGiWeQjl4DI',
'height': 1350,
'width': 1080,
'gating_info': None,
'fact_check_overall_rating': None,
'fact_check_information': None,
'sensitivity_friction_info': None,
'media_overlay_info': None,
'media_preview': 'ACEq3DTDWKNRIUlcZLbj7jgH3961kk3qGGOf6dabdhWuOxSYqpHdeYdwO5D0PTjpnHtVsEEZHI9RVJisJiiloouI5/yAPWrlpHvjdT0ByPUE+n5VB15NXLN8Fl7Yz+VZN6GqRUjjMR2nIzwPf/61NVGhfb1BBwfb3HrV3yguZmOS33R2VfX6mgrvQ4AJx3/PH0qouys9f8yGru5DuPt+Roqvu9lopXLshS9OibYGdjhdpHuSemKio9PwqRl2OTfahjkYBH6/5/lUbXIhXHU449/enaodsIA4G4DArHh+brz9aaEyx9ob2op+0egop3FY/9k=',
'display_url': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-15/e35/p1080x1080/122016367_1013895575740222_4221948305990255791_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=110&_nc_ohc=FUfhUeFsllQAX_X5TZr&tp=1&oh=9ccf953089eed8e561f7b47d38eb8357&oe=6014039B',
'accessibility_caption': 'Photo by Google on October 19, 2020. Image may contain: outdoor.',
'is_video': False,
'tracking_token': 'eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiNGNkMjVjN2QxMzY3NDk0OWE5Y2VmNzA0ZWQ0YTc1NmYyNDIzNTk4Mzg1ODYzMjk1MTc2In0sInNpZ25hdHVyZSI6IiJ9',
'tagged_users': [],
'caption': 'Working from home can be ruff, but having #Dooglers around helps. 🐾 Tap the link in our bio to learn how Googlers are keeping our dog-friendly company culture going, even outside of the paw-ffice.',
'caption_is_edited': False,
'has_ranked_comments': False,
'comments': 527,
'comments_disabled': False,
'commenting_disabled_for_viewer': False,
'timestamp': 1603135463,
'likes': 147553,
'location': nan,
'viewer_has_liked': False,
'viewer_has_saved': False,
'viewer_has_saved_to_collection': False,
'viewer_in_photo_of_you': False,
'viewer_can_reshare': True,
'video_url': nan,
'has_audio': nan,
'video_view_count': nan,
'username': 'google',
'full_name': 'Google',
'upload_date': datetime.datetime(2020, 10, 19, 15, 24, 23),
'hashtags': ['Dooglers']}
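As a rough sketch of how this dictionary might be tidied up before analysis, the snippet below works on a hand-copied subset of the output above rather than a live scrape, so it does not call instascrape at all. The helper names is_missing and drop_missing are hypothetical, not part of the library. It drops the None and nan placeholders and converts the timestamp value to a date. The four-hour gap between the UTC result and the upload_date shown above suggests that upload_date was rendered in the local time of the machine that ran the scrape; that is an inference from the numbers, not something the library documents here.

```python
import math
from datetime import datetime, timedelta, timezone

# Hand-copied subset of the dictionary printed above (not a live scrape).
sample = {
    "shortcode": "CGiWeQjl4DI",
    "is_video": False,
    "viewer": None,
    "comments": 527,
    "likes": 147553,
    "timestamp": 1603135463,
    "location": float("nan"),
    "video_url": float("nan"),
    "hashtags": ["Dooglers"],
}


def is_missing(value):
    """Hypothetical helper: True for None and float NaN, the two empty markers seen above."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(data):
    """Hypothetical helper: return a copy of data without None/NaN entries."""
    return {key: value for key, value in data.items() if not is_missing(value)}


cleaned = drop_missing(sample)
print(sorted(cleaned))

# The 'timestamp' field is a Unix epoch in seconds; interpret it as UTC.
utc_time = datetime.fromtimestamp(sample["timestamp"], tz=timezone.utc)
print(utc_time.isoformat())

# Assumption: a fixed UTC-4 offset (US Eastern daylight time in October 2020),
# chosen only because it reproduces the upload_date value shown above.
eastern = utc_time.astimezone(timezone(timedelta(hours=-4)))
print(eastern.isoformat())
```

Note that False values such as is_video are kept, since they carry information; only None and nan are treated as missing.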
And there we have it! If you're interested in seeing instascrape in action, check out some of my other posts that explore practical examples:

Scraping 10,000 data points from Donald Trump's Instagram page with Python
Chris Greening ・ Dec 20 ・ 4 min read

Downloading recent Instagram photos using instascrape and Python
Chris Greening ・ Oct 26 ・ 2 min read
In the next part of the series, we will explore the attributes provided by the Hashtag scraper.