Chris Greening

Posted on Dec 11, 2020

Scraping an Instagram location tag with instascrape

#python #showdev #contributorswanted #datascience

Introducing: `instascrape.Location`

In this blog post, I am going to quickly introduce instascrape's newest feature: the ability to scrape an Instagram location tag!

With the release of v1.4.0, the Location scraper now provides a semantic way to gather data from an Instagram location tag.

Sample usage

from instascrape import Location 
url = 'https://www.instagram.com/explore/locations/212988663/new-york-new-york/'
new_york = Location(url)
new_york.scrape()

It's as easy as that! We've scraped the page and now have access to useful information such as

print(f"The NY location tag has {new_york.amount_of_posts:,} posts")
>>> The NY location tag has 61,202,403 posts

print(f"NY tag geographic coordinates: ({new_york.latitude}, {new_york.longitude})")
>>> NY tag geographic coordinates: (40.7142, -74.0064)

as well as a variety of other useful attributes!
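Here is a minimal sketch of one way to gather the attributes shown above into a plain dictionary for later analysis. To keep it runnable without touching Instagram, it uses a stand-in object instead of a real scrape: the `SimpleNamespace` and the `summarize` helper are my own illustration, not part of instascrape. Only the attribute names `amount_of_posts`, `latitude`, and `longitude` come from the example above.

```python
from types import SimpleNamespace

# Stand-in for a scraped Location so the sketch runs offline (hypothetical;
# a real object would come from Location(url) followed by .scrape()).
new_york = SimpleNamespace(
    amount_of_posts=61202403,
    latitude=40.7142,
    longitude=-74.0064,
)


def summarize(location, fields=("amount_of_posts", "latitude", "longitude")):
    """Collect the named attributes into a dict, using None for any that are missing."""
    return {name: getattr(location, name, None) for name in fields}


summary = summarize(new_york)
print(summary)
```

Falling back to `None` for missing attributes is a deliberate choice here: the data available on a page can change, and a missing field is easier to handle downstream than an `AttributeError` halfway through a run.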

`get_recent_posts`

In addition to scraping attributes of the location tag, we can also get some of the tag's recent posts as `instascrape.Post` objects:

recent_posts = new_york.get_recent_posts()
for post in recent_posts:
    print(post.upload_date)
>>> 2020-12-10 20:27:03
2020-12-10 20:27:01
2020-12-10 20:26:59
2020-12-10 20:26:59
2020-12-10 20:26:51
2020-12-10 20:26:48
2020-12-10 20:26:46
2020-12-10 20:26:45
2020-12-10 20:26:42
2020-12-10 20:26:40
2020-12-10 20:26:39
2020-12-10 20:26:33
2020-12-10 20:26:32
2020-12-10 20:26:31
2020-12-10 20:26:25
2020-12-10 20:26:22
2020-12-10 20:26:20
2020-12-10 20:26:18
2020-12-10 20:26:15
2020-12-10 20:26:11
2020-12-10 20:26:10
2020-12-10 20:26:09
2020-12-10 20:26:09
2020-12-10 20:26:06
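Once you have upload dates, a quick way to get a feel for how busy a tag is would be to measure the gap between consecutive posts. The sketch below is only an illustration: it assumes the dates are available as strings in the format printed above (if `upload_date` is already a `datetime`, the parsing step can be skipped), and `average_gap_seconds` is a hypothetical helper, not an instascrape function. It uses the first five timestamps from the output above.

```python
from datetime import datetime

# First five timestamps from the output above, copied as strings (newest first).
printed = [
    "2020-12-10 20:27:03",
    "2020-12-10 20:27:01",
    "2020-12-10 20:26:59",
    "2020-12-10 20:26:59",
    "2020-12-10 20:26:51",
]


def average_gap_seconds(timestamps):
    """Mean number of seconds between consecutive posts, given newest-first timestamps."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    # Pair each post with the next (older) one and measure the difference.
    gaps = [(newer - older).total_seconds() for newer, older in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


avg_gap = average_gap_seconds(printed)
print(f"Average gap between posts: {avg_gap:.1f} seconds")
```

For these five posts the gaps are 2, 2, 0, and 8 seconds, so the average works out to 3.0 seconds.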

If you want to read more about instascrape, check out some of my other posts:

Dynamically generate embeddable Instagram HTML with instascrape

Chris Greening ・ Dec 1 '20

#python #showdev #webdev #html

Scraping 25,000 data points from Joe Biden's Instagram using instascrape

Chris Greening ・ Nov 5 '20

#showdev #python #datascience #contributorswanted

Or better yet, get involved and contribute! Drop the official repo a star, get involved in discussions, and stay in the loop by watching for updates on the website! Hope to see you there 🙌

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

…

View on GitHub

Top comments (8)

ImTheDeveloper • Dec 11 '20

I've been following your posts closely in regards to instascrape, a great little library! Just wondered if you have hit any issues yet with Instagram updating their layout or adding any new blocking mechanisms against your scraper since you started working on it? I'm kind of thinking along the lines of how long the shelf life is for a scraper before some new breaking change comes along (ideally they want to push users into their API).

Chris Greening • Dec 11 '20 • Edited

Funny that you mention it, today was the first day after about two months of the lib's existence that I had to fix something because of a change on Instagram's end lol, I kept getting hit with 429 status codes on every request I made. I kind of figured something like this was going to happen eventually because I wasn't passing any header info with the requests; I quickly added support for passing default/custom header info though and now it's back up and running like a charm

One of the driving factors in design choice since day one has been to account for a changing Instagram API as well as the tightening of restrictions that Instagram has been trending towards. I'm hoping I'll be able to roll with the punches as they come and continue to float under their radar lol. I deliberately excluded selenium and any sort of interaction with Instagram content to avoid their wrath as much as possible so we'll see how it goes 😅

Thanks for following and asking! With the library in a comfortably stable place and no major internal design changes in the near future, I'm ready to go back and fix up some of the stuff I was kind of neglecting (i.e. missing headers, fine tuning with arguments, etc.)

ImTheDeveloper • Dec 11 '20

Nice to see it wasn't anything catastrophic 👍 I'll likely be giving this a go to monitor some insta accounts for new posts and publish them via my telegram bot into a chat. I've been considering running through lumintai.io as I do with twitter and YouTube, which has served well to proxy from multiple locations so the traffic on a single IP doesn't stack up. I'll get a tutorial up on Dev if it works out 👍

Chris Greening • Dec 11 '20

Awesome! Would love to see that tutorial, I'll keep an eye out. I wrote a script a couple months ago that rotates free anonymous proxies but kept getting hit with 403's, probably because the IP's are blacklisted since everyone else is using them lol. Haven't done too much more research into proxies since I haven't really needed it yet but I plan future versions of instascrape to have support for it; it's definitely a vital tool for any large scale scraping

villival • Dec 11 '20

hey thanks for the posts, i would like to know what the helpers module is all about ..

i was trying to run a script .. the system threw an error:
Helper module not found

Chris Greening • Dec 11 '20 • Edited

Hey there, looks like you pip installed instascrape instead of insta-scrape from PyPI. My package has the hyphen; check out the installation section of the repo or the official PyPI page for more details

Thanks so much for your interest in the library! Let me know if you have any more questions

villival • Dec 12 '20

thanks for the valuable reply :)

Bilal • Jan 7 '22

Great work! It helped me a lot in a project I'm doing. The only problem is that when I return the upload date, it isn't in the right format; it returns a list of integers.
