Introducing: instascrape.Location
In this blog post, I'm going to quickly introduce instascrape's newest feature: the ability to scrape an Instagram location tag!
With the release of v1.4.0, the Location scraper now provides a semantic way to gather data from an Instagram location tag.
Sample usage
from instascrape import Location
url = 'https://www.instagram.com/explore/locations/212988663/new-york-new-york/'
new_york = Location(url)
new_york.scrape()
It's as easy as that! We've scraped the page and now have access to useful information such as
print(f"The NY location tag has {new_york.amount_of_posts:,} posts")
>>> The NY location tag has 61,202,403 posts
print(f"NY tag geographic coordinates: ({new_york.latitude}, {new_york.longitude})")
>>> NY tag geographic coordinates: (40.7142, -74.0064)
as well as a variety of other useful attributes!
get_recent_posts
In addition to scraping attributes of the location tag, we can also retrieve some of the tag's recent posts as instascrape.Post objects:
recent_posts = new_york.get_recent_posts()
for post in recent_posts:
    print(post.upload_date)
>>> 2020-12-10 20:27:03
2020-12-10 20:27:01
2020-12-10 20:26:59
2020-12-10 20:26:59
2020-12-10 20:26:51
2020-12-10 20:26:48
2020-12-10 20:26:46
2020-12-10 20:26:45
2020-12-10 20:26:42
2020-12-10 20:26:40
2020-12-10 20:26:39
2020-12-10 20:26:33
2020-12-10 20:26:32
2020-12-10 20:26:31
2020-12-10 20:26:25
2020-12-10 20:26:22
2020-12-10 20:26:20
2020-12-10 20:26:18
2020-12-10 20:26:15
2020-12-10 20:26:11
2020-12-10 20:26:10
2020-12-10 20:26:09
2020-12-10 20:26:09
2020-12-10 20:26:06
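As a rough illustration of what you can do with these timestamps, here's a minimal sketch that estimates how quickly new posts arrive on the tag. It uses only the standard library and a few of the values printed above, pasted in by hand as strings. The helper name seconds_per_post is made up for this example and is not part of instascrape; with live data you'd pull the values from each post's upload_date instead.

```python
from datetime import datetime

# A few of the timestamps printed above, newest first (copied by hand).
printed = [
    "2020-12-10 20:27:03",
    "2020-12-10 20:27:01",
    "2020-12-10 20:26:59",
    "2020-12-10 20:26:59",
    "2020-12-10 20:26:51",
]


def seconds_per_post(timestamps):
    """Average gap in seconds between consecutive posts (hypothetical helper)."""
    parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    if len(parsed) < 2:
        raise ValueError("need at least two timestamps")
    # Total time covered, divided by the number of gaps between posts.
    span = (parsed[-1] - parsed[0]).total_seconds()
    return span / (len(parsed) - 1)


print(f"Roughly one new post every {seconds_per_post(printed):.1f} seconds")
# Roughly one new post every 3.0 seconds
```

Five posts spread over twelve seconds works out to one every three seconds, which gives a feel for how busy the New York tag is.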
If you want to read more about instascrape, check out some of my other posts:
Dynamically generate embeddable Instagram HTML with instascrape
Chris Greening ・ Dec 1 '20
Scraping 25,000 data points from Joe Biden's Instagram using instascrape
Chris Greening ・ Nov 5 '20
Or better yet, get involved and contribute! Drop the official repo a star, get involved in discussions, and stay in the loop by watching for updates on the website! Hope to see you there 🙌
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
Note: This module is no longer actively maintained.
DISCLAIMER:
Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Top comments (8)
I've been following your posts on instascrape closely, a great little library! Just wondered whether you've hit any issues yet with Instagram updating their layout or adding new blocking mechanisms against your scraper? I'm thinking along the lines of: how long is the shelf life of a scraper before some new breaking change comes along (ideally they want to push users onto their API)?
Funny that you mention it, today was the first day after about two months of the lib's existence that I had to fix something because of a change on Instagram's end lol, I kept getting hit with 429 status codes on every request I made. I kind of figured something like this was going to happen eventually because I wasn't passing any header info with the requests; I quickly added support for passing default/custom header info though and now it's back up and running like a charm
One of the driving factors in design choices since day one has been to account for a changing Instagram API as well as the tightening restrictions that Instagram has been trending towards. I'm hoping I'll be able to roll with the punches as they come and continue to float under their radar lol. I deliberately excluded selenium and any sort of interaction with Instagram content to avoid their wrath as much as possible, so we'll see how it goes 😅
Thanks for following and asking! With the library in a comfortably stable place and no major internal design changes planned for the near future, I'm ready to go back and fix up some of the stuff I was kind of neglecting (i.e. missing headers, fine-tuning arguments, etc.)
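For anyone curious what the 429 fix mentioned above might look like in general terms, here's a minimal sketch of sending browser-like headers and backing off when rate limited. To be clear, this is not instascrape's actual implementation: fetch_with_backoff, DEFAULT_HEADERS, and the header values are all made up for illustration, and the HTTP call is passed in as a plain function so the example runs without touching the network.

```python
import time

# Hypothetical browser-like headers; placeholder values, not the library's defaults.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/87.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_with_backoff(fetch, url, headers=None, max_retries=3,
                       base_delay=1.0, sleep=time.sleep):
    """Call fetch(url, headers), retrying with exponential backoff on HTTP 429.

    `fetch` is any callable returning (status_code, body). It is injected so
    the sketch does not depend on a particular HTTP client.
    """
    # Custom headers override the defaults key by key.
    merged = {**DEFAULT_HEADERS, **(headers or {})}
    for attempt in range(max_retries + 1):
        status, body = fetch(url, merged)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, body  # still rate limited after the final attempt


# Stand-in for a real HTTP call: rate limited twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "<html>ok</html>")])


def fake_fetch(url, headers):
    return next(responses)


status, body = fetch_with_backoff(
    fake_fetch,
    "https://www.instagram.com/explore/locations/212988663/new-york-new-york/",
    sleep=lambda seconds: None,  # skip the real waiting in this demo
)
print(status)  # 200
```

In real use, fetch would wrap whatever HTTP client you already have. Keeping it as a parameter also makes the retry logic easy to test without hitting Instagram.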
Nice to see it wasn't anything catastrophic 👍 I'll likely be giving this a go to monitor some Insta accounts for new posts and publish them via my Telegram bot into a chat. I've been considering routing through lumintai.io, as I do with Twitter and YouTube; it has served well for proxying from multiple locations so the traffic on a single IP doesn't stack up. I'll get a tutorial up on Dev if it works out 👍
Awesome! Would love to see that tutorial, I'll keep an eye out. I wrote a script a couple months ago that rotates free anonymous proxies but kept getting hit with 403s, probably because the IPs are blacklisted since everyone else is using them lol. I haven't done much more research into proxies since I haven't really needed them yet, but I plan for future versions of instascrape to support them; it's definitely a vital tool for any large-scale scraping.
Hey, thanks for the posts. I would like to know what the helpers module is all about. I was trying to run a script and the system threw an error:
Helper module not found
Hey there, looks like you pip installed instascrape instead of insta-scrape from PyPI. My package has the hyphen; check out the installation section of the repo or the official PyPI page for more details
Thanks so much for your interest in the library! Let me know if you have any more questions
thanks for the valuable reply :)
Great work! It helped me a lot in a project I'm doing. The only problem is that when I return the upload date, it doesn't come back in the right format; it returns a list of integers.