Hello everyone!
Instagram's been cracking down on web scraping, and as a result instascrape's v1.x.x releases are starting to feel dated (despite being less than 4 months old). I've officially started working on what will become instascrape 2.0.0, which will be released at some point in the near future.
If you've been experiencing issues with the library, you are not alone and I'm working as fast as I can to get us back to business! With these updates, I'm also going to push forward with a wave of new docs, blog posts, reference material, and features.
Where I'm at
Prior to this week, I have been able to roll with Instagram's little changes to their backend through minor and patch releases.
Unfortunately, their most recent change was significantly harder to figure out. Luckily, 12+ hours and a dozen or so coffees later, I have determined what needs to be done and am implementing it in the code as we speak! 😄
Sneak peek?
Here are some of the changes and features I am anticipating:
- MASSIVE overhaul/refactor of instascrape's backend implementation (you won't really notice, don't worry)
- dedicated session and cookie handling
- official support for selenium (webdriver batteries will not be included, just supported)
- possible login capabilities (no guarantees)
- significantly more tools and features outside of the scrapers
- likely shift away from in-place data modification for stronger method chaining and encapsulation (the only planned breaking change as of now)
Will there be breaking changes?
I'm going to keep the API as consistent with v1.x.x as I can and the changes are not going to be off the wall. You're going to see 99% more new features than you'll see changed features.
The only thing I recommend you keep an eye on is the shift away from in-place data modification, as this could result in code such as profile.scrape() needing to be replaced with profile = profile.scrape().
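Here's a minimal sketch of what that difference could look like in practice. The two classes below are hypothetical stand-ins written for illustration, not instascrape's actual classes, and the follower count is hard-coded rather than scraped. The real 2.0.0 API may well look different.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class InPlaceProfile:
    """Hypothetical stand-in (not instascrape's class): scrape() mutates self."""
    username: str
    followers: Optional[int] = None

    def scrape(self) -> None:
        # Pretend this value came from a network request.
        self.followers = 100


@dataclass(frozen=True)
class ChainableProfile:
    """Hypothetical stand-in: scrape() leaves self untouched and returns a new object."""
    username: str
    followers: Optional[int] = None

    def scrape(self) -> "ChainableProfile":
        # Pretend this value came from a network request.
        return replace(self, followers=100)

    def to_dict(self) -> dict:
        return {"username": self.username, "followers": self.followers}


# In-place style: the call changes the object you already hold.
old_style = InPlaceProfile("example_user")
old_style.scrape()

# Non-in-place style: the scraped data lives on the returned object.
new_style = ChainableProfile("example_user")
new_style.scrape()                      # result discarded, nothing changes
unscraped_followers = new_style.followers
new_style = new_style.scrape()          # reassign to keep the scraped data

# Returning an object is also what makes method chaining possible.
chained = ChainableProfile("example_user").scrape().to_dict()
```

If the library does go this route, the thing to watch for is a bare call whose return value is thrown away: it would run without error but leave you holding the unscraped object.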
When to expect the update? ⌚
I plan on releasing 2.0.0 sometime before February, so you shouldn't have to wait too long. I'm in the midst of job hunting and doing some freelance work, so I'm working on the lib whenever I get a chance, but the release likely won't be for at least another week or two.
Thanks for reading!
If you stuck with me this far, thanks so much for reading! Follow me on Twitter @ChrisGreening as I will likely be tweeting there with small updates.
Additionally, you can check in on the progress (or even contribute) on the major-version-2 branch of the repo on GitHub.
Cheers,
Chris
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
Note: This module is no longer actively maintained.
DISCLAIMER:
Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.