Instagram's been cracking down on web scraping, and as a result instascrape's v1.x.x releases are starting to feel dated (despite being less than 4 months old). I've officially started working on what will become instascrape 2.0.0, which will be released at some point in the near future.
If you've been experiencing issues with the library, you are not alone and I'm working as fast as I can to get us back to business! With these updates, I'm also going to push forward with a wave of new docs, blog posts, reference material, and features.
Until this week, I was able to keep up with Instagram's small backend changes through minor and patch releases.
Unfortunately, their most recent change was significantly harder to figure out. Luckily, 12+ hours and a dozen or so coffees later, I have determined what needs to be done and am implementing it in the code as we speak! 😄
Here are some of the changes and features I am anticipating:
- MASSIVE overhaul/refactor of instascrape's backend implementation (you won't really notice, don't worry)
- dedicated session and cookie handling
- official support for Selenium (webdriver batteries will not be included, just supported)
- possible login capabilities (no guarantees)
- significantly more tools and features outside of the scrapers
- a likely shift away from in-place data modification for stronger method chaining and encapsulation (the only planned breaking change as of now)
I'm going to keep the API as consistent with v1.x.x as I can, and the changes are not going to be off the wall. The vast majority of what you'll see will be new features rather than changes to existing ones.
The one thing I recommend keeping an eye on is the shift away from in-place data modification, since it could mean that code such as profile.scrape() needs to be replaced with profile = profile.scrape().
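To make the difference concrete, here is a minimal sketch of the non-in-place style. It uses a hypothetical Profile class, not instascrape's actual implementation: the class, its fields, and the placeholder data are all assumptions for illustration. Here scrape() returns a new object instead of modifying the existing one, so a bare v1-style call silently discards its result, while rebinding or chaining works.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for a scraper object; NOT instascrape's real class."""

    username: str
    data: Optional[dict[str, Any]] = None

    def scrape(self) -> Profile:
        # Placeholder data; a real scraper would make an HTTP request here.
        fetched = {"username": self.username, "followers": 0}
        # Return a NEW object rather than mutating self (frozen=True forbids mutation).
        return replace(self, data=fetched)

    def to_dict(self) -> dict[str, Any]:
        # Fail loudly if the caller forgot to rebind the result of scrape().
        if self.data is None:
            raise ValueError("not scraped yet; use profile = profile.scrape()")
        return dict(self.data)


profile = Profile("some_user")
profile.scrape()            # v1-style call: the result is discarded, profile is unchanged
print(profile.data)         # None

profile = profile.scrape()  # rebind to keep the scraped copy
print(profile.to_dict())    # {'username': 'some_user', 'followers': 0}

# Returning new objects is what makes one-line method chaining possible.
print(Profile("other_user").scrape().to_dict()["username"])  # other_user
```

The trade-off in this sketch is an extra assignment at the call site in exchange for objects that never change underneath you, which is what makes chaining and encapsulation easier to reason about.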
I plan on releasing 2.0.0 sometime before February, so you shouldn't have to wait too long. I'm in the midst of job hunting and doing some freelance work, so I'm working on the library whenever I get a chance; the release likely won't land for at least another week or two.
If you've stuck with me this far, thanks so much for reading! Follow me on Twitter @ChrisGreening, where I'll likely be tweeting small updates.
Additionally, you can check in on the progress (or even contribute) on the major-version-2 branch of the repo on GitHub.
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry-standard tools for web scraping, data science, and analysis.
Here are a few of the things that instascrape does well:
- Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
- Scrapes from HTML, BeautifulSoup objects, and JSON
- Download content to your computer as png, jpg, mp4, and mp3
- Dynamically retrieve HTML embed code for posts
- Expressive and consistent API for concise and elegant code
- Designed for seamless integration with Selenium, Pandas, and other industry-standard tools for data collection and analysis
- Lightweight; no boilerplate or configurations necessary
- The only hard dependencies are Requests and…