In a recent post, I introduced my open source Instagram web scraper instascrape as a lightweight means of collecting data from Instagram using Python!

For this post, I'm going to walk through an example using one of instascrape's recent additions: the ability to scrape an Instagram user's recent posts! With this data, we'll be able to visualize the trend in engagement for that user and see if their page is growing or declining 🙌.
We'll be visualizing data from my Instagram page @chris_greening (shameless self promo 😉) but feel free to remove my username and replace it with your own 😬
Now let's jump right in! To start, we'll import the Profile scraper and load the data from Instagram:
    from instascrape import Profile

    chris = Profile('chris_greening')
    chris.scrape()
    recent_posts = chris.get_recent_posts()
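Out of the box, instascrape returns only the most recent posts that load with the profile page, 12 in this case (scraping more requires selenium or similar).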
Now that we have the data, let's create a list of dicts that can easily be built into a pandas DataFrame:
    import pandas as pd

    posts_data = [post.to_dict() for post in recent_posts]
    posts_df = pd.DataFrame(posts_data)
    print(posts_df[['upload_date', 'comments', 'likes']])
which gives us
               upload_date  comments  likes
    0  2020-10-16 14:39:41         8    119
    1  2020-10-15 13:11:42        21    165
    2  2020-10-14 12:36:21        16    150
    3  2020-09-28 12:17:21         6    164
    4  2020-09-27 09:27:00        14    210
    5  2020-09-26 11:38:27        16    217
    6  2020-09-25 10:18:28        17    227
    7  2020-09-24 11:01:04        20    239
    8  2020-09-17 17:49:18        15    279
    9  2020-09-14 10:05:24        14    316
    10 2020-09-09 10:24:17        13    244
    11 2020-09-08 09:06:05        33    393
Awesome! Now we can get to visualizing our data and see how the page is doing:
    import matplotlib.pyplot as plt

    # Stylistic change (this style is named 'seaborn-v0_8-darkgrid' in matplotlib 3.6+)
    plt.style.use('seaborn-darkgrid')
    plt.scatter(posts_df.upload_date, posts_df.likes)  # Plot the data
    plt.xlabel('Upload Date')                          # Write labels
    plt.ylabel('Likes')
    plt.title('@chris_greening Likes per Post')
    plt.show()                                         # Show graph
And that's it! As we can see, my Instagram is in fact trending downwards, yayyyy!... 😅
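For a quick numeric check of that downward trend, here's a minimal sketch that re-enters the likes from the table above by hand (so it runs without scraping anything) and compares the average of the six older posts to the six newer ones. It only uses the standard library, and the variable names are made up for this illustration rather than being part of instascrape.

```python
from datetime import datetime
from statistics import mean

# (upload_date, likes) copied by hand from the table printed earlier
rows = [
    ("2020-10-16 14:39:41", 119),
    ("2020-10-15 13:11:42", 165),
    ("2020-10-14 12:36:21", 150),
    ("2020-09-28 12:17:21", 164),
    ("2020-09-27 09:27:00", 210),
    ("2020-09-26 11:38:27", 217),
    ("2020-09-25 10:18:28", 227),
    ("2020-09-24 11:01:04", 239),
    ("2020-09-17 17:49:18", 279),
    ("2020-09-14 10:05:24", 316),
    ("2020-09-09 10:24:17", 244),
    ("2020-09-08 09:06:05", 393),
]

# Parse the timestamps and sort oldest to newest so the split is chronological
posts = sorted(
    (datetime.strptime(date, "%Y-%m-%d %H:%M:%S"), likes) for date, likes in rows
)

half = len(posts) // 2
older_mean = mean(likes for _, likes in posts[:half])
newer_mean = mean(likes for _, likes in posts[half:])

print(f"older half: {older_mean:.1f} likes per post")  # older half: 283.0 likes per post
print(f"newer half: {newer_mean:.1f} likes per post")  # newer half: 170.8 likes per post
```

With only 12 posts this is a rough sanity check rather than a real statistic, but it agrees with what the scatter plot shows.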
If you wanted to go further, you could use libraries such as selenium to extend instascrape and fit regressors to dynamically loaded data for a more comprehensive visualization as shown below:
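As a much smaller stand-in for that idea, here's a hedged sketch of the regression step alone: it fits a straight line to likes versus days since the oldest post, using just the 12 posts from the table above and plain least squares from the standard library. The fit_line helper is hypothetical (it is not part of instascrape), and a real analysis on dynamically loaded data would more likely reach for a dedicated regression library.

```python
from datetime import datetime

# (upload_date, likes) copied by hand from the table printed earlier
rows = [
    ("2020-10-16 14:39:41", 119),
    ("2020-10-15 13:11:42", 165),
    ("2020-10-14 12:36:21", 150),
    ("2020-09-28 12:17:21", 164),
    ("2020-09-27 09:27:00", 210),
    ("2020-09-26 11:38:27", 217),
    ("2020-09-25 10:18:28", 227),
    ("2020-09-24 11:01:04", 239),
    ("2020-09-17 17:49:18", 279),
    ("2020-09-14 10:05:24", 316),
    ("2020-09-09 10:24:17", 244),
    ("2020-09-08 09:06:05", 393),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


dates = [datetime.strptime(date, "%Y-%m-%d %H:%M:%S") for date, _ in rows]
likes = [n_likes for _, n_likes in rows]

# Express each upload time as fractional days since the oldest post
first = min(dates)
days = [(d - first).total_seconds() / 86400 for d in dates]

slope, intercept = fit_line(days, likes)
print(f"likes change by roughly {slope:.1f} per day")
```

A negative slope means likes per post are falling over time. To draw the line on the scatter plot, you could evaluate intercept + slope * x at the first and last day and pass those two points to plt.plot.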
Let me know your thoughts in the comments below or, even better, check out the repo on GitHub and contribute!
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
What is it?
instascrape is a lightweight Python package that provides expressive and flexible tools for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Here are a few of the things that
instascrape does well:
- Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
- Scrapes HTML, BeautifulSoup, and JSON
- Download content to your computer as png, jpg, mp4, and mp3
- Dynamically retrieve HTML embed code for posts
- Expressive and consistent API for concise and elegant code
- Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
- Lightweight; no boilerplate or configurations necessary
- The only hard dependencies are Requests and Beautiful…