DEV Community

Chris Greening

Compare major tech Instagram pages with instascrape

In this blog post, I'm going to compare some of the largest tech companies' Instagram pages using my open source Python library instascrape! We'll be exploring their respective engagement, followers, number of posts, etc. 🙌

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

The companies we'll be comparing for this exercise are Google, Apple, IBM, Facebook, Microsoft, Adobe, and Oracle.

Scraping the data

First, let's start by getting a list of their usernames:

companies = ["google", "apple", "ibm", "facebook", "microsoft", "adobe", "oracle"]

Now, scraping our data is as easy as

from instascrape import Profile

# Create one Profile per username, then load each profile's data
profiles = [Profile(username) for username in companies]
for prof in profiles:
    prof.scrape()
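Because Instagram can flag rapid automated requests, you may want a more cautious version of that loop. Here's a minimal sketch, not part of instascrape itself: `scrape_politely` is a hypothetical helper, and the 5 second default pause is an arbitrary assumption rather than a known safe value. It only relies on each object having a `scrape()` method, as `Profile` does above.

```python
import time


def scrape_politely(profiles, delay_seconds=5.0, sleep=time.sleep):
    """Call .scrape() on each object, pausing between requests.

    Hypothetical helper (not part of instascrape). Returns a tuple
    (scraped, failed), where failed holds (profile, exception) pairs.
    """
    scraped, failed = [], []
    for index, prof in enumerate(profiles):
        if index > 0:
            # Pause before every request except the first one.
            # delay_seconds is an assumed value, tune it yourself.
            sleep(delay_seconds)
        try:
            prof.scrape()
        except Exception as exc:  # broad on purpose: one failure shouldn't stop the rest
            failed.append((prof, exc))
        else:
            scraped.append(prof)
    return scraped, failed
```

With the list from above you could call `scraped, failed = scrape_politely(profiles)` and take a look at `failed` before moving on.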

And that's it! We just scraped 364 data points from 7 profiles with just a few lines of code. Now let's use the to_dict method to get a list of dicts that can be passed into a pandas.DataFrame for expressive and powerful data analysis:

import pandas as pd 
data = [prof.to_dict() for prof in profiles]
df = pd.DataFrame(data)
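Before plotting anything, it can be worth checking that every profile actually came back with data, since a scrape that didn't return what you expected may leave fields empty. Here's a small sketch of such a check on the list of dicts. `find_incomplete` is a hypothetical helper, and the `username`, `followers`, and `posts` keys are simply the ones used later in this post.

```python
import math


def find_incomplete(records, required=("username", "followers", "posts")):
    """Return {index: [missing keys]} for records lacking required values.

    Hypothetical helper: a value counts as missing if the key is absent,
    the value is None, or the value is a float NaN.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    problems = {}
    for index, record in enumerate(records):
        missing = [key for key in required if is_missing(record.get(key))]
        if missing:
            problems[index] = missing
    return problems
```

Calling `find_incomplete(data)` on the list built above should give you an empty dict if every profile has those three fields filled in.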

Exploring our data

First, let's start by comparing how many followers each page has using a matplotlib bar plot:

import matplotlib.pyplot as plt
# Note: matplotlib 3.6 renamed this style to "seaborn-v0_8-darkgrid"
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["followers"])

We can immediately see that Apple has the most followers and, surprisingly, Facebook doesn't have as many as one might expect.

Now let's see who has the most posts:

import matplotlib.pyplot as plt
# Note: matplotlib 3.6 renamed this style to "seaborn-v0_8-darkgrid"
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["posts"])

Finally, we're going to examine each page's engagement as a function of time and see how the different pages are doing:

(NOTE: some of the specifics in the code are skipped so we can focus on what's important; additionally Apple will not be pictured as their data is significantly larger)

for prof in profiles:
    posts = prof.get_recent_posts()  # gets the 12 most recent posts
    posts_data = [post.to_dict() for post in posts]
    post_df = pd.DataFrame(posts_data)
    plt.plot(post_df["upload_date"], post_df["likes"], label=prof.username)
plt.legend()  # show which line belongs to which profile

Some interesting things we can see right off the bat are:

  • Oracle barely gets any likes
  • Surprisingly neither does Facebook
  • Adobe, Google, and Microsoft post relatively frequently
  • IBM hasn't posted in almost two weeks
  • Microsoft gets the most likes on average on their posts
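A couple of those observations can also be put into numbers. Here's a rough sketch that computes the average likes and the number of days since the latest post from the same kind of list of post dicts built in the loop above. `summarize_posts` is a hypothetical helper, and it assumes each dict has a numeric `likes` value and a naive `datetime` under `upload_date`.

```python
from datetime import datetime
from statistics import mean


def summarize_posts(posts_data, now=None):
    """Summarize a list of post dicts with 'likes' and 'upload_date' keys.

    Hypothetical helper. Assumes 'upload_date' values are naive datetimes.
    Returns None when there are no posts to summarize.
    """
    if not posts_data:
        return None
    now = now or datetime.now()
    latest = max(post["upload_date"] for post in posts_data)
    return {
        "mean_likes": mean(post["likes"] for post in posts_data),
        "days_since_last_post": (now - latest).days,
    }
```

Inside the loop you could call `summarize_posts(posts_data)` for each profile and compare the results side by side.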

Conclusion

And that's pretty much it! This is just a small taste of what instascrape can accomplish, and how you use it is up to you, so get out there and start exploring that data!

If you like what you read, check out some of my other posts 😄

Also, check out the official repository and drop it a star ⭐ or contribute!

Top comments (3)

Victor

Hello. Great post. I've been reading all your posts related to instascrape. But something is troubling me. For some reason the created DataFrame ends up being completely empty. And that is the case with everything: followers = nan, username = nan, following = nan.

I am currently running the code in Google Colab. I made sure to (in Colab):

  1. apt update
  2. apt upgrade
  3. install selenium
  4. install chromium-chromedriver
  5. install webdriver-manager (as suggested in another tutorial I found)
  6. install google-chrome
  7. Use the headers with user-agent and cookie
  8. Instantiate the scraper objects

Any suggestions?

gravesli

back you.

adi-prakoso

When I scraped Google's Instagram profile, I got 'username'=nan. Please clarify, thank you.