DEV Community

Chris Greening

Posted on Nov 12, 2020

Compare major tech Instagram pages with instascrape

#python #showdev #datascience #contributorswanted

In this blog post, I'm going to compare some of the largest tech companies' Instagram pages using my open source Python library instascrape! We'll be exploring their respective engagement, follower counts, number of posts, etc. 🙌

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

…

View on GitHub

The companies we'll be comparing for this exercise are Google, Apple, IBM, Facebook, Microsoft, Adobe, and Oracle.

Scraping the data

First, let's start by getting a list of their usernames:

companies = ["google", "apple", "ibm", "facebook", "microsoft", "adobe", "oracle"]

Now, scraping our data is as easy as

from instascrape import Profile 
profiles = [Profile(username) for username in companies]
for prof in profiles: 
    prof.scrape()
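Given the disclaimer above about Instagram flagging automated traffic, it may be worth spacing requests out and recording failures instead of scraping in a tight loop. The following is only a minimal sketch: `scrape_politely` and its `delay_seconds` parameter are hypothetical names that are not part of instascrape, and the only thing assumed about each profile is the `scrape()` method used above.

```python
import time


def scrape_politely(profiles, delay_seconds=5.0):
    """Call .scrape() on each profile, pausing between requests.

    Hypothetical helper (not part of instascrape). Returns a pair
    (scraped, failed), where failed holds (profile, exception) tuples.
    """
    scraped, failed = [], []
    for index, prof in enumerate(profiles):
        if index > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            prof.scrape()
        except Exception as exc:  # broad on purpose: record the failure and move on
            failed.append((prof, exc))
        else:
            scraped.append(prof)
    return scraped, failed
```

It could be called as `scraped, failed = scrape_politely(profiles)`, after which `failed` shows which profiles need a second look. The five second default is an arbitrary placeholder, not a documented safe limit.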

And that's it! We just scraped 364 data points from 7 profiles with just a few lines of code. Let's use the to_dict method to get a list of dicts that can be passed into a pandas.DataFrame for expressive and powerful data analysis:

import pandas as pd 
data = [prof.to_dict() for prof in profiles]
df = pd.DataFrame(data)
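Before analysing anything, it can help to confirm that each scraped dict actually contains values, since a scrape that returns no data would otherwise show up later as NaN cells in the DataFrame. Below is a small sketch of such a check using only the standard library. `is_missing` and `find_incomplete` are hypothetical helper names, the keys are the ones used elsewhere in this post, and the sample records are made-up placeholders.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_incomplete(records, required=("username", "followers", "posts")):
    """Map each incomplete record's index to the required keys it lacks."""
    problems = {}
    for index, record in enumerate(records):
        missing = [key for key in required if is_missing(record.get(key))]
        if missing:
            problems[index] = missing
    return problems


# Made-up placeholder records, NOT real scraped values.
sample = [
    {"username": "example_a", "followers": 1000, "posts": 10},
    {"username": float("nan"), "followers": float("nan"), "posts": 3},
]
print(find_incomplete(sample))  # → {1: ['username', 'followers']}
```

Running `find_incomplete(data)` on the list built above returns an empty dict when every profile has the required fields.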

Exploring our data

First, let's start by comparing how many followers each page has using a matplotlib bar plot:

import matplotlib.pyplot as plt 
# On Matplotlib 3.6 and later, use "seaborn-v0_8-darkgrid" instead
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["followers"])

We can immediately see that Apple clearly has the most followers and, surprisingly, Facebook doesn't have as many as one might expect.
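A ranking like this is easier to read off when the values are sorted first. Here is a minimal sketch of that step in plain Python. `rank_by` is a hypothetical helper name, and the sample records are made-up placeholders, not real follower counts.

```python
def rank_by(records, key, descending=True):
    """Return (username, value) pairs sorted by the given numeric key."""
    pairs = [(record["username"], record[key]) for record in records]
    return sorted(pairs, key=lambda pair: pair[1], reverse=descending)


# Made-up placeholder values, NOT real follower counts.
sample = [
    {"username": "example_a", "followers": 250},
    {"username": "example_b", "followers": 900},
    {"username": "example_c", "followers": 40},
]
for username, followers in rank_by(sample, "followers"):
    print(username, followers)
```

With a DataFrame, the pandas equivalent would be `df.sort_values("followers", ascending=False)` before calling `plt.bar`.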

Now let's see who has the most posts:

import matplotlib.pyplot as plt 
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["posts"])

Finally, we're going to examine each page's engagement as a function of time and see how the different pages are doing.

(NOTE: some of the specifics in the code are skipped so we can focus on what's important; additionally, Apple will not be pictured, as its numbers are significantly larger than the others')

for prof in profiles:
    posts = prof.get_recent_posts()     # gets the 12 most recent posts
    posts_data = [post.to_dict() for post in posts]
    post_df = pd.DataFrame(posts_data)
    plt.plot(post_df.upload_date, post_df.likes, label=prof.username)
plt.legend()    # show which line belongs to which profile

Some interesting things we can see right off the bat are:

Oracle barely gets any likes
Surprisingly, neither does Facebook
Adobe, Google, and Microsoft post relatively frequently
IBM hasn't posted in almost two weeks
Microsoft gets the most likes on average on their posts
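Observations such as average likes or time since the last post can also be computed instead of read off the plot. The sketch below assumes each post dict has a numeric `likes` value and a `datetime` under `upload_date`, matching the column names used in the plotting loop. `summarize_posts` is a hypothetical helper name, and the example posts are made-up placeholders.

```python
from datetime import datetime
from statistics import mean


def summarize_posts(posts_data, now):
    """Return average likes and whole days since the newest post.

    Assumes each dict has a numeric "likes" and a datetime "upload_date".
    """
    if not posts_data:
        raise ValueError("posts_data is empty")
    newest = max(post["upload_date"] for post in posts_data)
    return {
        "average_likes": mean(post["likes"] for post in posts_data),
        "days_since_last_post": (now - newest).days,
    }


# Made-up placeholder posts, NOT real scraped values.
example_posts = [
    {"likes": 100, "upload_date": datetime(2020, 11, 1)},
    {"likes": 300, "upload_date": datetime(2020, 10, 25)},
]
summary = summarize_posts(example_posts, now=datetime(2020, 11, 12))
print(summary)
```

Calling this once per profile on the `posts_data` list from the loop above would give one summary per company to compare side by side.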

Conclusion

And that's pretty much it! This is just a small taste of what instascrape can accomplish, and how you use it is up to you, so get out there and start exploring that data!

If you like what you read, check out some of my other posts 😄

Scraping 25,000 data points from Joe Biden's Instagram using instascrape

Chris Greening ・ Nov 5 '20

#showdev #python #datascience #contributorswanted

Downloading recent Instagram photos using instascrape and Python

Chris Greening ・ Oct 26 '20

#python #webscraping #showdev #contributorswanted

Also, check out the official repository and drop it a star ⭐ or contribute!

chris-greening / instascrape


Top comments (3)

Victor • Jun 11

Hello. Great post. I've been reading all your posts related to instascrape, but something is troubling me. For some reason, the created DataFrame ends up completely empty. That is the case with everything: followers = nan, username = nan, following = nan.

I am currently running the code in Google Colab. I made sure to (in Colab):

apt update
apt upgrade
install selenium
install chromium-chromedriver
install webdriver-manager (as suggested in another tutorial I found)
install google-chrome
Use the headers with user-agent and cookie
Instantiate the scraper objects

Any suggestions?

gravesli • Nov 13 '20

back you.

adi-prakoso • Mar 24 '21

when i scraped google instagram profile i got 'username'=nan. please clarify, thank you
