In this blog post, I'm going to compare the Instagram pages of some of the largest tech companies using my open source Python library instascrape! We'll be exploring their respective engagement, followers, number of posts, etc. 🙌
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
DISCLAIMER:
Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Key features
Here are a few of the things that…
The companies we'll be comparing for this exercise are Google, Apple, IBM, Facebook, Microsoft, Adobe, and Oracle.
Scraping the data
First, let's start by getting a list of their usernames:
companies = ["google", "apple", "ibm", "facebook", "microsoft", "adobe", "oracle"]
Now, scraping our data is as easy as:
from instascrape import Profile
profiles = [Profile(username) for username in companies]
for prof in profiles:
    prof.scrape()
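Given the disclaimer above about Instagram flagging scrapers, it may be worth spacing the requests out and collecting failures instead of stopping at the first one. The following is only a sketch: scrape_all, delay_seconds, and FakeProfile are hypothetical names introduced for illustration and are not part of instascrape. The only thing assumed about a profile is that it has the scrape method used above.

```python
import time


def scrape_all(profiles, delay_seconds=2.0):
    """Call .scrape() on each profile, pausing between requests.

    Returns (succeeded, failed); failed holds (profile, exception) pairs.
    """
    succeeded, failed = [], []
    for index, prof in enumerate(profiles):
        if index > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            prof.scrape()
        except Exception as exc:  # deliberately broad: this is only a sketch
            failed.append((prof, exc))
        else:
            succeeded.append(prof)
    return succeeded, failed


class FakeProfile:
    """Hypothetical stand-in so the sketch runs without network access."""

    def __init__(self, username, should_fail=False):
        self.username = username
        self.should_fail = should_fail
        self.scraped = False

    def scrape(self):
        if self.should_fail:
            raise RuntimeError(f"could not scrape {self.username}")
        self.scraped = True


demo = [FakeProfile("google"), FakeProfile("oracle", should_fail=True)]
ok, bad = scrape_all(demo, delay_seconds=0)
print("scraped:", [p.username for p in ok])
print("failed:", [p.username for p, _ in bad])
```

With the real library, the profiles list built above would be passed in place of the stand-ins, with a delay you are comfortable with.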
And that's it! We just scraped 364 data points from 7 profiles with just a few lines of code. Let's use the to_dict method to get a list of dicts that can be passed into a pandas.DataFrame for expressive and powerful data analysis.
import pandas as pd
data = [prof.to_dict() for prof in profiles]
df = pd.DataFrame(data)
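Before analyzing the DataFrame, it can help to confirm that each scraped record actually contains values, since a failed or blocked scrape may leave fields empty. Here is a minimal sketch in plain Python, assuming each record is a dict with the username, followers, and posts keys used later in this post. The missing_fields helper and the sample records are hypothetical and not part of instascrape.

```python
import math


def missing_fields(record, required=("username", "followers", "posts")):
    """Return the required keys that are absent, None, or NaN in a record."""
    missing = []
    for key in required:
        value = record.get(key)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(key)
    return missing


# Hypothetical records shaped like the output of prof.to_dict()
records = [
    {"username": "example_a", "followers": 1000, "posts": 25},
    {"username": float("nan"), "followers": float("nan"), "posts": float("nan")},
]

for record in records:
    problems = missing_fields(record)
    if problems:
        print("incomplete record, missing:", problems)
```

Running a check like this over the data list before plotting makes an empty scrape obvious right away instead of showing up later as blank charts.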
Exploring our data
First, let's start by comparing how many followers each page has using a matplotlib bar plot:
import matplotlib.pyplot as plt
# On matplotlib 3.6 and later this style is named "seaborn-v0_8-darkgrid"
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["followers"])
We can immediately see that Apple has the most followers and, surprisingly, Facebook has fewer than one might expect.
Now let's see who has the most posts:
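A bar chart is easier to read when the bars are ordered, so one option is to rank the records before plotting. Below is a small sketch in plain Python. The rank_by helper and the follower counts are hypothetical placeholders, not real scraped values.

```python
def rank_by(records, key):
    """Return records sorted from largest to smallest value of key."""
    return sorted(records, key=lambda record: record[key], reverse=True)


# Placeholder numbers for illustration only
sample = [
    {"username": "example_a", "followers": 1500},
    {"username": "example_b", "followers": 9000},
    {"username": "example_c", "followers": 300},
]

ranked = rank_by(sample, "followers")
for position, record in enumerate(ranked, start=1):
    print(position, record["username"], record["followers"])
```

The same helper works for any numeric field, for example the posts count.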
import matplotlib.pyplot as plt
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["posts"])
Finally, we're going to examine each page's engagement as a function of time and see how the different pages are doing.
(NOTE: some of the specifics in the code are skipped so we can focus on what's important. Apple is also left out of the plot because its numbers are so much larger than the others'.)
for prof in profiles:
    posts = prof.get_recent_posts()  # gets the 12 most recent posts
    posts_data = [post.to_dict() for post in posts]
    post_df = pd.DataFrame(posts_data)
    plt.plot(post_df.upload_date, post_df.likes, label=prof.username)
Some interesting things we can see right off the bat are:
- Oracle barely gets any likes
- Surprisingly, neither does Facebook
- Adobe, Google, and Microsoft post relatively frequently
- IBM hasn't posted in almost two weeks
- Microsoft gets the most likes on average on their posts
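Observations like the average likes and the time since the last post can also be computed directly instead of being read off the plot. This is a rough sketch assuming each post dict has a likes number and an upload_date that is a datetime, matching the column names used in the plotting loop. The summarize_posts helper and all numbers are hypothetical.

```python
from datetime import datetime
from statistics import mean


def summarize_posts(posts_data, now):
    """Summarize a list of post dicts with 'likes' and 'upload_date' keys."""
    latest = max(post["upload_date"] for post in posts_data)
    return {
        "mean_likes": mean(post["likes"] for post in posts_data),
        "days_since_last_post": (now - latest).days,
    }


# Placeholder numbers for illustration only, not real scraped data
posts_data = [
    {"likes": 120, "upload_date": datetime(2020, 11, 1, 12, 0)},
    {"likes": 80, "upload_date": datetime(2020, 11, 3, 9, 30)},
    {"likes": 100, "upload_date": datetime(2020, 10, 28, 18, 0)},
]

summary = summarize_posts(posts_data, now=datetime(2020, 11, 10, 9, 30))
print(summary)
```

Passing now explicitly keeps the result reproducible. In a live script you would likely pass datetime.now() instead.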
Conclusion
And that's pretty much it! This is just a small taste of what instascrape can accomplish, and how you use it is up to you, so get out there and start exploring that data!
If you like what you read, check out some of my other posts 😄
Scraping 25,000 data points from Joe Biden's Instagram using instascrape
Chris Greening ・ Nov 5 '20 ・ 2 min read
Downloading recent Instagram photos using instascrape and Python
Chris Greening ・ Oct 26 '20 ・ 2 min read
Also, check out the official repository and drop it a star ⭐ or contribute!
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Top comments (3)
Hello. Great post. I've been reading all your posts related to instascrape. But something is troubling me. For some reason the created DataFrame ends up being completely empty. And that is the case with everything: followers = nan, username = nan, following = nan.
I am currently running the code in Google Colab. I made sure to (in Colab):
Any suggestions?
back you.
When I scraped Google's Instagram profile I got 'username'=nan. Please clarify, thank you.