Recently, there has been talk of Instagram closing down its API and restricting access to corporate partners only.
Given Instagram's large user base, data scraping becomes even more important in this scenario. The platform is full of data in every nook and cranny.
I decided to start by scraping whatever data we can find on a person's account page, which you can access at https://instagram.com/
Let's take my page as an example: https://instagram.com/manan.code
This is the main area I am interested in. What can we scrape from here, and how? Right-click on the page and click "View page source" to see the source behind it.
You'll see something like this -
At first glance this looks incomprehensible, and finding any data in it seems almost impossible: it's just a sea of link and script tags.
But the data is there somewhere for sure.
I did some digging and found the script tag that contains basically everything we need.
Now that we know where the data is, let's move on to the code.
We'll use the requests module and BeautifulSoup.
Up to this point in the code, we've requested the page from Instagram and got its source, then converted it to a BeautifulSoup object to make it easy to find the script tag we need. We then used BeautifulSoup's find_all function to collect all the script tags. With a little trial and error, I discovered that the script tag we need is the 5th one, so we index the list accordingly to pick it out.
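The steps described so far could look roughly like the sketch below. It is not the original listing: the function names fetch_profile_html and find_script_tags are made up for illustration, and the demo runs on a tiny stand-in page instead of a live request, because the position of the script tag (the 5th on the page I looked at) depends on whatever markup Instagram happens to serve.

```python
import requests
from bs4 import BeautifulSoup


def fetch_profile_html(username):
    """Download the raw HTML of a public profile page (hypothetical helper)."""
    # URL pattern assumed from the example profile link in this post.
    url = f"https://instagram.com/{username}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def find_script_tags(html):
    """Parse the page and return every <script> tag in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("script")


# Offline demo on a made-up page, so the sketch runs without network access.
sample_html = """
<html><head>
<script>var a = 1;</script>
<script>var b = 2;</script>
</head><body>
<script>window._sharedData = {"demo": true};</script>
</body></html>
"""
scripts = find_script_tags(sample_html)

# On the real page the tag of interest was the 5th one, i.e. scripts[4].
# In this three-tag stand-in it is the last one.
target = scripts[2]
```

For a live page you would call find_script_tags(fetch_profile_html("manan.code")) and then work out the right index by inspecting the results, since it can change whenever the page layout does.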
But we need to do one more thing: what we have right now is not a string, so we can't slice it to find what we need. Hence, we access the contents of the script tag.
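Here is a minimal sketch of that step, under a couple of assumptions: the script text is taken to be a JavaScript assignment of a JSON object (the variable name window._sharedData in the stand-in below is my assumption, not something shown above), and slicing from the first "{" to the last "}" is just one simple way to cut the JSON out of it.

```python
import json
from bs4 import BeautifulSoup

# Made-up stand-in for the script tag located in the previous step.
html = (
    '<script type="text/javascript">'
    'window._sharedData = {"entry_data": {"ProfilePage": []}};'
    "</script>"
)
script_tag = BeautifulSoup(html, "html.parser").find("script")

# .contents is a list of the tag's children; the first child is the script text.
raw_text = str(script_tag.contents[0])

# Slice from the first "{" to the last "}" to drop the assignment and the ";".
start = raw_text.index("{")
end = raw_text.rindex("}") + 1
data_json = json.loads(raw_text[start:end])

print(data_json)
```

Once the text is parsed with json.loads, data_json is an ordinary Python dictionary, which is much easier to explore than the raw string.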
The next step is to find out where the part we need is.
If you print data_json, this is what you get -
Looking closely, I figured out the right keys for the data we need. Here is the result.
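To give an idea of what walking down those keys looks like, here is a hedged sketch on a trimmed, made-up dictionary. The key path (entry_data, ProfilePage, graphql, user) and the field names are assumptions about the page structure at one point in time, and extract_profile is a hypothetical helper, so check them against whatever your own data_json actually contains.

```python
# Trimmed, made-up stand-in for data_json; a real page carries far more fields.
data_json = {
    "entry_data": {
        "ProfilePage": [
            {
                "graphql": {
                    "user": {
                        "username": "example_user",
                        "full_name": "Example User",
                        "biography": "Hello!",
                        "edge_followed_by": {"count": 120},
                        "edge_follow": {"count": 80},
                    }
                }
            }
        ]
    }
}


def extract_profile(data):
    """Pull a few profile fields out of the nested dictionary (assumed layout)."""
    user = data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
    return {
        "username": user["username"],
        "full_name": user["full_name"],
        "biography": user["biography"],
        "followers": user["edge_followed_by"]["count"],
        "following": user["edge_follow"]["count"],
    }


profile = extract_profile(data_json)
print(profile)
```

If any of these keys is missing, the lookup raises a KeyError, which is a useful early signal that the page structure has changed.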
And this marks the end of our journey into scraping Instagram!
Check out my video where I go over the same thing -