Discussion on: The Easy Way to Scrape Instagram Using Python Scrapy & GraphQL

View post

Replies for: Sure, here it is gist.github.com/Drakula2k/035cc5bd... I also fixed a couple of bugs there

Thanks so much for sharing! After making the changes I am unfortunately still getting blocked by the robots.txt file. Is this code still working for you?

Vlad • Feb 2 '21

Yes, it's working. You can disable the robots.txt check by setting ROBOTSTXT_OBEY = False on your settings.py. It works via an API so there is no need for the robots.txt check.

karisjochen • Feb 4 '21

incredible, thank you! It worked! So is it always a good idea to set the ROBOTSTXT_OBEY = False considering we dont want to be stopped?

Vlad • Feb 4 '21

Yes, ROBOTSTXT_OBEY is good when you're building something like a search engine and it may request all sorts of random URLs posted on the Internet. In that case, using robots.txt is good to skip non-public pages.

But if you're requesting particularly defined URLs or using an API, robots.txt is not so useful and may block access to the API.

kaiwangyu • Dec 24 '21

thanks a lot, I learned a ton from your code... but im still get confused by the query_hash. may I ask how do you get this constant for this tpye of query,pls?

Vlad • Dec 24 '21

Open Inspector in Chrome, visit Instagram and scroll through the posts, you'll see the same GraphQL queries with query_hash.
I'm not sure what query_hash value means exactly, but they're static for each type of query it seems.

kaiwangyu • Dec 25 '21 • Edited

Ohhh, I see, it's a constant number(every time drop-down the perfil), but for me it's a diferent number, not 'e769aa130647d2354c40ea6a439bfc08', by the way, thank you so much, I am beginner on Scrapy, and do you sugguest any book or tutorial to learn advanced project based on Scrapy, I already bought this book .

Kai
Merry Chrismas
Regards

Vlad • Dec 25 '21

They may have changed something, but the old value still works too, it seems.
I'm not a specialist in Scrapy, but generally, I'd read official docs (docs.scrapy.org/en/latest/) and then start doing some projects using it and learn from them.