DEV Community

Discussion on: Collecting one million website links

Dinny Paul

You could use the fake_useragent Python library to change the user agent with every request so that you don't get blocked by the website, and you could also use free proxies, thereby changing your IP address with every request :)
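A minimal sketch of the rotation idea. In real use you would pull user agents from fake_useragent (`UserAgent().random`); here a small hardcoded pool stands in for it so the rotation logic is visible, and the proxy addresses are placeholders, not real free proxies:

```python
import itertools

# Simplified stand-in for fake_useragent's UserAgent().random:
# a small rotating pool of user-agent strings.
USER_AGENTS = itertools.cycle([
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
])

# Hypothetical proxy addresses; substitute real free proxies here.
PROXIES = itertools.cycle([
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
])

def request_kwargs():
    """Build per-request keyword arguments with a fresh UA and proxy."""
    proxy = next(PROXIES)
    return {
        "headers": {"User-Agent": next(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
    }

# Usage with requests (not executed here):
# import requests
# resp = requests.get("https://example.com", **request_kwargs(), timeout=10)
```

Each call hands back a different header/proxy pair, so consecutive requests no longer look identical to the target site.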

Anurag Rana

Great suggestions, Dinny. However, I feel we should be gentle with sites and not send too many requests per second. That is why I didn't feel the need to use those two libraries.
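The "be gentle" approach boils down to throttling: sleep between requests so you never exceed a chosen rate. A minimal sketch (the `fetch` callable is a parameter here only so the throttling logic can be shown without network access; in practice you would pass something like `requests.get`):

```python
import time

def polite_get(urls, delay=1.0, fetch=None):
    """Fetch URLs one at a time, sleeping `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # be gentle: at most one request per `delay` seconds
        results.append(fetch(url))
    return results
```

With `delay=1.0` this caps the crawl at one request per second to any single site, which avoids the blocking problem without rotating user agents or IPs.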

I have written another article where I used a Docker cluster to scrape data at very high speed, although I was not able to achieve the desired results.

pythoncircle.com/post/518/scraping...