Discussion on: Web Scraping with Javascript and Node.js

Gonzalo Muñoz • Edited

Great article! I always thought web crawling was too difficult without Python.
A couple of comments, though. Wouldn't it be cleaner to use Promise.all() instead of manually implementing a queue?
Also, I believe one could implement a poor man's rotating proxy list by creating an array of free proxies and picking one at random for every call.

Ander Rodriguez

Hi, thanks for the ideas!
Promise.all() is definitely an option if you know all the URLs beforehand, but as far as I know there is no way to add new ones once the batch has launched, and capping concurrency would be awkward too.
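
Roughly what I mean (an untested sketch: the concurrency value, handler, and seed URL are placeholders, and the global fetch assumes Node 18+):

```javascript
// Minimal queue with a concurrency cap that accepts new URLs mid-crawl,
// which a one-shot Promise.all() cannot do.
function createQueue(handler, concurrency = 4) {
  const pending = [];
  let active = 0;
  let resolveDone;
  const done = new Promise((resolve) => { resolveDone = resolve; });

  function next() {
    // Queue drained and nothing in flight: the crawl is finished.
    if (active === 0 && pending.length === 0) return resolveDone();
    while (active < concurrency && pending.length > 0) {
      const url = pending.shift();
      active++;
      Promise.resolve(handler(url, queue))
        .catch((err) => console.error(`Failed ${url}:`, err.message))
        .finally(() => { active--; next(); });
    }
  }

  const queue = {
    add(url) { pending.push(url); next(); },
    done, // resolves once the queue drains
  };
  return queue;
}

// Usage: seed with one URL; the handler can enqueue links it discovers.
const queue = createQueue(async (url, q) => {
  const res = await fetch(url);
  console.log(url, res.status);
  // q.add(someDiscoveredLink); // new URLs can join after launch
}, 2);
queue.add('https://example.com');
queue.done.then(() => console.log('crawl finished'));
```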

As for the random proxy: yes, you could do something similar to the sample() function the article uses for headers, keeping a list of available proxies and picking one at random for each request.
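
Something like this, reusing the sample() helper (the proxy entries below are made up, and I'm assuming axios's built-in proxy option):

```javascript
const axios = require('axios');

// Same idea as the headers helper: pick a random element from an array.
const sample = (array) => array[Math.floor(Math.random() * array.length)];

// Placeholder proxies; replace with real host/port pairs from a free list.
const proxies = [
  { host: '203.0.113.10', port: 8080 },
  { host: '203.0.113.11', port: 3128 },
];

// Every call goes out through a randomly chosen proxy.
const get = (url) =>
  axios.get(url, { proxy: { protocol: 'http', ...sample(proxies) } });
```

Keep in mind free proxies fail often, so in practice you would also want to retry a request through a different proxy when one errors out.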

Gonzalo Muñoz

Right! I didn't think of that. We need something like a sliding window, if you will. Maybe the queue is the simplest option then.