Discussion on: Web Scraping with Javascript and Node.js

Gonzalo Muñoz • Edited

Great article! I always thought web crawling was too difficult without Python.
A couple of comments, though. Wouldn't it be cleaner to use Promise.all() instead of manually implementing a queue?
Also, I believe one could implement a poor man's rotating proxy list by creating an array of free proxies and picking one at random for every call.

Ander Rodriguez

Hi, thanks for the ideas!
Promise.all() is definitely an option if you know all the URLs beforehand, but as far as I know there is no way to add new ones once the batch has launched, and capping concurrency would be awkward too.
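
Roughly what I mean (an untested sketch: the concurrency value, handler, and seed URL are placeholders, and the global fetch assumes Node 18+):

```javascript
// Minimal queue with a concurrency cap that accepts new URLs mid-crawl,
// which a one-shot Promise.all() cannot do.
function createQueue(handler, concurrency = 4) {
  const pending = [];
  let active = 0;
  let resolveDone;
  const done = new Promise((resolve) => { resolveDone = resolve; });

  function next() {
    // Queue drained and nothing in flight: the crawl is finished.
    if (active === 0 && pending.length === 0) return resolveDone();
    while (active < concurrency && pending.length > 0) {
      const url = pending.shift();
      active++;
      Promise.resolve(handler(url, queue))
        .catch((err) => console.error(`Failed ${url}:`, err.message))
        .finally(() => { active--; next(); });
    }
  }

  const queue = {
    add(url) { pending.push(url); next(); },
    done, // resolves once the queue drains
  };
  return queue;
}

// Usage: seed with one URL; the handler can enqueue links it discovers.
const queue = createQueue(async (url, q) => {
  const res = await fetch(url);
  console.log(url, res.status);
  // q.add(someDiscoveredLink); // new URLs can join after launch
}, 2);
queue.add('https://example.com');
queue.done.then(() => console.log('crawl finished'));
```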

As for the random proxy: yes, you could do something similar to the sample() function the article uses for headers, keeping a list of available proxies and picking one at random for each request.
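
Something like this, reusing the sample() helper (the proxy entries below are made up, and I'm assuming axios's built-in proxy option):

```javascript
const axios = require('axios');

// Same idea as the headers helper: pick a random element from an array.
const sample = (array) => array[Math.floor(Math.random() * array.length)];

// Placeholder proxies; replace with real host/port pairs from a free list.
const proxies = [
  { host: '203.0.113.10', port: 8080 },
  { host: '203.0.113.11', port: 3128 },
];

// Every call goes out through a randomly chosen proxy.
const get = (url) =>
  axios.get(url, { proxy: { protocol: 'http', ...sample(proxies) } });
```

Keep in mind free proxies fail often, so in practice you would also want to retry a request through a different proxy when one errors out.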

Gonzalo Muñoz

Right! I didn't think of that. We need something like a sliding window, if you will. Maybe the queue is the simplest option then.