Discussion on: Building URLs to crawl based on other websites

Peter Jausovec

I'd create a map of preferences to the names of query parameters for each website you want to scrape (assuming there is no way to get the query parameters dynamically from each site).

For example, for each website you'd store the name of your preference (e.g. maxPrice) and how the corresponding query parameter is named on the website you are trying to scrape:
```json
{
  "some-website1.com": {
    "maxPrice": "max_price",
    "region": "region",
    ...
  }
}
```

This should work fine assuming all the websites you are scraping use query parameters. Using the information from the map, you can build up the URL, make a request, and get the data back. If you need to parse the response (assuming you're not getting JSON/XML back), I'd look into something like BeautifulSoup, a Python library that makes it easy to parse HTML.
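
To sketch what that could look like in Python (the site names, `PARAM_MAP`, the `/search` path, and the `.listing` selector here are all made up for illustration):

```python
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

# Hypothetical map: your preference name -> that site's query parameter name.
PARAM_MAP = {
    "some-website1.com": {
        "maxPrice": "max_price",
        "region": "region",
    },
    "some-website2.com": {
        "maxPrice": "priceMax",
        "region": "area",
    },
}

def build_url(site, preferences):
    """Translate your preference names into the site's query parameters."""
    params = {PARAM_MAP[site][name]: value for name, value in preferences.items()}
    return f"https://{site}/search?{urlencode(params)}"

url = build_url("some-website1.com", {"maxPrice": 500, "region": "north"})
response = requests.get(url, timeout=10)
response.raise_for_status()

# If the site returns HTML rather than JSON/XML, parse it with BeautifulSoup.
soup = BeautifulSoup(response.text, "html.parser")
for listing in soup.select(".listing"):  # ".listing" is a placeholder selector
    print(listing.get_text(strip=True))
```

The nice part of keeping the map separate from the request logic is that adding a new site is just a new entry in `PARAM_MAP`, not new code.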

Bartude

Yeah, that seems to be the way to go with no other visible solution. I'm gonna have to build in safeguards in case those websites decide to change their query parameter names, maybe something like the sketch below. Thanks!
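
One best-effort way to do that, assuming the sites expose their filters as form fields (the `/search` path and the helper name are just placeholders):

```python
import requests
from bs4 import BeautifulSoup

def check_params_still_valid(site, param_map):
    """Warn if a site's search page no longer exposes the parameter names we mapped.

    Assumes the parameters show up as <input name="..."> or <select name="...">
    fields on the search page, which won't hold for every site; treat this as a
    best-effort guard, not a guarantee.
    """
    response = requests.get(f"https://{site}/search", timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    form_fields = {tag.get("name") for tag in soup.find_all(["input", "select"])}
    missing = set(param_map[site].values()) - form_fields
    if missing:
        print(f"Warning: {site} no longer exposes parameters: {missing}")
```

Running that periodically (or before each scrape) would at least flag a renamed parameter instead of silently returning unfiltered results.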