Neural network for scraping

Jouini Hamza ・1 min read

Sometimes when scraping e-commerce sites, you end up with the same product listed under different names, for example:
Msi RTX 2070 Ti
Nvidia rtx 2070ti MSI 8gb
There are more complicated examples too. I wonder if there is any neural net that can classify these types of product listings as the same product.

Discussion

Amartya Gaur

Rather than a neural network, I think you should look for more differentiating features, say from the description, and use them to remove the duplicate products. For instance, the model number will be the same for the products above. If you are using a custom scraper, it should be doable: find the model number on the details page and make it the primary key in your DB. That should do.
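
As a rough illustration of the primary-key idea, here is a minimal sketch using Python's built-in `sqlite3`. It assumes the scraper has already pulled a model number out of each details page; the `products` table, the `store_products` helper, the record fields, and the model numbers themselves are all made-up placeholders, not anything from a real site.

```python
import sqlite3


def store_products(conn, products):
    """Insert scraped products, keeping one row per model number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "model_number TEXT PRIMARY KEY, title TEXT, source TEXT)"
    )
    for p in products:
        # Normalize the key so trivial differences in case or
        # surrounding whitespace do not create separate rows.
        key = p["model_number"].strip().upper()
        # INSERT OR IGNORE skips rows whose primary key already exists.
        conn.execute(
            "INSERT OR IGNORE INTO products VALUES (?, ?, ?)",
            (key, p["title"], p["source"]),
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]


# Hypothetical scraped records; the model numbers are placeholders.
scraped = [
    {"model_number": "EXAMPLE-001", "title": "Msi RTX 2070 Ti", "source": "site-a"},
    {"model_number": " example-001", "title": "Nvidia rtx 2070ti MSI 8gb", "source": "site-b"},
    {"model_number": "EXAMPLE-002", "title": "Some other card", "source": "site-a"},
]

conn = sqlite3.connect(":memory:")
unique_count = store_products(conn, scraped)
print(unique_count)
```

The first listing seen for a model number wins here; if you would rather keep the latest price or title, an upsert is probably a better fit than `INSERT OR IGNORE`.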

Jouini Hamza Author

It's doable for two or three websites, but I am scraping about 8 websites.

Amartya Gaur

Yes, I think it's still doable, but I will still try to find a network if one is available. Also, you will be adding a lot of overhead if you use a neural net, not to mention the resources you'd need if you have to run it on a server.

I love integers

A good tokenizer and pairwise Levenshtein distance should be a good starting point. en.wikipedia.org/wiki/Levenshtein_...
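
To make that suggestion concrete, here is a minimal pure-Python sketch. The tokenizer rule (lowercase, split letters from digits, sort the tokens) and the length-normalized `similarity` score are my own assumptions, one simple choice among many; the helper names are hypothetical. Any cutoff for calling two names "the same product" would have to be tuned on your own data.

```python
import re
from itertools import combinations


def tokenize(name):
    """Lowercase and split into runs of letters or digits ("2070ti" -> "2070", "ti")."""
    return re.findall(r"[a-z]+|\d+", name.lower())


def normalize(name):
    """Sort tokens so that word order does not affect the distance."""
    return " ".join(sorted(tokenize(name)))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, kept to two rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def similarity(x, y):
    """Edit distance on normalized names, scaled to the range 0..1."""
    nx, ny = normalize(x), normalize(y)
    longest = max(len(nx), len(ny))
    return 1.0 if longest == 0 else 1.0 - levenshtein(nx, ny) / longest


names = ["Msi RTX 2070 Ti", "Nvidia rtx 2070ti MSI 8gb"]
for x, y in combinations(names, 2):
    print(f"{x!r} vs {y!r}: {similarity(x, y):.2f}")
```

Pairwise comparison grows quadratically with the number of listings, so for a large catalog you would likely want to group candidates first (by brand, for example) and only compare within each group.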