DEV Community

Jouini Hamza

neural network for scraping

Sometimes in e-commerce web scraping you scrape the same product under different names, for example:
Msi RTX 2070 Ti
Nvidia rtx 2070ti MSI 8gb
There are more complicated examples too. Is there a neural network that can classify these kinds of products?

Top comments (4)

Amartya Gaur

Rather than a neural network, I think you should look for more distinguishing features, for example in the description, and use them to remove duplicate products. For instance, the model number will be the same for both listings above. If you are using a custom scraper, this should be doable: extract the model number from the details page and make it the primary key in your database.
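
A minimal sketch of the primary-key idea, assuming the scraper has already extracted a model number from each details page. The table layout, the `save_product` helper, and the sample model number are hypothetical and only illustrate the approach; `INSERT OR IGNORE` is SQLite's way of skipping a row whose primary key already exists.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the products table with the model number as primary key."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS products (
            model_number TEXT PRIMARY KEY,
            name         TEXT NOT NULL,
            source       TEXT NOT NULL
        )
        """
    )
    return conn


def save_product(conn: sqlite3.Connection, model_number: str, name: str, source: str) -> None:
    """Store a scraped listing; a repeated model number is silently skipped."""
    # Normalise so that "abc-123 " and "ABC-123" count as the same key.
    key = model_number.strip().upper()
    conn.execute(
        "INSERT OR IGNORE INTO products (model_number, name, source) VALUES (?, ?, ?)",
        (key, name, source),
    )
    conn.commit()


def count_products(conn: sqlite3.Connection) -> int:
    """Return the number of distinct products stored so far."""
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]


if __name__ == "__main__":
    conn = open_db()
    # "HYPOTHETICAL-123" is a placeholder, not a real model number.
    save_product(conn, "HYPOTHETICAL-123", "Msi RTX 2070 Ti", "site-a")
    save_product(conn, "hypothetical-123 ", "Nvidia rtx 2070ti MSI 8gb", "site-b")
    print(count_products(conn))
```

This only works when every site exposes the model number somewhere on the page; listings without one would still need a fallback such as name matching.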

Jouini Hamza

It's doable for 2 or 3 websites, but I am scraping about 8 websites.

Amartya Gaur

Yes, I think it's still doable, but I will still look for a suitable network, if one is available. Also, a neural net adds significant overhead, not to mention the resources you'd need to run it on a server.

I love integers

A good tokenizer and pairwise Levenshtein distance should be a good starting point. en.wikipedia.org/wiki/Levenshtein_...
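
A rough sketch of that starting point in plain Python, using the two product names from the post. The tokenizer rules (lowercasing, splitting digits from trailing letters, sorting tokens) and the `similarity` helper are assumptions made for illustration, not a tested matching pipeline; any cut-off for deciding that two names are the same product would have to be tuned on real data.

```python
import re
from itertools import combinations


def tokenize(name: str) -> list[str]:
    """Lowercase the name, split glued tokens such as '2070ti', and sort the unique tokens."""
    text = name.lower()
    # Insert a space between a digit and a following letter: "2070ti" -> "2070 ti".
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    # Sorting makes the comparison independent of word order.
    return sorted(set(tokens))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(name_a: str, name_b: str) -> float:
    """Normalised similarity in [0, 1] between two tokenized product names."""
    norm_a = " ".join(tokenize(name_a))
    norm_b = " ".join(tokenize(name_b))
    longest = max(len(norm_a), len(norm_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(norm_a, norm_b) / longest


if __name__ == "__main__":
    names = ["Msi RTX 2070 Ti", "Nvidia rtx 2070ti MSI 8gb"]
    # Pairwise comparison of every scraped name against every other.
    for name_a, name_b in combinations(names, 2):
        print(f"{name_a!r} vs {name_b!r}: {similarity(name_a, name_b):.2f}")
```

Pairwise comparison grows quadratically with the number of listings, so for a large catalogue it may help to compare only names that share a token such as the brand or the model digits.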