DEV Community

Sohail Pathan

Tools for Extracting Text from Webpages

Hey Dev Community,

Recently, I wrote a blog post sharing some tools I found for scraping text from webpages.

It covers free, open-source libraries as well as a cloud API.
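For readers new to the topic, the core idea behind text extraction can be sketched with nothing but Python's standard library: parse the HTML and keep the text nodes while skipping `<script>` and `<style>` content. This is a minimal illustration, not one of the tools from the linked post; `TextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style> tags."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose content we ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside <script>/<style>
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text as a single space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ("<html><head><style>p {color: red}</style></head>"
            "<body><h1>Hello</h1><p>World</p><script>var x = 1;</script></body></html>")
    print(extract_text(page))  # Hello World
```

Dedicated libraries and APIs go much further, for example handling JavaScript-rendered pages, stripping navigation boilerplate, and preserving the paragraph structure that this sketch flattens into a single line.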

Have a look to see which tools suit your needs, and maybe give them a try!

Link: https://apyhub.com/blog/extracting-text-from-webpages

Let me know if I've missed any popular resources, and I'll add them in my next blog post.

Top comments (2)

Ranjan Dailata

Great to see your post :)

Here's a failure scenario: try extracting text from a TripAdvisor URL, for example TripAdvisor_Tourism

Apyhub_APIPlayground

Suggestions:

  • It would be great to let users specify a structured schema for the data extraction.
  • Text extraction is a fairly basic feature; the real value you could provide is structured data extraction.
  • Add the ability to crawl automatically and extract all related links, basically a web spider.
  • It would be good to mention known issues and challenges such as rate limits. How are you going to handle rate limits and other issues when crawling third-party content?
  • Please mention the use cases or scenarios where this API would be helpful. You mention a few in the conclusion, but I doubt they will be very useful to end users unless structured data is provided.
  • Think in terms of a business user: how these APIs would benefit them from various angles needs careful thought.
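To make the spider and rate-limit suggestions above more concrete, here is a minimal sketch (not ApyHub's implementation or API) of a same-host, breadth-first crawler. It assumes a caller-supplied `fetch` function that returns a page's HTML; `crawl`, `LinkExtractor`, `max_pages`, and `delay_seconds` are hypothetical names, and the fixed delay is only a crude stand-in for proper rate limiting.

```python
import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str,
          fetch: Callable[[str], str],
          max_pages: int = 10,
          delay_seconds: float = 1.0) -> Dict[str, str]:
    """Breadth-first crawl limited to start_url's host.

    `fetch` is a hypothetical caller-supplied function returning a page's HTML.
    A fixed pause between requests is a crude stand-in for real rate limiting.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if pages:  # wait before every request except the first
            time.sleep(delay_seconds)
        html = fetch(url)
        pages[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so a page isn't queued twice
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


if __name__ == "__main__":
    # A fake in-memory "site" stands in for real HTTP requests
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">ext</a>',
        "https://example.com/a": '<a href="/">home</a> <a href="b#top">B</a>',
        "https://example.com/b": "<p>leaf</p>",
    }
    pages = crawl("https://example.com/", site.__getitem__, delay_seconds=0)
    print(list(pages))
    # ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

A production crawler would also need to respect robots.txt, back off on HTTP 429 (Too Many Requests) responses, and handle fetch errors, none of which this sketch attempts.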
Sohail Pathan

Thanks for the feedback, Ranjan. I've shared it internally with the team and will let you know once we address it.