re: BeautifulSoup or Scrapy?


They are not the same thing.

Beautiful Soup allows you to build a navigable tree from HTML and XML sources (a file, a stream, or markup you fetched from a URL). After building the tree, you can search it, modify it, or pull data out of it.
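To make the search / modify / pull-data steps concrete, here is a minimal sketch. The HTML snippet and variable names are made up for illustration, and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Made-up markup, standing in for a file or a downloaded page.
html = """
<html><body>
  <h1 id="title">Old title</h1>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# Build the tree with the parser that ships with Python.
soup = BeautifulSoup(html, "html.parser")

# Search: collect the href of every link in the tree.
links = [a["href"] for a in soup.find_all("a")]

# Modify: replace the text inside the heading.
soup.find("h1", id="title").string = "New title"

# Pull data out: read the (now modified) heading text.
title_text = soup.h1.get_text()
```

Note that nothing here fetches or crawls anything; Beautiful Soup only works on markup you hand to it.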

Scrapy is a framework for crawling and scraping content from websites. For each page crawled you get access to its DOM, so you can extract the information you need. This part is much like BS, so if you are looking for a comparison, that's where you should look.

To give a real-life example, I built a system that crawls a website for its historical content, then extracts and saves the data. After that, it periodically checks the site for new content via its RSS feed.

For the initial crawl, I used Scrapy to easily navigate through the site content. For the RSS stage, I used BS4 to parse each new URL I got from the RSS feed.
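The RSS stage could look roughly like the sketch below. This is not my actual code: the function names (`new_item_links`, `extract_article`), the sample feed, and the sample page are all hypothetical, the feed is read with the standard library's `xml.etree.ElementTree`, and the fetching step is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup


def new_item_links(rss_xml, seen):
    """Return the <link> of every RSS item that is not in `seen` yet."""
    root = ET.fromstring(rss_xml.strip())
    links = [item.findtext("link") for item in root.iter("item")]
    return [link for link in links if link and link not in seen]


def extract_article(html):
    """Pull a title and the paragraph texts out of one article page."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    return {
        "title": heading.get_text(strip=True) if heading else None,
        "paragraphs": [p.get_text(strip=True) for p in soup.find_all("p")],
    }


# Made-up feed and page; in practice both would be downloaded first.
sample_rss = """
<rss version="2.0"><channel><title>Example</title>
  <item><title>A</title><link>https://example.com/a</link></item>
  <item><title>B</title><link>https://example.com/b</link></item>
</channel></rss>
"""
sample_page = "<html><body><h1>Post B</h1><p>Hello.</p><p>Bye.</p></body></html>"

already_seen = {"https://example.com/a"}
fresh_links = new_item_links(sample_rss, already_seen)
article = extract_article(sample_page)
```

Keeping a set of already-seen URLs is one simple way to decide what counts as "new"; a database lookup would do the same job.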

When working with Scrapy, you can still use BS to extract information from the HTML you get; see docs.scrapy.org/en/latest/topics/s...


So Scrapy can't do DOM manipulation? Interesting!


Yes, it can. But you can also use whatever you want as a parser.


You can use any output you want, so rewriting the DOM is possible.
