DEV Community

Discussion on: BeautifulSoup or Scrapy?

Collapse
 
eluzix profile image
eluzix • Edited

They are not the same thing.

Beautiful Soup allows you to build a navigatable tree from HTML and XML sources (be a file, URL or a stream). After building the tree, you can search modify it or pull data out.

Scrapy is a framework for crawling and scraping content from websites. For each page crawled you get access to it's DOM so you can extract your relevant information. This part is much like BS so if you are looking for comparison that's where you should look.

To give a living an example, I built a system that crawls a website for its historical content, extract and save the data. Then, periodically check the site content via it's RSS stream.

For the initial crawling, I used Scrapy to easily navigate through the site content, for the RSS stage, I used BS4 to parse each new URL I got from the RSS.

Edit:
Working with Scrapy you can use BS to extract information from the HTML you got, see docs.scrapy.org/en/latest/topics/s...