Discussion on: BeautifulSoup or Scrapy?


Jean-Michel Plourde • Jun 15 '19 • Edited

It really depends on your needs. Both work with HTML, but not at the same scale or with the same capabilities.

BeautifulSoup is a library that parses HTML with very little effort. It does not download pages itself: you hand it markup you have already fetched (for example with requests), it gives you hassle-free access to the data, and then it stops. You could add some automation around it, but other tools already do that.
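To make the "library" side concrete, here is a minimal sketch of parsing with BeautifulSoup. The HTML snippet, the `book` class, and the variable names are made up for illustration; only the `bs4` calls are real.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that was already downloaded.
html = """
<html><body>
  <h1>Books</h1>
  <ul>
    <li class="book"><a href="/b/1">Dune</a></li>
    <li class="book"><a href="/b/2">Emma</a></li>
  </ul>
</body></html>
"""

# Parse the string with Python's built-in parser; no network access happens here.
soup = BeautifulSoup(html, "html.parser")

# Read the first <h1> and collect (title, link) pairs with a CSS selector.
heading = soup.h1.get_text(strip=True)
books = [(a.get_text(strip=True), a["href"]) for a in soup.select("li.book a")]

print(heading)
print(books)
```

Everything beyond this (downloading, following links, retries) is left to you or to another tool.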

Scrapy is a full-fledged framework for collecting the HTML of many pages within a set of domains. You specify constraints, and it fetches every page it can within the limits you set.

It boils down to a library vs a framework.

I'm currently working on a project where I need to fetch some data from a website with requests and then parse the HTML with BeautifulSoup. It's simple, surface-level parsing.
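That kind of setup might look roughly like the sketch below. It is not the actual project code: the function names `extract_links` and `fetch_links` and the choice to collect link targets are assumptions for the example.

```python
import requests
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(url):
    """Download one page with requests, then hand the HTML to BeautifulSoup."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_links(response.text)


# Example call (placeholder URL, needs network access):
# fetch_links("https://example.com/")
```

Keeping the download and the parsing in separate functions makes the parsing part easy to try on a saved HTML file without touching the network.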

There is another project where a bot crawls many websites, collects all the data, then sends it to a neural network for processing. In this case Scrapy is the best option, because you just define some rules and let it do its job automatically.