There are many ways a developer can scrape the web using Python, so why do we tend to rely on BeautifulSoup as our first choice, or even our only choice?
Is it because when we google "web scraping using python", we get a whole lot of links to BeautifulSoup tutorials? Or because we actually know the benefits of using BeautifulSoup?
The same functionality can be achieved with the standard urllib library, but it has its own limitations; one of them is having to write from scratch several methods that are readily available in BeautifulSoup.
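For instance, extracting all the links from a page is a one-liner with BeautifulSoup's `find_all('a')`, but with the standard library we have to hand-roll it. A minimal sketch using only `html.parser` (the class name and sample HTML here are illustrative, not from any particular library):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags -- roughly what
    BeautifulSoup's soup.find_all('a') gives us for free."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = '<p>See <a href="/docs">docs</a> and <a href="https://example.com">home</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/docs', 'https://example.com']
```

This is only one helper; repeat the exercise for tag searching, text extraction, and tree navigation, and the appeal of a ready-made library becomes obvious.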
On the other hand, writing methods from scratch lets us define custom behaviour!
Sometimes HTML is so disorganised that BeautifulSoup may not interpret the tags properly.
And if there are forms we need to scrape, we need something extra: MechanicalSoup!
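To see why, here is roughly what submitting a form by hand with urllib involves: encoding the fields and building the POST request ourselves. The URL and field names below are hypothetical; MechanicalSoup's `StatefulBrowser` automates this, along with cookies and hidden inputs. The sketch builds the request without sending it:

```python
import urllib.parse
import urllib.request

# Hypothetical form fields -- MechanicalSoup would instead select_form()
# on the fetched page and fill hidden inputs for us.
fields = {"username": "alice", "password": "s3cret"}
data = urllib.parse.urlencode(fields).encode("ascii")

# Build (but do not send) the POST request a form submission produces.
req = urllib.request.Request(
    "https://example.com/login",  # hypothetical endpoint
    data=data,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(req.get_method())  # 'POST'
print(req.data)          # b'username=alice&password=s3cret'
```

Every hidden token, cookie, and redirect would also be our problem in this approach, which is exactly the bookkeeping a form-aware library exists to remove.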
Yes, another 'Soup' library (I don't know why the scraping community loves soup so much, or is it Software Of Unknown Pedigree?).
With so many modules available for any given task, why don't we make a pros/cons list of them instead of simply following whatever a tutorial mentions?
If we know how to debug code, we should dive into the source of these open-source libraries and see for ourselves whether they solve our problem the way we want.
What are your views on the different scraping libraries available? Which one do you prefer or use regularly?