DEV Community

Željko Šević

Originally published at sevic.dev

Web scraping with cheerio

Web scraping means extracting data from websites. This post covers extracting data from a page's HTML elements with cheerio.

Prerequisites

  • the cheerio package is installed

  • the HTML page has been retrieved with an HTTP client

Usage

  • create a scraper object with the load method, passing the HTML content as an argument
    • set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form
  const { load } = require('cheerio');
  const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
  • find DOM elements by using CSS-like selectors
  const items = $('.item');
  • iterate through the found elements using the each method, which passes the index and the element to the callback
  items.each((index, element) => {
    // element is a raw DOM node; wrap it as $(element) to use cheerio methods on it
  });

  • access an element's content and related elements using the following methods

  • text $(element).text()
  • HTML $(element).html()
  • attributes
    • all $(element).attr()
    • specific one $(element).attr('href')
  • child elements
    • first $(element).children().first()
    • last $(element).children().last()
    • all $(element).children()
    • descendants matching a selector $(element).find('a')
  • siblings
    • previous $(element).prev()
    • next $(element).next()

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.
