Željko Šević

Originally published at sevic.dev

Web scraping with cheerio

Web scraping means extracting data from websites. This post covers extracting data from a page's HTML elements with cheerio.

Prerequisites

  • cheerio package is installed

  • HTML page is retrieved via an HTTP client

Usage

  • create a scraper object with load method by passing HTML content as an argument
    • set decodeEntities option to false to preserve encoded characters (like &) in their original form
  const { load } = require('cheerio');

  const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
  • find DOM elements by using CSS-like selectors
  const items = $('.item');
  • iterate through found elements using each method
  items.each((index, element) => {
    // ...
  });

Access element content using specific methods:

  • text $(element).text()
  • HTML $(element).html()
  • attributes
    • all $(element).attr()
    • specific one $(element).attr('href')
  • child elements
    • first $(element).children().first()
    • last $(element).children().last()
    • all $(element).children()
    • specific one $(element).find('a')
  • siblings
    • previous $(element).prev()
    • next $(element).next()

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.

Demo

The demo with the mentioned examples is available here.
