Web scraping means extracting data from websites. This post covers pulling data out of a page's HTML elements with cheerio.
Prerequisites
- `cheerio` package is installed
- HTML page is retrieved via an HTTP client
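For the second prerequisite, a minimal sketch of retrieving a page with Node's built-in `fetch` (Node 18+) could look like the following. The `fetchHtml` helper, its `fetchImpl` parameter, and the URL are illustrative assumptions and not part of cheerio; any HTTP client that hands back the HTML as a string should work just as well.

```javascript
// Hypothetical helper: downloads a page and returns its HTML as a string.
// `fetchImpl` defaults to the global fetch available in Node 18+; it is a
// parameter only so that another client (or a stub) can be swapped in.
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (placeholder URL):
// fetchHtml('https://example.com').then((html) => console.log(html.length));
```

The returned string is what gets passed to cheerio's `load` in the next section.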
Usage
- create a scraper object with the `load` method by passing HTML content as an argument
  - set the `decodeEntities` option to false to preserve encoded characters (like `&amp;`) in their original form
const { load } = require('cheerio');
const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
- find DOM elements by using CSS-like selectors
const items = $('.item');
- iterate through found elements using the `each` method
items.each((index, element) => {
// ...
});
- access element content using specific methods
  - text: `$(element).text()`
  - HTML: `$(element).html()`
  - attributes
    - all: `$(element).attr()`
    - specific one: `$(element).attr('href')`
  - child elements
    - first: `$(element).children().first()`
    - last: `$(element).children().last()`
    - all: `$(element).children()`
    - specific one (searches all descendants, not only direct children): `$(element).find('a')`
  - siblings
    - previous: `$(element).prev()`
    - next: `$(element).next()`
Disclaimer
Please check a website's terms of service before scraping it. Some websites explicitly prohibit such activity.