DEV Community

Sebastian Scheibe

Part 2: Parse website HTML with NodeJS and cheerio

Hello back again! If you missed how we created the selector for the HTML data, please check out part one of the series! Otherwise, let us proceed with parsing the data automatically!

We begin by creating a new project. Open up your terminal and enter the following to create a NodeJS project.

mkdir web-data-retrieve && cd web-data-retrieve && npm init

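After that, let us add got (an HTTP request client) and cheerio (a jQuery equivalent for NodeJS). The former will be used to request the content and the latter to parse it. We pin got to version 11, because got 12 and later are ESM-only packages and cannot be loaded with require(), which the code below uses.

npm install --save cheerio got@11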

Now, let us create an index file which will contain our code.

touch index.js

Open up the index.js file and add the following code inside:

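const cheerio = require('cheerio');
const got = require('got');

const url = 'https://www.marketwatch.com/investing/stock/aapl/financials';

(async () => {
    try {
        // Download the page; the HTML ends up in response.body
        const response = await got(url);
        // Parse the HTML into a jQuery-like query function
        const $ = cheerio.load(response.body);
        // Apply the selector from part one: value cells of the first partialSum row
        const selected = $('.financials tr.partialSum:nth-child(1) td.valueCell');
        // Read the text node inside each selected cell
        const cellData = selected.toArray().map(cell => cell.firstChild.data);
        console.log(cellData);
    } catch (error) {
        // HTTP errors carry a response; other errors (network, parsing) do not
        console.error(error.response ? error.response.body : error);
    }
})();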

Now let me explain what is going on here, top to bottom. The first two lines are the libraries we will be using:

const cheerio = require('cheerio');
const got = require('got');

The next line defines the URL we will get the content from; it is identical to the one we used in the browser.

const url = 'https://www.marketwatch.com/investing/stock/aapl/financials';

The next line is a bit more complex: it is a wrapper, an async function that is defined and immediately called, which lets us use await in our script. Explaining it fully deserves its own tutorial :). The more interesting part happens inside of it!
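For a quick look at the pattern anyway, here is a minimal sketch in which a hypothetical fakeRequest function stands in for got. In a CommonJS file, await is only allowed inside an async function, so the code is wrapped in an async arrow function that is called right away.

```javascript
// Hypothetical stand-in for got(url): resolves with a fake response after a short delay
function fakeRequest(url) {
    return new Promise(resolve => {
        setTimeout(() => resolve({ body: `<html>content of ${url}</html>` }), 10);
    });
}

// Immediately invoked async arrow function: it is defined and called at once,
// which gives us a place where await is allowed
const done = (async () => {
    const response = await fakeRequest('https://example.com');
    console.log(response.body);
    return response.body;
})();
```

Without the wrapper, a top-level await in a CommonJS index.js is a syntax error.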

Inside the try block, the first line calls got with the URL defined at the beginning of our file. Its response is saved in the response variable.

try {
    const response = await got(url); // <- This line
    const $ = cheerio.load(response.body);
    const selected = $('.financials tr.partialSum:nth-child(1) td.valueCell');
    const cellData = selected.toArray().map(cell => cell.firstChild.data);
    console.log(cellData);
} catch (error) {
    console.error(error.response ? error.response.body : error);
}
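If you would rather not depend on got at all, Node.js 18 and later ship a global fetch function. The sketch below swaps in fetch for got; fetchHtml is a hypothetical helper, and a throwaway local HTTP server stands in for MarketWatch so the example runs without internet access.

```javascript
const http = require('http');

// Hypothetical helper: download a page's HTML with Node's built-in fetch (Node.js 18+)
async function fetchHtml(url) {
    const response = await fetch(url);
    if (!response.ok) {
        // got rejects on non-2xx status codes by default; mimic that here
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.text();
}

// Throwaway local server standing in for the real website
const server = http.createServer((req, res) => {
    // 'Connection: close' lets the process exit promptly once the server is closed
    res.writeHead(200, { 'Content-Type': 'text/html', 'Connection': 'close' });
    res.end('<p>Hello from the local server</p>');
});

server.listen(0, async () => {
    const { port } = server.address(); // port 0 lets the OS pick a free port
    try {
        console.log(await fetchHtml(`http://127.0.0.1:${port}/`));
    } finally {
        server.close();
    }
});
```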

After that, the response body is loaded into cheerio. This gives us a query function, $, similar to what we used with jQuery in the browser.

try {
    const response = await got(url);
    const $ = cheerio.load(response.body); // <- This line
    const selected = $('.financials tr.partialSum:nth-child(1) td.valueCell');
    const cellData = selected.toArray().map(cell => cell.firstChild.data);
    console.log(cellData);
} catch (error) {
    console.error(error.response ? error.response.body : error);
}

What follows is the selector we created in the browser in part one. It returns the selected elements.

try {
    const response = await got(url);
    const $ = cheerio.load(response.body);
    const selected = $('.financials tr.partialSum:nth-child(1) td.valueCell'); // <- This line
    const cellData = selected.toArray().map(cell => cell.firstChild.data);
    console.log(cellData);
} catch (error) {
    console.error(error.response ? error.response.body : error);
}

The last notable line in the try block converts the selection into a standard JS array, which allows us to map over it. In the map, we take the first child of each cell (its text node) and read its data, which is the cell's text.

try {
    const response = await got(url);
    const $ = cheerio.load(response.body);
    const selected = $('.financials tr.partialSum:nth-child(1) td.valueCell');
    const cellData = selected.toArray().map(cell => cell.firstChild.data); // <- This line
    console.log(cellData);
} catch (error) {
    console.error(error.response ? error.response.body : error);
}

Running the application prints the sales figures for the Apple stock:

$ node index.js
[ '231.28B', '214.23B', '228.57B', '265.81B', '259.97B' ]
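The values come back as strings such as '231.28B'. To calculate with them, a small hypothetical helper can turn the suffix into a multiplier. It assumes the page only uses the suffixes K, M, B and T (thousand, million, billion, trillion) on plain decimal numbers, which is worth checking against the actual data; anything else becomes NaN.

```javascript
// Hypothetical helper: converts strings like '231.28B' into numbers.
// Assumes only the suffixes K, M, B, T (or none) are used.
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9, T: 1e12 };

function parseAmount(text) {
    const match = /^(-?\d+(?:\.\d+)?)([KMBT]?)$/.exec(text.trim());
    if (!match) {
        // e.g. '-' placeholders or values in parentheses are not handled
        return NaN;
    }
    const [, number, suffix] = match;
    return Number(number) * (suffix ? MULTIPLIERS[suffix] : 1);
}

// Sample input: the output shown above
const sales = ['231.28B', '214.23B', '228.57B', '265.81B', '259.97B'].map(parseAmount);
console.log(sales);
```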

Where to go from here? Well, you can modify the application to retrieve more data about the Apple stock or even to fetch different stocks. The possibilities are endless; pick whatever suits your purpose.
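To fetch different stocks, one option is to build the URL from a ticker symbol and scrape the tickers one after another. This sketch assumes other tickers follow the same URL pattern as aapl (worth checking in the browser first). financialsUrl and scrapeAll are hypothetical names, and scrapeOne stands for the code from index.js turned into an async function that takes a URL. Requesting sequentially rather than in parallel keeps the load on the site low.

```javascript
// Assumption: every ticker uses the same URL pattern as AAPL
function financialsUrl(ticker) {
    return `https://www.marketwatch.com/investing/stock/${encodeURIComponent(ticker.toLowerCase())}/financials`;
}

// Scrapes the tickers one at a time; scrapeOne(url) is expected to return a Promise
async function scrapeAll(tickers, scrapeOne) {
    const results = {};
    for (const ticker of tickers) {
        results[ticker] = await scrapeOne(financialsUrl(ticker));
    }
    return results;
}

// Usage with a stand-in scraper that just echoes the URL
scrapeAll(['AAPL', 'MSFT'], async url => url).then(results => console.log(results));
```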

References:

https://github.com/sindresorhus/got

https://cheerio.js.org

https://www.w3schools.com/jquery/jquery_ref_selectors.asp

https://api.jquery.com/category/selectors/
