
Michael Burrows

Posted on • Updated on • Originally published at w3collective.com

Scrape server-side rendered HTML content with JavaScript

Note: An updated version of this tutorial can be found here.


“Scraping” can be used to collect and analyse data from sources that don’t have APIs.

In this tutorial we’ll use JavaScript to scrape content from a website that’s rendered server-side.

You’ll need to have Node.js and npm installed if you haven’t already.

Let’s start by creating a project folder and initialising it with a package.json file:

mkdir scraper
cd scraper
npm init -y

We’ll be using two packages to build our scraper script.

  • axios – Promise based HTTP client for the browser and node.js.
  • cheerio – Implementation of jQuery designed for the server (makes it easy to work with the DOM).

Install the packages by running the following command:

npm install axios cheerio --save

Next, create a file called scrape.js and include the packages we just installed:

const axios = require("axios");
const cheerio = require("cheerio");

In this example I’ll be using https://lobste.rs/ as the data source to be scraped.

Inspecting the page source shows that the site name in the header has a cur_url class, so let’s see if we can scrape its text.


Add the following to scrape.js to fetch the HTML and log the title text if successful:

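// Fetch the homepage HTML, then read the site name from the header
axios("https://lobste.rs/")
  .then((response) => {
    const html = response.data; // raw HTML string returned by the server
    const $ = cheerio.load(html); // parse it into a jQuery-like API
    const title = $(".cur_url").text(); // text of the element with the cur_url class
    console.log(title);
  })
  .catch(console.error);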

Run the script with the following command and you should see Lobsters logged in the terminal:

node scrape.js
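
If an empty line is logged instead of Lobsters, the selector most likely matched nothing: cheerio's .text() returns an empty string for an empty selection rather than throwing an error. Here's a minimal sketch of a guard for that case. The requireText helper is hypothetical and isn't part of axios or cheerio.

```javascript
// Hypothetical helper (not part of axios or cheerio): fail loudly when a
// selector matched nothing, instead of silently logging an empty string.
function requireText(text, selector) {
  const trimmed = text.trim();
  if (trimmed === "") {
    throw new Error(`No text found for selector "${selector}"`);
  }
  return trimmed;
}

// Inside the .then() callback in scrape.js this could be used as:
//   const title = requireText($(".cur_url").text(), ".cur_url");
// An error thrown there is passed on to the existing .catch(console.error).
```

This is handy when a site changes its markup, since the script then reports which selector stopped matching.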

If everything’s working we can proceed to scrape some actual content from the website.

Let’s get the titles, domains and points for each of the stories on the homepage by updating scrape.js:

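axios("https://lobste.rs/")
  .then((response) => {
    const html = response.data;
    const $ = cheerio.load(html);
    const storyItem = $(".story"); // one element per story on the homepage
    const stories = [];
    // A regular function (not an arrow function) is used here so that
    // `this` refers to the current story element
    storyItem.each(function () {
      // .text() can include surrounding whitespace, so trim each value
      const title = $(this).find(".u-url").text().trim();
      const domain = $(this).find(".domain").text().trim();
      const points = $(this).find(".score").text().trim();
      stories.push({
        title,
        domain,
        points,
      });
    });
    console.log(stories);
  })
  .catch(console.error);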

This code loops through each of the stories, grabs the data, and then stores it in an array called stories.
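
Note that every value returned by .text() is a string, so compared as text a score of "9" counts as greater than "12". If you want to sort or analyse the points, they need converting to numbers first. Here's a minimal sketch, assuming objects with the title, domain and points fields used above. The normaliseStory helper is hypothetical and the sample data is made up for illustration, not real scraped output.

```javascript
// Hypothetical clean-up step for the objects pushed into `stories`.
function normaliseStory(story) {
  // parseInt skips leading whitespace; the radix 10 forces decimal parsing.
  const points = Number.parseInt(story.points, 10);
  return {
    title: story.title.trim(),
    domain: story.domain.trim(),
    // Fall back to null when the score is missing or not a number.
    points: Number.isNaN(points) ? null : points,
  };
}

// Made-up sample data for illustration only.
const sample = [
  { title: "First story", domain: "example.com", points: "9" },
  { title: "Second story", domain: " example.org ", points: " 12 " },
];

// Highest score first; stories without a score sort last.
const ranked = sample
  .map(normaliseStory)
  .sort((a, b) => (b.points ?? -1) - (a.points ?? -1));

console.log(ranked.map((story) => story.title));
```

In scrape.js the same map and sort calls could be applied to the stories array before logging it.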

If you’ve worked with jQuery then the selectors will be familiar; if not, you can learn about them here.

Now re-run node scrape.js and you should see the data for each of the stories.

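
To analyse the results later rather than only logging them, the array can be written to disk. Here's a minimal sketch using Node's built-in fs module. The saveStories helper and the stories.json file name are assumptions and aren't part of the script above.

```javascript
const fs = require("fs");

// Hypothetical helper: write the scraped array to a JSON file.
function saveStories(stories, filePath) {
  // Two-space indentation keeps the file readable in an editor.
  fs.writeFileSync(filePath, JSON.stringify(stories, null, 2) + "\n", "utf8");
  return filePath;
}

// In scrape.js this could sit next to console.log(stories):
//   saveStories(stories, "stories.json");
```

The file is overwritten on each run, so it always holds the latest scrape.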
