DEV Community

Jason F

Web Scraping With Puppeteer for Total Noobs: Part 2

Hello and welcome to the second post in this series on web scraping with Puppeteer. If you missed the first post, you can check it out here. In this post we'll pick up where we left off and scrape some weather data from weather.com. The current goal is to scrape the 10-day forecast for Austin, Texas. Feel free to swap out Austin for your favorite city.

Picking up where we left off

Our scrape function that we created in the previous post looks like this:

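import puppeteer from "puppeteer";

async function scrape() {
  // dumpio pipes the browser process's stdout and stderr into this process
  const browser = await puppeteer.launch({ dumpio: true });
  const page = await browser.newPage();

  await page.goto("https://weather.com/weather/tenday/l/Austin+TX");

  // Runs in the page context: build one object per forecast row
  const weatherData = await page.evaluate(() =>
    Array.from(
      document.querySelectorAll(".DaypartDetails--DayPartDetail--2XOOV"),
      (e) => ({
        date: e.querySelector("h3").innerText,
      })
    )
  );

  await browser.close();
  return weatherData;
}

// Top-level await requires an ES module (e.g. "type": "module" in package.json)
const scrapedData = await scrape();
console.log(scrapedData);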

Let's now add to the weatherData. In addition to the innerText of the h3, we'll get the high temperature, the low temperature, and the precipitation percentage for each day.

Let's have a look at how we can do that:

  const weatherData = await page.evaluate(() =>
    Array.from(
      document.querySelectorAll(".DaypartDetails--DayPartDetail--2XOOV"),
      (e) => ({
        date: e.querySelector("h3").innerText,
        highTemp: e.querySelector(".DetailsSummary--highTempValue--3PjlX")
          .innerText,
        lowTemp: e.querySelector(".DetailsSummary--lowTempValue--2tesQ")
          .innerText,
        precipitationPercentage: e.querySelector(
          ".DetailsSummary--precip--1a98O"
        ).innerText,
      })
    )
  );

As you can see, I'm adding three new properties to the object returned by the Array.from mapping function: highTemp, lowTemp, and precipitationPercentage. I found the class names by inspecting the page in the browser's dev tools. They work for now, but the suffixes (like --2XOOV) look auto-generated, so they may need updating whenever weather.com changes its markup.
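
If one of those class names does change, e.querySelector(...) returns null and the .innerText access throws, which takes the whole scrape down with it. Here's a minimal sketch of one way to guard against that using optional chaining. The name extractDay is my own for illustration and isn't in the scraper from this series. It would also have to be defined inside the page.evaluate callback, since that code runs in the browser and can't see functions from the Node side.

```javascript
// Hypothetical helper (not part of the original scraper): reads one
// forecast row and falls back to null when a selector matches nothing.
function extractDay(e) {
  // querySelector returns null on no match; ?. stops the chain there
  // and ?? swaps the resulting undefined for null
  const text = (selector) => e.querySelector(selector)?.innerText ?? null;

  return {
    date: text("h3"),
    highTemp: text(".DetailsSummary--highTempValue--3PjlX"),
    lowTemp: text(".DetailsSummary--lowTempValue--2tesQ"),
    precipitationPercentage: text(".DetailsSummary--precip--1a98O"),
  };
}
```

With this in place, a row with a missing element comes back with null for that property instead of crashing, which makes it easier to spot which selector went stale.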

Let's now run node scraper.js in the terminal and check out the results:

[
  {
    date: 'Tonight',
    highTemp: '--',
    lowTemp: '31°',
    precipitationPercentage: '84%'
  },
  {
    date: 'Thu 02',
    highTemp: '41°',
    lowTemp: '32°',
    precipitationPercentage: '53%'
  },
  {
    date: 'Fri 03',
    highTemp: '55°',
    lowTemp: '30°',
    precipitationPercentage: '6%'
  },
  {
    date: 'Sat 04',
    highTemp: '57°',
    lowTemp: '40°',
    precipitationPercentage: '7%'
  },
  {
    date: 'Sun 05',
    highTemp: '64°',
    lowTemp: '47°',
    precipitationPercentage: '9%'
  },
  {
    date: 'Mon 06',
    highTemp: '71°',
    lowTemp: '58°',
    precipitationPercentage: '14%'
  },
  {
    date: 'Tue 07',
    highTemp: '68°',
    lowTemp: '50°',
    precipitationPercentage: '54%'
  },
  {
    date: 'Wed 08',
    highTemp: '60°',
    lowTemp: '47°',
    precipitationPercentage: '40%'
  },
  {
    date: 'Thu 09',
    highTemp: '60°',
    lowTemp: '42°',
    precipitationPercentage: '52%'
  },
  {
    date: 'Fri 10',
    highTemp: '62°',
    lowTemp: '38°',
    precipitationPercentage: '17%'
  },
  {
    date: 'Sat 11',
    highTemp: '59°',
    lowTemp: '42°',
    precipitationPercentage: '11%'
  },
  {
    date: 'Sun 12',
    highTemp: '64°',
    lowTemp: '48°',
    precipitationPercentage: '15%'
  },
  {
    date: 'Mon 13',
    highTemp: '67°',
    lowTemp: '51°',
    precipitationPercentage: '24%'
  },
  {
    date: 'Tue 14',
    highTemp: '71°',
    lowTemp: '51°',
    precipitationPercentage: '24%'
  },
  {
    date: 'Wed 15',
    highTemp: '70°',
    lowTemp: '50°',
    precipitationPercentage: '21%'
  }
]

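Very cool. We're getting the values I'd expect: a date, a high, a low, and a precipitation percentage for each day in the forecast. Note that the high for 'Tonight' comes back as '--', so anything that consumes this data will need to handle that placeholder.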
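
Every value in that output is a string, including the ° and % symbols. If you'd rather work with numbers, here's a rough sketch of a post-processing step. The function names toNumber and normalizeDay are hypothetical and not part of the scraper in this series. I'm also assuming that a placeholder like '--' should become null.

```javascript
// Hypothetical post-processing step (not from the original scraper):
// turns strings like '41°' and '84%' into numbers, and '--' into null.
function toNumber(value) {
  // parseInt reads the leading digits and ignores the trailing ° or %
  const n = Number.parseInt(value, 10);
  return Number.isNaN(n) ? null : n;
}

function normalizeDay(day) {
  return {
    date: day.date,
    highTemp: toNumber(day.highTemp),
    lowTemp: toNumber(day.lowTemp),
    precipitationPercentage: toNumber(day.precipitationPercentage),
  };
}

// Sample copied from the first entry of the scraped output
const sample = {
  date: "Tonight",
  highTemp: "--",
  lowTemp: "31°",
  precipitationPercentage: "84%",
};
console.log(normalizeDay(sample));
```

You could apply it to the whole result with scrapedData.map(normalizeDay) after the scrape function returns.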

GitHub Repo

I've set up a GitHub repository for this project. You can find the link here. Feel free to fork/clone this repository and play around. If you're not too comfortable with using git, there's a plethora of resources out there. If you'd be interested in a tutorial for noobs, please let me know in the comment section.

Wrapping up

In this post we were able to scrape a bit more weather forecast data and return it in our scrape function. In the next post I'll show you how to create a GitHub Action that will run the scrape function once a day and save the scraped weather data in a .json file in the same GitHub repository.
