DEV Community

Everton Tenorio

A JavaScript scraper for the Wikipedia Academy Award List.

Scraping the Academy Award winners listed on Wikipedia with cheerio and saving them to a CSV file.

Today's post is a simple demonstration of how to scrape data with JavaScript and the cheerio library. We'll use the list of Academy Award winners directly from Wikipedia.

First, install the necessary packages:

npm install cheerio axios

The URL used is:

const url = 'https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films';

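Next, we'll import the libraries, fetch the page with axios, parse the HTML with cheerio's load function, and prepare two arrays to hold the column headers and the table rows:

// ES module imports; top-level await needs "type": "module" or a .mjs file.
import axios from 'axios';
import * as cheerio from 'cheerio';
import fs from 'node:fs';

const { data: html } = await axios.get(url);
const $ = cheerio.load(html);

const theadData = [];
const tableData = [];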


Now we'll select elements and traverse the DOM. Each call to the $ function returns a Cheerio object wrapping the matched elements:

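// Collect the text of every <th> cell, one array per <tbody>.
$('tbody').each((i, column) => {
  const columnData = [];
  $(column)
    .find('th')
    .each((j, cell) => {
      // trim() removes the trailing newline and any surrounding whitespace.
      columnData.push($(cell).text().trim());
    });
  theadData.push(columnData);
});

// Use the headers from the first <tbody> as the CSV header row.
tableData.push(theadData[0]);

// Collect the <td> cells of every table row on the page.
$('table tr').each((i, row) => {
  const rowData = [];
  $(row)
    .find('td')
    .each((j, cell) => {
      rowData.push($(cell).text().trim());
    });

  // Skip rows without <td> cells, such as header rows.
  if (rowData.length) tableData.push(rowData);
});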

Glad you still know jQuery...

Finally, save the data as is, without any further processing 😅, to a .csv file with fs.writeFileSync.

Note that I used ";" as the delimiter.

const csvContent = tableData
    .map((row) => row.join(';')) 
    .join('\n');

fs.writeFileSync('academy_awards.csv', csvContent, 'utf-8');
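A plain join works as long as no cell contains the delimiter, a double quote, or a line break. If that might happen, one option is to quote such fields in the usual CSV way: wrap the field in double quotes and double any quotes inside it. The sketch below is an optional extra, not part of the original script; escapeField, toCsv, and the sample rows are hypothetical names and data.

```javascript
// Hypothetical helper: quote a field only when it needs it.
function escapeField(value, delimiter = ';') {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
  // Double any embedded quotes, then wrap the whole field in quotes.
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join cells with the delimiter and rows with newlines.
function toCsv(rows, delimiter = ';') {
  return rows
    .map((row) => row.map((cell) => escapeField(cell, delimiter)).join(delimiter))
    .join('\n');
}

// Made-up rows for illustration.
const sampleRows = [
  ['Film', 'Year'],
  ['Title; with delimiter', '2000'],
  ['Title with "quotes"', '2001'],
];

console.log(toCsv(sampleRows));
```

The result of toCsv(tableData) could be passed to fs.writeFileSync in place of csvContent.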

Run the script:

node scraper.js


I’ve written other tutorials here on dev.to about scraping with Go and Python. If this article helped you or you enjoyed it, consider contributing: donate
