DEV Community

Mikhail Zub

How to scrape YouTube search results with Node.js?

Intro

This time I would like to show you how to scrape YouTube search results with Node.js. I will show you how to do it with Puppeteer, and then an easier and more reliable way.

Preparation

First, we need to create a Node.js project and add the npm packages "puppeteer", "puppeteer-extra", and "puppeteer-extra-plugin-stealth" (the Puppeteer stealth plugin). To do this, open a command line in the project directory and enter:
npm init -y
then:
npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

What will be scraped

[Image: YouTube video search results]

Process

To find the required CSS selectors, I recommend a tool such as the SelectorGadget Chrome extension.
[Animation: grabbing CSS selectors]

Code

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

// The stealth plugin hides common headless-browser fingerprints
puppeteer.use(StealthPlugin());

const searchString = "psy";
// encodeURIComponent also escapes characters such as &, = and #, which encodeURI leaves as-is
const encodedString = encodeURIComponent(searchString);

exports.getSearchResults = async function getSearchResults() {
  const browser = await puppeteer.launch({
    headless: false,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  try {
    const page = await browser.newPage();

    page.setDefaultNavigationTimeout(60000);
    await page.goto(`https://www.youtube.com/results?search_query=${encodedString}&gl=US&hl=EN`);
    await page.waitForSelector("#contents > ytd-video-renderer");
    // Give lazily loaded content extra time (page.waitForTimeout() is deprecated and removed in recent Puppeteer versions)
    await new Promise((resolve) => setTimeout(resolve, 5000));

    // Runs in the browser context: collect data from every video result on the page
    const organicResult = await page.evaluate(function () {
      return Array.from(document.querySelectorAll("#contents > ytd-video-renderer")).map((el) => {
        const metadata = el.querySelectorAll("#metadata-line > span");
        const channelLink = el.querySelector("#channel-info #channel-name a");
        return {
          link: "https://www.youtube.com" + el.querySelector("a#thumbnail").getAttribute("href"),
          title: el.querySelector("a#video-title").textContent.trim(),
          // Some results have no description snippet or metadata line, so fall back to null
          description: el.querySelector(".metadata-snippet-container > yt-formatted-string")?.textContent.trim() ?? null,
          views: metadata[0]?.textContent.trim() ?? null,
          published_date: metadata[1]?.textContent.trim() ?? null,
          channel: {
            name: channelLink?.textContent.trim() ?? null,
            link: channelLink ? "https://www.youtube.com" + channelLink.getAttribute("href") : null,
          },
        };
      });
    });

    console.log("Puppeteer results:");
    console.log(organicResult);
    return organicResult;
  } finally {
    // Always close the browser, even if a selector times out
    await browser.close();
  }
};
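The Puppeteer code builds the search URL with a template string. If you prefer not to handle encoding by hand, here is a minimal sketch that uses Node's built-in URL and URLSearchParams classes instead. The function name buildSearchUrl is hypothetical, and its gl/hl defaults simply mirror the values used in the code.

```javascript
// Hypothetical helper: builds a YouTube search URL with properly encoded query parameters
function buildSearchUrl(query, gl = "US", hl = "EN") {
  const url = new URL("https://www.youtube.com/results");
  // URLSearchParams percent-encodes special characters (and turns spaces into "+")
  url.searchParams.set("search_query", query);
  url.searchParams.set("gl", gl);
  url.searchParams.set("hl", hl);
  return url.toString();
}

console.log(buildSearchUrl("psy"));
// https://www.youtube.com/results?search_query=psy&gl=US&hl=EN
console.log(buildSearchUrl("rock & roll"));
// https://www.youtube.com/results?search_query=rock+%26+roll&gl=US&hl=EN
```

The returned string could then be passed straight to page.goto().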

Output

[
  {
    link: 'https://www.youtube.com/watch?v=9bZkp7q19f0',
    title: 'PSY - GANGNAM STYLE(강남스타일) M/V',
    description: '#PSY #싸이 #GANGNAMSTYLE #강남스타일 More about PSY@ http://www.youtube.com/officialpsy ...',
    views: '4.1B views',
    published_date: '9 years ago',
    channel: {
      name: 'officialpsy',
      link: 'https://www.youtube.com/channel/UCrDkAvwZum-UTjHmzDI2iIw'
    }
  },
...
]
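Puppeteer returns the view count as display text such as '4.1B views', while the SerpApi output further down gives a plain number. If you need a number from the scraped text, a minimal sketch could look like this. parseViews is a hypothetical helper that assumes English labels (hl=EN), and the result is only approximate because YouTube abbreviates large counts.

```javascript
// Hypothetical helper: converts a scraped view-count label
// (for example "4.1B views" or "1,234 views") into an approximate number.
// Assumes English labels (hl=EN); returns null if the text cannot be parsed.
function parseViews(label) {
  const match = /^([\d.,]+)\s*([KMB])?/i.exec(label.trim());
  if (!match) return null;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const value = parseFloat(match[1].replace(/,/g, ""));
  const multiplier = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  // Round to avoid floating-point noise such as 4099999999.9999995
  return Math.round(value * multiplier);
}

console.log(parseViews("4.1B views")); // 4100000000
console.log(parseViews("1,234 views")); // 1234
console.log(parseViews("No views")); // null
```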

Using the YouTube Search Engine Results API

SerpApi has a free plan with 100 searches per month. If you need more searches, there are paid plans.

The difference is that all you need to do is iterate over ready-made, structured JSON instead of coding everything from scratch and picking the correct selectors, which can be time-consuming.

First, we need to install the "google-search-results-nodejs" package. To do this, enter:
npm i google-search-results-nodejs

Code

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); // Your API key

const params = {
  engine: "youtube", // use the YouTube search engine
  search_query: "psy", // search query
  gl: "US", // country
  hl: "EN", // interface language
};

// Called with the parsed JSON response; the video results are in data.video_results
const getSearchInfo = function (data) {
  console.log("SerpApi results:");
  console.log(data.video_results);
};

exports.searchOrganic = () => search.json(params, getSearchInfo);
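search.json() takes a callback. If you prefer async/await, a minimal sketch of a Promise wrapper follows. searchJson and its 30-second default timeout are my own hypothetical choices, not part of google-search-results-nodejs. The sketch only assumes, as the code does, that the callback receives the parsed response; it does not assume the client reports errors through the callback, which is why it adds a timeout.

```javascript
// Hypothetical helper: wraps a callback-style search.json(params, callback) call in a Promise.
// Rejects if the callback is not called within timeoutMs or if search.json throws.
function searchJson(search, params, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("Search timed out")), timeoutMs);
    try {
      search.json(params, (data) => {
        clearTimeout(timer);
        resolve(data);
      });
    } catch (err) {
      clearTimeout(timer);
      reject(err);
    }
  });
}

// Usage with the SerpApi client and params from the code above:
// const data = await searchJson(search, params);
// console.log(data.video_results);
```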

Output

[
  {
    position_on_page: 1,
    title: 'PSY - GANGNAM STYLE(강남스타일) M/V',
    link: 'https://www.youtube.com/watch?v=9bZkp7q19f0',
    channel: {
      name: 'officialpsy',
      link: 'https://www.youtube.com/channel/UCrDkAvwZum-UTjHmzDI2iIw',
      verified: true,
      thumbnail: 'https://yt3.ggpht.com/ytc/AKedOLTaLi3ZZezCzDSA0s1cuKPQYQvoL4QiTkOhQuS1-Q=s88-c-k-c0x00ffffff-no-rj'
    },
    published_date: '9 years ago',
    views: 4199162273,
    length: '4:13',
    description: '#PSY #싸이 #GANGNAMSTYLE #강남스타일 More about PSY@ http://www.youtube.com/officialpsy ...',
    thumbnail: {
      static: 'https://i.ytimg.com/vi/9bZkp7q19f0/hq720.jpg?sqp=-oaymwEcCOgCEMoBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLC2ZQhOjkk_NTkfURgVv3PC9LljiA',
      rich: 'https://i.ytimg.com/an_webp/9bZkp7q19f0/mqdefault_6s.webp?du=3000&sqp=CP6b0ooG&rs=AOn4CLCNXW6aH0Qd5C9F1s8RVTXLahoGqQ'    
    }
  },
...
]
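Because SerpApi already returns structured JSON, "iterating over" it is ordinary array processing. As a minimal sketch, here is a hypothetical summarizeVideos helper that keeps a few fields from video_results, using the field names shown in the output, with an optional filter on the channel's verified flag. The sample array is copied (and trimmed) from the output.

```javascript
// Hypothetical helper: keeps a few fields from SerpApi's video_results array.
// Field names follow the output shown above; missing fields become null.
function summarizeVideos(videoResults = [], { onlyVerified = false } = {}) {
  return videoResults
    .filter((video) => !onlyVerified || video.channel?.verified === true)
    .map((video) => ({
      title: video.title,
      link: video.link,
      channel: video.channel?.name ?? null,
      views: video.views ?? null,
      length: video.length ?? null,
    }));
}

const sample = [
  {
    position_on_page: 1,
    title: "PSY - GANGNAM STYLE(강남스타일) M/V",
    link: "https://www.youtube.com/watch?v=9bZkp7q19f0",
    channel: { name: "officialpsy", verified: true },
    views: 4199162273,
    length: "4:13",
  },
];

console.log(summarizeVideos(sample, { onlyVerified: true }));
```

In the SerpApi example, you would call it as summarizeVideos(data.video_results) inside getSearchInfo.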

Let's summarize

As you can see, using our API is much easier and faster. The output contains more information than the Puppeteer version (for example, the exact view count, video length, and thumbnails), and, most importantly, you don't need to maintain the selectors yourself when YouTube changes its page layout.

Links

Code in the online IDE
SerpApi Playground
SerpApi Documentation

Outro

If you would like to see how to scrape something with Node.js that I haven't written about yet, or a project built with SerpApi, please send me a message.
