DEV Community

Stanislav Smirnov

Puppeteer: Basic

Puppeteer is a Node library that provides a high-level API to control Chromium, Chrome, or Firefox.

Use cases

  1. Automated account registration
  2. Scraping information from sites of varying difficulty
  3. Generating screenshots and PDFs of pages
  4. Automated testing of websites

Puppeteer is very powerful: it can do nearly everything a person can do in a browser. In this post, though, we will only look at web scraping.

Installation

By default, Puppeteer comes with Chromium, but you can use another browser.

Create a folder for your project and move into it:

mkdir puppeteer
cd puppeteer

Initialize a Node project (when prompted, choose a package name other than puppeteer, so your project doesn't share its name with the dependency):

yarn init

Then install Puppeteer:

yarn add puppeteer

Puppeteer is now installed, and we are ready to start coding.

Example

Create the main source file example.js with this content:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    //by default puppeteer runs headless (no visible window);
    //this option turns headless mode off so you
    //can watch the browser work
    defaultViewport: null
    //by default puppeteer emulates an 800x600 viewport;
    //null turns that off so pages use the window's own size
  });
  //create a puppeteer browser instance
  //you can launch more browsers with
  //const browser2 = await puppeteer.launch();
  const page = await browser.newPage();
  //create a page (tab)
  //more pages with
  //const page2 = await browser.newPage();
  await page.goto('https://dev.to');
  //open dev.to automatically
})();

Run it with node example. A Chromium window opens with dev.to loaded, and it stays open because the script never calls browser.close().
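Leaving the browser open is handy while you watch it work. For scripts that should finish on their own, one option is a small wrapper that always closes the browser, even when a step throws. This is a minimal sketch: withPage is a hypothetical helper name (not part of Puppeteer), and the launcher is passed in as a function so the helper can be tried without starting a real browser.

```javascript
// withPage is a hypothetical helper, not part of Puppeteer's API.
// `launch` is any function that resolves to a Browser,
// e.g. () => puppeteer.launch({ headless: false }).
async function withPage(launch, work) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // `return await` (not just `return`) so the finally block
    // waits for `work` to finish before closing the browser.
    return await work(page);
  } finally {
    // Runs whether `work` succeeded or threw, so Chromium is never left running.
    await browser.close();
  }
}

// Usage with the real library (requires `yarn add puppeteer`):
// const puppeteer = require('puppeteer');
// withPage(() => puppeteer.launch(), async page => {
//   await page.goto('https://dev.to');
//   return page.title();
// }).then(console.log, console.error);
```

Passing the launcher in, rather than calling puppeteer.launch() inside, is only a design choice for this sketch; it keeps the close-on-error logic separate from the launch options.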

But what are async and await? Most Puppeteer methods return a Promise, so you can also chain them with .then():

const puppeteer = require('puppeteer');

puppeteer
.launch({
  headless: false,
  defaultViewport: null
})
.then(browser => browser.newPage())
.then(page => page.goto('https://dev.to'));

But the first style is more comfortable to read and write, so I prefer it.
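To see that the two styles are interchangeable, here is a sketch that runs without Puppeteer. The step function is a made-up stand-in for calls like launch(), newPage() and goto(): it just returns a Promise that resolves after a short delay.

```javascript
// Hypothetical stand-in for a Puppeteer call: resolves with `value` after `ms` milliseconds.
function step(value, ms = 10) {
  return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

// Style 1: a .then() chain. Each callback only sees the previous result.
function withThen() {
  return step('browser')
    .then(browser => step(`${browser} > page`))
    .then(page => step(`${page} > https://dev.to`));
}

// Style 2: async/await. Reads top to bottom, and every variable stays in scope.
async function withAwait() {
  const browser = await step('browser');
  const page = await step(`${browser} > page`);
  return step(`${page} > https://dev.to`);
}

withThen().then(result => console.log('then: ', result));
withAwait().then(result => console.log('await:', result));
```

The scoping difference is a practical reason to prefer await: in the chain, browser is out of reach after the first .then(), so calling browser.close() at the end means threading it through every callback, while in the await version it is simply still in scope.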

Find selectors

To find the selector you need, right-click the element and choose "Inspect". This requires basic knowledge of HTML and CSS, but you can also use Firefox with the SelectorsHub extension, which generates selectors for you.
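Class names on big sites like Google (such as .gLFyf) are often generated and can change without notice. One defensive option is to try a few candidate selectors in order and use the first one that matches. A minimal sketch: firstMatchingSelector is a hypothetical helper, and it relies only on page.$(), which resolves to an element handle or null.

```javascript
// firstMatchingSelector is a hypothetical helper, not part of Puppeteer.
// It returns the first selector in `candidates` that matches an element
// on `page`, or null if none of them do.
async function firstMatchingSelector(page, candidates) {
  for (const selector of candidates) {
    // page.$ resolves to an ElementHandle, or null when nothing matches.
    const handle = await page.$(selector);
    if (handle) {
      await handle.dispose(); // we only needed to know the element exists
      return selector;
    }
  }
  return null;
}

// Usage (the candidates are guesses at Google's markup and may be outdated):
// const selector = await firstMatchingSelector(page, [
//   'textarea[name="q"]', 'input[name="q"]', '.gLFyf.gsfi',
// ]);
```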

Type and click

OK, let's grab our IP address from Google:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  //headless by default: no window, just console output
  const page = await browser.newPage();
  await page.goto('https://google.com');
  //open google.com
  await page.waitForSelector('.gLFyf.gsfi');
  //wait until an element matching `.gLFyf.gsfi`
  //(the search box) is on the page
  await page.type('.gLFyf.gsfi', 'what is my ip');
  //type text into the element matching `.gLFyf.gsfi`
  await page.keyboard.press('Enter');
  //press Enter to submit the search
  await page.waitForSelector('span[style="font-size:20px"]');
  //wait until an element matching
  //`span[style="font-size:20px"]` is on the page
  const ip = await page.$eval('span[style="font-size:20px"]', el => el.innerText);
  //run `el => el.innerText` in the page on the first element
  //matching `span[style="font-size:20px"]`
  //and store the returned text in `ip`
  console.log(ip);
  await browser.close();
  //close the browser
})();

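Save this as ip-google.js and run it with node ip-google. A few seconds later, your IP address appears in the console. Google's markup changes over time, so if waitForSelector times out (after 30 seconds by default), inspect the page again and update the selectors.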
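The heading promises clicking too. The script above submits with Enter, but clicking a button that triggers navigation is a common alternative. Puppeteer's documentation recommends starting waitForNavigation() together with the click via Promise.all, so a fast navigation is not missed. A sketch: clickAndWait is a hypothetical helper name, and the selector in the usage line is an assumption.

```javascript
// clickAndWait is a hypothetical helper, not part of Puppeteer.
// Clicks `selector` and resolves once the navigation it triggers has finished.
async function clickAndWait(page, selector) {
  // Start listening for the navigation before clicking; awaiting
  // waitForNavigation() only after the click could miss a fast page load.
  const [response] = await Promise.all([
    page.waitForNavigation(),
    page.click(selector),
  ]);
  return response; // the main resource's HTTP response (may be null)
}

// Usage inside the async IIFE (hypothetical selector):
// await clickAndWait(page, 'button[type="submit"]');
```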

Bonus. Understanding (async () => {})()

My first reaction when I saw (async () => {})() was "wtf is this"

function someFunction() {}
//a simple named function

Could it be shorter?

function () {}
//anonymous function (only valid as an expression,
//e.g. when passed as an argument)

But how do we use await inside a function? Mark the function async:

async function () {}
//async function: await is allowed inside it

Could it be shorter?

async () => {}
//async arrow function

Can we run it right where it is defined?

(async () => {})()
//the outer parentheses turn it into an expression,
//and the trailing () calls it immediately

This is an immediately invoked async function: it is asynchronous, allows await inside, and runs as soon as it is defined. That's all.
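A small experiment, runnable with plain Node (no Puppeteer), shows what "runs immediately" means: the code before the first await runs right away, the rest runs later, and the whole expression evaluates to a Promise. The log array is just for illustration.

```javascript
const log = [];

// The IIFE starts running immediately and returns a Promise.
const done = (async () => {
  log.push('start');                                // runs synchronously, right away
  await new Promise(resolve => setTimeout(resolve, 10));
  log.push('after await');                          // runs about 10 ms later
  return 'result';
})();

log.push('after IIFE');                             // runs before 'after await'

// Because the IIFE returns a Promise, its result and errors can be handled here.
done
  .then(value => {
    log.push(value);
    console.log(log.join(' -> '));                  // start -> after IIFE -> after await -> result
  })
  .catch(console.error);
```

Since the wrapper returns a Promise, appending .catch(console.error) after the final () in the Puppeteer scripts is a cheap way to report errors thrown by any await inside them. In an ES module (for example a .mjs file), recent Node versions also allow await at the top level, which removes the need for the wrapper.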

Bonus. Repo with code

All code from this guide is hosted on GitHub.
