Dmytro Krasun

Posted on Jan 5, 2022 • Edited on Jan 16, 2022 • Originally published at screenshotone.com

How to take a screenshot with Puppeteer

#javascript #node

Taking screenshots of websites with Puppeteer can be tricky, and there are many pitfalls along the way. Let's work through a set of "screenshotting" problems with Puppeteer and tackle the pitfalls as they arise.

I included working Puppeteer examples so that you can better understand the context of each solution and copy it if needed.

Meet Puppeteer

Puppeteer is a Node.js library that controls browsers supporting the Chrome DevTools Protocol (CDP). That means not only Chrome and Chromium: Firefox also has partial CDP support.

The Chrome DevTools Protocol was developed to manage, debug and inspect Chromium and Chrome at a low level.

So, think of Puppeteer as a high-level API over the Chrome DevTools Protocol that lets you do everything in the browser that you can do manually:

Extract data from a SPA, submit a form, type text, perform end-to-end UI testing and other automation-related tasks.
Debug performance issues.
Run, debug and test Chrome Extensions.
Pre-render an SPA to produce a static site. For Google SEO, though, this matters less nowadays, since Google renders JavaScript when indexing pages.
And guess what? Make screenshots and PDFs of pages.
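To make the "high-level API over CDP" idea concrete, here is a minimal sketch that skips page.screenshot() and sends the raw CDP command Page.captureScreenshot over a DevTools session. It assumes `page` is a Puppeteer Page and that page.target().createCDPSession() is available in your Puppeteer version; the helper name captureViaCdp is hypothetical.

```javascript
'use strict';

// Hypothetical helper: take a PNG screenshot by talking to CDP directly.
// `page` is assumed to be a Puppeteer Page.
async function captureViaCdp(page) {
  // Open a raw DevTools Protocol session attached to this page's target.
  const client = await page.target().createCDPSession();
  try {
    // Page.captureScreenshot is a CDP command; it answers with { data } in Base64.
    const { data } = await client.send('Page.captureScreenshot', { format: 'png' });
    return Buffer.from(data, 'base64');
  } finally {
    // Release the session whether or not the command succeeded.
    await client.detach();
  }
}
```

With a real page, `fs.writeFileSync('raw.png', await captureViaCdp(page))` would save the image. page.screenshot() builds on the same command and adds conveniences such as clipping and full-page capture.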

Generating screenshots and PDFs with Puppeteer is the main focus of this post.

Puppeteer architecture and internals for the curious

You can skip this section. It is not required to start using the library. But I love to explore the internals of the libraries I use, and so might you.

Lightweight option of Puppeteer

First of all, there are two versions of the library: puppeteer-core and puppeteer. Use puppeteer-core when you manage browser instances yourself or do not need a bundled browser at all; otherwise, stick to puppeteer.

Three simple cases for puppeteer-core that come to mind:

You are using CDP from a browser extension, so you do not need to download Chrome or Chromium.
You want to use a different Chrome, Chromium, or Firefox build.
You have a running cluster of browsers or a separate browser instance on another machine.

When you use puppeteer-core, you must make sure you use a compatible browser version. The puppeteer package, in contrast, downloads and runs a compatible Chromium build for you, so you don't have to worry about it.
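Because puppeteer-core never downloads a browser, you either connect to one that is already running or point it at an executable. A minimal sketch of both options, assuming the two made-up environment variables BROWSER_WS_ENDPOINT and CHROME_PATH; the module is passed in so the helper (hypothetical name getBrowser) stays easy to test.

```javascript
'use strict';

// Hypothetical helper for puppeteer-core. BROWSER_WS_ENDPOINT and CHROME_PATH
// are made-up variable names used only for this illustration.
async function getBrowser(puppeteerCore, env = process.env) {
  if (env.BROWSER_WS_ENDPOINT) {
    // Reuse a browser that runs elsewhere (another machine, a cluster, ...).
    return puppeteerCore.connect({ browserWSEndpoint: env.BROWSER_WS_ENDPOINT });
  }
  if (env.CHROME_PATH) {
    // Launch your own Chrome/Chromium build; you are responsible for its version.
    return puppeteerCore.launch({ executablePath: env.CHROME_PATH });
  }
  throw new Error('Set BROWSER_WS_ENDPOINT or CHROME_PATH');
}
```

Usage: `const browser = await getBrowser(require('puppeteer-core'));`. When you connect to a shared browser, you would typically call browser.disconnect() instead of browser.close(), so the shared instance keeps running for other clients.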

Puppeteer Alternatives

There are many more, but the two most popular are:

The oldest alternative for taking screenshots is Selenium with the WebDriver protocol.
The second one is Playwright, and it is a good one. It is Puppeteer's main competitor.

Playwright and Puppeteer have similar APIs, but Playwright supports more browsers (Chromium, Firefox, and WebKit). So, if you must take screenshots in different browsers, prefer Playwright. By the way, some of Puppeteer's top contributors now work on Playwright. But the library is still considered new.

Practical Examples of using Puppeteer to take screenshots

Before starting to work with Puppeteer, let's install it using npm:

$ npm i puppeteer

A simple screenshot

To take a simple screenshot with Puppeteer and save it to a file, you can use the following code:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();
           await page.goto('https://example.com');
           await page.screenshot({ path: 'example.png' });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

Always close the browser (here, in the finally block) to avoid leaking resources.
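If you write many screenshot scripts, the launch/try/finally pattern can be wrapped once. A minimal sketch with the hypothetical helper name withPage; unlike the example above, it lets errors propagate to the caller instead of only logging them.

```javascript
'use strict';

// Hypothetical helper that centralizes the launch/try/finally pattern.
// `launch` is any function returning a browser, e.g. () => puppeteer.launch().
async function withPage(launch, fn) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Errors thrown by fn reach the caller instead of being swallowed.
    return await fn(page);
  } finally {
    // Runs on success and on failure, so the browser process never leaks.
    await browser.close();
  }
}
```

Usage: `await withPage(() => puppeteer.launch(), async (page) => { await page.goto('https://example.com'); await page.screenshot({ path: 'example.png' }); });`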

Resolution and Retina Display

To avoid blurry images on high-resolution displays such as Retina displays, change the viewport properties width, height, and deviceScaleFactor:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();

           await page.setViewport({
               width: 2880, // default: 800
               height: 1800, // default: 600 
               deviceScaleFactor: 2 // default: 1
           });

           await page.goto('https://apple.com');
           await page.screenshot({ path: 'apple.com.png' });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

That's how you get pixel-perfect screenshots.
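Note that width and height in setViewport() are CSS pixels, and each CSS pixel is rendered as deviceScaleFactor x deviceScaleFactor image pixels, so the example above produces a 5760x3600 image. A small sketch (hypothetical helper names) for computing the output size, and the viewport you need for a given target size:

```javascript
'use strict';

// Image size of a viewport screenshot: CSS pixels times deviceScaleFactor.
function screenshotPixelSize({ width, height, deviceScaleFactor = 1 }) {
  return {
    width: Math.round(width * deviceScaleFactor),
    height: Math.round(height * deviceScaleFactor),
  };
}

// Inverse: which CSS viewport yields an image of the requested pixel size?
function viewportForImageSize(imageWidth, imageHeight, deviceScaleFactor) {
  return {
    width: Math.round(imageWidth / deviceScaleFactor),
    height: Math.round(imageHeight / deviceScaleFactor),
    deviceScaleFactor,
  };
}
```

For example, viewportForImageSize(2880, 1800, 2) gives { width: 1440, height: 900, deviceScaleFactor: 2 }, which you can pass to page.setViewport() to get a 2880x1800 image.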

A full page screenshot

Puppeteer can take a screenshot of the whole scrollable page. Use the fullPage option:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();
           await page.goto('https://apple.com');
           await page.screenshot({ path: 'apple.com.png', fullPage: true });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

But it won't work with "infinite" scroll.

A full page screenshot with "infinite" scroll

A complete solution is out of scope for this article, and you will rarely need to take screenshots of sites with "infinite" scroll. If you do, you can use the following algorithm:

Load the page and wait until it is loaded.
Scroll until the page height stops changing.
Take the screenshot.

If you try this with Twitter or Instagram for an account that has a lot of posts, you will almost certainly end up with a crashed browser instance due to memory exhaustion.
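The algorithm above can be sketched as a loop that stops once the height is unchanged after a scroll. The maxRounds cap is my own addition to avoid the memory exhaustion just mentioned; the helper names, the 20-round default, and the 1-second delay are hypothetical choices. The browser-facing steps are injected so the loop itself is easy to test.

```javascript
'use strict';

// Scroll until the page height stops changing, or give up after maxRounds.
async function scrollUntilStable({ getHeight, scrollToBottom, wait, maxRounds = 20 }) {
  let previousHeight = -1;
  for (let round = 0; round < maxRounds; round++) {
    const height = await getHeight();
    if (height === previousHeight) {
      return { height, stable: true };
    }
    previousHeight = height;
    await scrollToBottom();
    await wait(); // give lazy-loaded content time to arrive
  }
  return { height: previousHeight, stable: false };
}

// Wiring for a Puppeteer Page (hypothetical helper name and 1 s delay).
function scrollUntilStableOnPage(page, maxRounds) {
  return scrollUntilStable({
    getHeight: () => page.evaluate(() => document.body.scrollHeight),
    scrollToBottom: () => page.evaluate(() => window.scrollTo(0, document.body.scrollHeight)),
    wait: () => new Promise((resolve) => setTimeout(resolve, 1000)),
    maxRounds,
  });
}
```

After `await scrollUntilStableOnPage(page, 20)`, take the screenshot with fullPage: true. If stable is false, the page was still growing when the cap was hit.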

Wait until the page is completely loaded

It is good practice to wait until the page is completely loaded before taking a screenshot:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch({});

       try {
           const page = await browser.newPage();

           await page.goto('https://apple.com/', {
               waitUntil: 'networkidle0',
           });

           await page.screenshot({ path: 'apple.com.png' });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

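It is a little bit of magic: networkidle0 is a heuristic for determining the page load state, which considers navigation finished when there have been no network connections for at least 500 ms. The Puppeteer team finds that it works quite well for many real-world use cases.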
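Some pages never go fully quiet (long-polling requests, periodic analytics beacons), so waiting for networkidle0 can time out. A sketch of one pragmatic workaround: wait for networkidle0 with a bounded timeout and, if that wait expires, still take a (possibly incomplete) screenshot. The helper name gotoAndSettle and the 15-second default are hypothetical.

```javascript
'use strict';

// Hypothetical helper: prefer networkidle0, but tolerate pages that never go idle.
async function gotoAndSettle(page, url, timeoutMs = 15000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs });
    return 'idle';
  } catch (error) {
    // Puppeteer rejects with an error named 'TimeoutError' when the wait expires;
    // anything else (DNS failure, invalid URL, ...) is a real problem.
    if (error.name !== 'TimeoutError') {
      throw error;
    }
    // The page usually keeps what has loaded so far, so a screenshot is still possible.
    return 'timed-out';
  }
}
```

You could log the 'timed-out' result so that you know which screenshots may be missing late-loading content.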

But if you need to wait until some element is rendered and visible, you need to add Page.waitForSelector():

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch({});

       try {
           const page = await browser.newPage();

           await page.goto('https://example.com/', {
               waitUntil: 'networkidle0',
           });

           const selector = 'div';
           await page.waitForSelector(selector, {
               visible: true,
           });

           await page.screenshot({ path: 'example.com.png' });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

You can also wait:

A screenshot of the page area

To take a screenshot of a specific area of the page, use the clip option:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();
           await page.goto('https://apple.com');
           await page.screenshot({
               path: 'apple.com.png',
               clip: {
                   x: 100,
                   y: 100,
                   width: 800,
                   height: 800
               },
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

But if you need a screenshot of a specific element, there is a better approach.

A screenshot of the specific element

Puppeteer lets you take a screenshot of any element on the page:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();
           await page.goto('https://example.com');

           const selector = 'body > div:first-child';
           await page.waitForSelector(selector);
           const element = await page.$(selector); 

           await element.screenshot({
               path: 'example.com.png',            
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

As you can see, it is essential to make sure that the element is ready before taking the screenshot.
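element.screenshot() crops tightly to the element. If you want some margin around it, one option is to compute a clip from the element's bounding box yourself. A sketch with hypothetical helper names; it assumes the page is scrolled to the top, because boundingBox() reports coordinates relative to the viewport, and it returns null for invisible elements.

```javascript
'use strict';

// Grow a bounding box by `padding` CSS pixels without going below 0.
function paddedClip(box, padding) {
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  return {
    x,
    y,
    // The right and bottom edges move out by `padding` as well.
    width: box.x + box.width + padding - x,
    height: box.y + box.height + padding - y,
  };
}

// Assumes an unscrolled page, where viewport-relative boundingBox()
// coordinates line up with the clip coordinates.
async function screenshotWithPadding(page, element, path, padding = 16) {
  const box = await element.boundingBox();
  if (!box) {
    throw new Error('Element is not visible, so it has no bounding box');
  }
  return page.screenshot({ path, clip: paddedClip(box, padding) });
}
```

Usage, continuing the example above: `await screenshotWithPadding(page, element, 'example.com.png', 20);`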

A screenshot with transparent background

Puppeteer provides a useful option to omit the background of the site. Just set omitBackground to true:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();
           await page.goto('https://example.com');

           await page.screenshot({
               path: 'example.com.png',
               omitBackground: true,            
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

Have you run the code? If so, you noticed that the screenshot does not have a transparent background. That happens because omitBackground only hides the browser's default white background, so it has no effect where the page sets its own background.

So if your target site does not have a transparent background and you want to force it, you can use JavaScript to accomplish the task. Change the background of the body in the evaluate function:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();
           await page.goto('https://example.com');

           await page.evaluate(() => {            
               document.body.style.background = 'transparent';
           });

           await page.screenshot({
               path: 'example.com.png',
               omitBackground: true,            
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();
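Some sites set their background on the html element rather than on body, and in CSS the root element's background paints the whole canvas, so clearing body alone may not be enough. A slightly more defensive sketch (hypothetical helper name) clears both and marks the inline rule as important, so it also wins over !important rules in the site's stylesheet. Other elements that paint their own backgrounds are not affected.

```javascript
'use strict';

// Hypothetical helper: make both <html> and <body> transparent before capturing.
async function makeBackgroundTransparent(page) {
  // The callback runs inside the browser page, where `document` exists.
  await page.evaluate(() => {
    for (const element of [document.documentElement, document.body]) {
      // 'important' lets the inline rule beat !important rules in the site's CSS.
      element.style.setProperty('background', 'transparent', 'important');
    }
  });
}
```

Usage: call `await makeBackgroundTransparent(page);` right before `page.screenshot({ path: 'example.com.png', omitBackground: true })`.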

Screenshot as Base64

Suppose you run Puppeteer as a service and do not want to store screenshot files. You can return the screenshot as a Base64-encoded string instead:

'use strict';  

const puppeteer = require('puppeteer');  

(async () => {  
   const browser = await puppeteer.launch({});  

   try {  
       const page = await browser.newPage();  
       await page.goto('https://example.com/');  

       const base64 = await page.screenshot({ encoding: "base64" })  
       console.log(base64);  
   } catch (e) {  
       console.log(e)  
   } finally {  
       await browser.close();  
   }  
})();

You will receive a string that you can share with another service or even store somewhere.
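If the receiving side wants something directly usable, for example in an img tag, you can wrap the string in a data URL. You can also sanity-check that the payload really is a PNG by decoding it and comparing the first bytes with the fixed 8-byte PNG signature. Both helper names are hypothetical.

```javascript
'use strict';

// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Turn the Base64 string from page.screenshot({ encoding: 'base64' }) into a data URL.
function toDataUrl(base64, mimeType = 'image/png') {
  return `data:${mimeType};base64,${base64}`;
}

// Decode the string and check that it starts with the PNG signature.
function looksLikePng(base64) {
  const bytes = Buffer.from(base64, 'base64');
  return bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE);
}
```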

Generate PDF instead of PNG

It is relatively easy to generate PDF instead of PNG:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch({});

       try {
           const page = await browser.newPage();

           await page.goto('https://example.com/', {
               waitUntil: 'networkidle0',
           });

           const selector = 'div';
           await page.waitForSelector(selector, {
               visible: true,
           });

           await page.pdf({path: 'example.com.pdf', format: 'a4'})        
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

Have a look at all the available Puppeteer PDF options. PDF generation is an exciting and complex problem, which deserves a separate post.
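Two options often matter when you want a PDF that looks like the page on screen: page.pdf() renders with print CSS by default, and it omits background graphics unless printBackground is true. A minimal sketch with the hypothetical helper name pdfLikeScreen:

```javascript
'use strict';

// Hypothetical helper: produce a PDF that looks closer to what users see on screen.
async function pdfLikeScreen(page, path) {
  // page.pdf() uses print CSS by default; switch to screen styles first.
  await page.emulateMediaType('screen');
  return page.pdf({
    path,
    format: 'A4',
    printBackground: true, // include background colors and images (default: false)
  });
}
```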

It depends on your use case, but also consider using PDFKit for programmatic PDF generation.

Blocking ads when using Puppeteer

I do not use any ad-blocking extension, because life is tough and everybody needs some way to earn money. If I can help sites survive by not blocking their ads, I will do it.

But when you test your site or your customer's site, you might need to block ads. There are two ways to do it:

Intercept and block the requests that load ads into the site.
Use a ready-made plugin that is optimized for exactly this problem.

The first one is tricky and depends heavily on the site you are taking screenshots of. Using a plugin, on the other hand, is a scalable approach that works out of the box.

Install puppeteer-extra and puppeteer-extra-plugin-adblocker in addition to the puppeteer package:

$ npm i puppeteer-extra puppeteer-extra-plugin-adblocker

And then use it:

'use strict';

const puppeteer = require('puppeteer-extra');

const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin());

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();

           // ads are blocked automatically
           await page.goto('https://www.example.com');

           await page.screenshot({
               path: 'example.com.png',
               fullPage: true,
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

Many pages include ads and trackers, which consume a lot of bandwidth and take a long time to load. When they are blocked, fewer requests are made and less JavaScript is executed, so pages load substantially faster.

Block trackers

To take screenshots faster, you can also block trackers, which speeds up rendering. The ad-blocking plugin can help with this too.

Do not forget to install puppeteer-extra and puppeteer-extra-plugin-adblocker in addition to the puppeteer package:

$ npm i puppeteer-extra puppeteer-extra-plugin-adblocker

And then use it:

'use strict';

const puppeteer = require('puppeteer-extra');

const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin({
       blockTrackers: true, // default: false
}));

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();

           // ads and trackers are blocked automatically
           await page.goto('https://www.example.com');

           await page.screenshot({
               path: 'example.com.png',
               fullPage: true,
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

If you need to block only trackers but not ads, use request interception.
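A minimal sketch of that request-interception approach: abort requests whose host is on a block list you maintain yourself, and let everything else through. The host names in the usage line are placeholders and the helper names are hypothetical. Once interception is on, every request must be either continued or aborted, otherwise it hangs.

```javascript
'use strict';

// True if the URL's host is one of `blockedHosts` or a subdomain of one.
function isBlockedUrl(url, blockedHosts) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not an absolute URL we can parse: let it through
  }
  return blockedHosts.some((host) => hostname === host || hostname.endsWith(`.${host}`));
}

// Enable request interception and abort requests to blocked hosts.
async function blockHosts(page, blockedHosts) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (isBlockedUrl(request.url(), blockedHosts)) {
      request.abort();
    } else {
      request.continue(); // every intercepted request must be resolved
    }
  });
}
```

Usage: `await blockHosts(page, ['tracker.example', 'analytics.example']);` before page.goto().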

Preventing Puppeteer detection

Some sites might block your Puppeteer script because of its user agent, and that is easy to fix:

'use strict';

const puppeteer = require('puppeteer');

(async () => {    
       const options = {
           args: [
               // No extra quotes: Puppeteer passes args to Chrome directly, not through a shell
               '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'
           ],
           headless: true,
       };

       const browser = await puppeteer.launch(options);
       try {
           const page = await browser.newPage();
           await page.goto('https://www.example.com');

           await page.screenshot({
               path: 'example.com.png',
               fullPage: true,
           });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();
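Hard-coding a user agent pins it to one Chrome version. An alternative sketch, assuming the giveaway is the "HeadlessChrome" token in headless Chrome's default user agent: read the browser's real user agent with browser.userAgent() and replace only that token, keeping the actual version number. The helper names are hypothetical.

```javascript
'use strict';

// Replace the "HeadlessChrome" product token with "Chrome", keeping the version.
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Apply it to a page; browser.userAgent() returns the browser's default user agent.
async function useHeadfulUserAgent(browser, page) {
  const userAgent = toHeadfulUserAgent(await browser.userAgent());
  await page.setUserAgent(userAgent);
  return userAgent;
}
```

Call `await useHeadfulUserAgent(browser, page);` before page.goto(), instead of passing --user-agent at launch.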

There are many other tricks to keep Puppeteer from being detected, but you can save time by using the ready-made puppeteer-extra-plugin-stealth plugin. Install it in addition to the puppeteer package:

$ npm i puppeteer-extra puppeteer-extra-plugin-stealth

And then use:

'use strict';

const puppeteer = require('puppeteer-extra');

const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();        

           await page.evaluateOnNewDocument(() => {
               const newProto = navigator.__proto__;
               delete newProto.webdriver;
               navigator.__proto__ = newProto;
           });

           await page.goto('https://bot.sannysoft.com');        
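           // Give the test page time to run its checks (page.waitForTimeout() is deprecated in newer Puppeteer versions)
           await new Promise((resolve) => setTimeout(resolve, 5000));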
           await page.screenshot({ path: 'stealth.png', fullPage: true });

       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

Important! As you can see, I remove the webdriver property: at the time of writing, the stealth plugin missed this hack, and the webdriver property can reveal that the page is driven by Puppeteer.

Hide cookies banners

This is tricky to implement generically, but you can accept (or reject) cookies by finding the selector of the Accept or Reject button and clicking it.
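A minimal sketch of that idea: try a list of candidate selectors and click the first button that exists. The selectors in the usage line are placeholders, since every site uses its own markup, and the helper name is hypothetical. Note that page.$() searches only the main frame, so banners rendered inside an iframe are not found this way.

```javascript
'use strict';

// Hypothetical helper: click the first consent button found on the page.
async function clickFirstMatching(page, selectors) {
  for (const selector of selectors) {
    const button = await page.$(selector);
    if (button) {
      await button.click();
      return selector; // report which selector matched
    }
  }
  return null; // no banner found, nothing clicked
}
```

Usage: `await clickFirstMatching(page, ['#accept-cookies', 'button[data-consent="accept"]']);` before taking the screenshot.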

Using basic access authentication with Puppeteer

If your page is protected by HTTP basic access authentication, all you need to do is provide the username and password before loading the page and taking the screenshot:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch();

       try {
           const page = await browser.newPage();

           await page.authenticate({'username':'YOUR_BASIC_AUTH_USERNAME', 'password': 'YOUR_BASIC_AUTH_PASSWORD'});

           await page.goto('https://example.com');
           await page.screenshot({ path: 'example.png' });
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();
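page.authenticate() answers the server's authentication challenge. If you prefer to send the credentials up front, an alternative sketch builds the Basic Authorization header yourself (RFC 7617: "username:password" encoded in Base64) and sets it with page.setExtraHTTPHeaders(). Keep in mind that the header is then sent with every request the page makes, including requests to third-party hosts. The helper names are hypothetical.

```javascript
'use strict';

// Build the value of a Basic Authorization header: "user:password" in Base64.
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
}

// Attach the header to every request made by the page.
async function useBasicAuthHeader(page, username, password) {
  await page.setExtraHTTPHeaders({ Authorization: basicAuthHeader(username, password) });
}
```

For example, basicAuthHeader('user', 'pass') returns 'Basic dXNlcjpwYXNz'.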

Using a proxy for Puppeteer

If you need to use a proxy to take a screenshot with Puppeteer, you can specify a browser-wide proxy:

const puppeteer = require('puppeteer');

(async () => {
       const browser = await puppeteer.launch({
           args: ['--proxy-server=127.0.0.1:9876']
       });

       try {
           const page = await browser.newPage();

           await page.goto('https://example.com/', {
               waitUntil: 'networkidle0',
           });

           await page.screenshot({ path: 'example.com.png' })
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();

But in some cases, you might want to use a page-wide proxy without recreating the browser instance. In this case, you can install puppeteer-page-proxy:

npm i puppeteer-page-proxy

And use it to specify a proxy on a per-page basis:

const puppeteer = require('puppeteer');
const useProxy = require('puppeteer-page-proxy');

(async () => {
       const browser = await puppeteer.launch({});

       try {
           const page = await browser.newPage();

           await useProxy(page, 'http://127.0.0.1:9876');

           await page.goto('https://example.com/', {
               waitUntil: 'networkidle0',
           });

           await page.screenshot({ path: 'example.com.png' })
       } catch (e) {
           console.log(e)
       } finally {
           await browser.close();
       }
})();
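If your browser-wide proxy requires a username and password, one option is to combine --proxy-server with page.authenticate() on every new page. A sketch under the assumption that page.authenticate() also answers the proxy's HTTP 407 challenge, which is how it is commonly used; the helper name launchWithProxy and the credentials object shape are hypothetical.

```javascript
'use strict';

// Hypothetical helper: launch with a browser-wide proxy and answer the proxy's
// authentication challenge on each new page. `puppeteer` is the module object.
async function launchWithProxy(puppeteer, { server, username, password }) {
  const browser = await puppeteer.launch({ args: [`--proxy-server=${server}`] });
  const newPage = async () => {
    const page = await browser.newPage();
    if (username) {
      // Assumption: authenticate() also handles proxy (HTTP 407) challenges.
      await page.authenticate({ username, password });
    }
    return page;
  };
  return { browser, newPage };
}
```

Usage: `const { browser, newPage } = await launchWithProxy(require('puppeteer'), { server: 'http://127.0.0.1:9876', username: 'u', password: 'p' });`, then open pages with `await newPage()` and close the browser in a finally block as before.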

Add support for emojis, Japanese, Arabic and other non-Latin languages to Puppeteer

If you run Puppeteer on an OS without emoji support, you need to install system-wide fonts that support emojis. The same applies to non-Latin characters such as Chinese, Japanese, Korean, Arabic, Hebrew, etc.

To get Puppeteer to render emojis, you can use Noto Fonts published under SIL Open Font License (OFL) v1.1.

Look up how to install fonts on your host OS.

Have a nice day 👋

I posted a lot of Puppeteer examples, and I hope I helped you solve your screenshot problems with Puppeteer. I described every problem I encountered and the solution to it.