Apify for Apify

Posted on Nov 15, 2023 • Originally published at blog.apify.com on Aug 22, 2023

How to take screenshots and generate PDFs with Puppeteer

#puppeteer

Having the ability to capture screenshots and generate PDFs automatically is a valuable skill for developers, whether you're web scraping, making reports, testing, monitoring content, or archiving web pages. In this article, you'll learn how to use Puppeteer, a powerful Node.js package, to easily capture screenshots and generate PDFs from web pages.

Let the computer do the work of taking screenshots and generating PDFs with Puppeteer

What you need to get started

All you need is a basic knowledge of Node.js and JavaScript. If you don't have Node.js installed on your computer, head over to the official Node.js website, then download and install the latest stable version. You can also follow this guide to help you properly install Node.

Why choose Puppeteer?

Before getting into the technical details, let's have a look at why Puppeteer is such a great tool for capturing screenshots and generating PDFs. Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium, which it runs in headless mode (without a visible window) by default. Because of its versatility and ease of use, it is a popular tool for automating browser tasks.

Puppeteer provides a perfect environment for such jobs, whether you need to navigate web pages, interact with elements, or perform specific actions.

6 reasons to use Puppeteer to take screenshots and make PDFs

Automated testing and quality assurance

Developers can use Puppeteer's screenshot method to capture a web application at various stages of a user flow. These screenshots can be used in automated tests to check that user interface elements, layouts, and interactions are consistent across devices and browsers.

Visual regression testing

When making updates to a website, it's critical to avoid unintentional visual changes. Puppeteer's ability to take screenshots makes it a good fit for visual regression tests: by comparing screenshots taken before and after a change, developers can quickly spot visual inconsistencies.

Content monitoring and archiving

Puppeteer allows developers to capture screenshots or produce PDF snapshots of web pages on a regular basis. This is especially useful for content monitoring, tracking changes in online material, and building web page archives for historical purposes.

Reporting and documentation

Puppeteer makes it easier to generate PDF reports from web pages. By capturing screenshots of pertinent data and content, developers can produce informative and visually appealing reports. These PDF reports are useful for presenting insights, analytics, and summary information.

Regulatory compliance and legal records

Some industries require records of web content to be kept for regulatory or legal purposes. Puppeteer can be used to generate PDF snapshots of web pages so that accurate records are kept.

E-commerce and catalog management

Online retailers frequently capture product pages, catalog listings, or shopping carts for a variety of purposes, such as producing marketing materials, checking inventories, or creating PDF catalogs.

The Puppeteer screenshot and Puppeteer PDF generation capabilities provide developers with a robust toolkit for a variety of applications such as testing, monitoring, documentation, and data extraction. Its automation features help to streamline operations, increase efficiency, and improve user experiences across sectors and use cases.

💻 Installing Puppeteer

Let's begin by installing Puppeteer and setting up your environment. Follow these steps:

First, create a folder for your project. If you already have a project, open your terminal and change your working directory to its folder instead, so that Puppeteer is installed alongside the code that uses it.

Example: puppeteer-project

mkdir puppeteer-project

cd puppeteer-project

npm init

The commands above create a folder named puppeteer-project, move into it, and initialize a Node.js project there. Follow the prompts that appear after you run npm init. Then, use the command below to install Puppeteer:

npm install puppeteer

Test Installation: verify that Puppeteer is correctly installed by writing a simple script in the project folder. The script will start a browser session, load the blog.apify.com website and close the browser window after three seconds.

Name the file simple.js and add the code below to it:

import puppeteer from "puppeteer"
//launch a new browser instance with a visible window
const browser = await puppeteer.launch({
    headless: false
});
//create a new page
const page = await browser.newPage()
//navigate to a sample website
await page.goto('https://blog.apify.com')
//wait for 3 seconds before closing the browser
//(page.waitForTimeout was removed in recent Puppeteer versions, so a plain timer is used)
await new Promise((resolve) => setTimeout(resolve, 3000))
//close the browser
await browser.close()

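To run the script, open a terminal in the project folder (your code editor's terminal works fine) and run node simple.js. Because the script uses import and top-level await, Node.js has to treat it as an ES module: either add "type": "module" to your package.json or name the file simple.mjs.

Note: Install Puppeteer and create the script inside the same project folder, so that the puppeteer package can be resolved from your code.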

📂 Taking screenshots with Puppeteer

Taking screenshots with Puppeteer is easy. Let's go over the steps one by one:

Browser and website instance: Before you can use Puppeteer to perform any action on a site, you need to create a browser instance and load the website you want to work on. Only then can you take the screenshot. As a starting point, create a new file named screenshot.js and add the code below to it:

import puppeteer from "puppeteer"

const browser = await puppeteer.launch({
    headless: false
});
const page = await browser.newPage();
await page.goto('https://blog.apify.com');
// Continue with screenshot code

The code above launches a Chrome browser with a visible window (headless: false), creates a new page, and opens blog.apify.com in it. Remove the headless option to run the browser in the default headless mode.

Taking a screenshot: Once you're on the webpage, taking a screenshot requires only one line of code:

await page.screenshot({ path: 'screenshot.jpeg' });

screenshot.jpeg is the filename, including the extension, that the image is saved as. You can use jpeg, png, or webp. One thing to keep in mind is that taking screenshots in jpeg format is generally faster than in png format.
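If you want the image format to be explicit rather than left to the file name, you can pass the type option, and for jpeg and webp also a quality value between 0 and 100. Here is a minimal sketch: screenshotOptionsFor is a hypothetical helper (not part of Puppeteer) and the default quality of 80 is just an assumption for illustration.

```javascript
// Hypothetical helper (not part of Puppeteer): build screenshot options
// from a file name so the image type always matches the extension.
function screenshotOptionsFor(path, quality = 80) {
  const extension = path.split('.').pop().toLowerCase();
  const type = extension === 'jpg' ? 'jpeg' : extension;
  if (!['jpeg', 'png', 'webp'].includes(type)) {
    throw new Error(`Unsupported screenshot format: ${extension}`);
  }
  // "quality" (0-100) applies to the lossy formats only; png does not accept it.
  return type === 'png' ? { path, type } : { path, type, quality };
}

// Usage, assuming "page" was created as in the snippet above:
// await page.screenshot(screenshotOptionsFor('screenshot.jpeg', 70));
```

A lower quality value gives smaller files, which can matter when you capture many pages.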

Note: If no path is specified, the image will not be saved to disk.
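Even without a path, page.screenshot still resolves with the image data, so you can work with it in memory. The sketch below turns that data into a base64 data URL, for example to embed in an HTML report. screenshotAsDataUrl is a hypothetical helper name, and page is assumed to be a Puppeteer page that has already loaded a site.

```javascript
// Capture a screenshot without writing a file and return it as a
// base64 data URL. "page" is assumed to be a Puppeteer page.
async function screenshotAsDataUrl(page) {
  // With no "path" option, nothing is written to disk;
  // the image bytes are returned instead.
  const imageBytes = await page.screenshot({ type: 'png' });
  const base64 = Buffer.from(imageBytes).toString('base64');
  return `data:image/png;base64,${base64}`;
}

// Usage: const dataUrl = await screenshotAsDataUrl(page);
```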

Customizing screenshot options

Puppeteer offers various options to customize your screenshot:

Full-page screenshot: to capture the entire webpage, use the fullPage option. When it is set to true, Puppeteer takes a screenshot of the full scrollable page:

await page.screenshot({ path: 'fullpage.png', fullPage: true });

Specified viewport size: you can also control how much of the page is captured by setting the viewport size before taking the screenshot:

await page.setViewport({ width: 800, height: 600 });
await page.screenshot({ path: 'viewport.png' });
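If you only need a rectangular region of the page rather than the whole viewport, the screenshot method also accepts a clip option. A minimal sketch, where captureRegion is a hypothetical helper and the coordinates in the usage line are made-up values:

```javascript
// Capture only a rectangular region of the page via the "clip" option.
// x and y give the top-left corner of the region; all values are in pixels.
// "page" is assumed to be a Puppeteer page that has already loaded a site.
async function captureRegion(page, region, path) {
  const { x, y, width, height } = region;
  if (width <= 0 || height <= 0) {
    throw new Error('Region width and height must be positive');
  }
  await page.screenshot({ path, clip: { x, y, width, height } });
}

// Usage:
// await captureRegion(page, { x: 0, y: 0, width: 800, height: 200 }, 'header.png');
```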

For more screenshot options, check out the Puppeteer docs.

Complete code for this section

import puppeteer from "puppeteer"

const browser = await puppeteer.launch({
    headless: false
});

const page = await browser.newPage();
await page.goto('https://blog.apify.com');

//screenshot code, fullpage option
// await page.screenshot({ path: 'apify.jpeg', fullPage: true });

//specified viewport
await page.setViewport({ width: 800, height: 600 });
await page.screenshot({ path: 'apifyView.png' });

//close the browser
await browser.close()

To see the full-page screenshot option in action, comment out the viewport lines and uncomment the full-page option.

In this section, you learned how to take a screenshot of a website and how to adjust the output with the options that Puppeteer's screenshot method accepts. The next part of the article covers how to generate PDF files with Puppeteer.

📕 Generating PDFs with Puppeteer

Generating PDFs is another powerful feature of Puppeteer. Let's explore how to convert web pages into PDFs.

Converting web pages to PDFs: Similar to capturing screenshots, you'll start by navigating to the desired web page. Then, use the page.pdf method to generate a PDF:

await page.goto('https://blog.apify.com');
await page.pdf({ path: 'apify.pdf' });
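Keep in mind that page.pdf renders the page with its print styles, so the PDF can look different from what you see on screen. If you would rather keep the on-screen look, you can switch the media type first with page.emulateMediaType. A small sketch, where savePdf and its useScreenStyles flag are hypothetical names chosen for this example:

```javascript
// Save the current page as a PDF, optionally using the screen styles
// instead of the print styles. "page" is assumed to be a Puppeteer page.
async function savePdf(page, path, { useScreenStyles = false } = {}) {
  if (useScreenStyles) {
    // Make the page render as it does on screen.
    await page.emulateMediaType('screen');
  }
  await page.pdf({ path });
}

// Usage: await savePdf(page, 'apify.pdf', { useScreenStyles: true });
```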

Adjusting PDF options

Puppeteer allows you to customize the PDF output by adjusting various options:

Page format and margins:

await page.pdf({
  path: 'formatted.pdf',
  format: 'A4',
  margin: { top: '40px', right: '20px', bottom: '40px', left: '20px' },
});
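Beyond format and margins, page.pdf accepts options such as landscape, printBackground, and header and footer templates. Here is a sketch of how they might be combined for a report. reportPdfOptions is a hypothetical helper, the margin values are arbitrary, and the title is assumed to be trusted plain text because it is placed into the header HTML as-is.

```javascript
// Hypothetical helper: assemble page.pdf() options for a printable report.
function reportPdfOptions(path, title) {
  return {
    path,
    format: 'A4',
    landscape: false,
    printBackground: true, // include CSS background colors and images
    displayHeaderFooter: true,
    // Templates are HTML; elements with the classes "pageNumber" and
    // "totalPages" are filled in when the PDF is generated.
    headerTemplate: `<div style="font-size:9px; margin-left:20px;">${title}</div>`,
    footerTemplate:
      '<div style="font-size:9px; margin-left:20px;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    // Leave room for the header and footer so they do not overlap the content.
    margin: { top: '60px', right: '20px', bottom: '60px', left: '20px' },
  };
}

// Usage: await page.pdf(reportPdfOptions('report.pdf', 'Weekly report'));
```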

For more PDF options, check out the Puppeteer docs.

Complete code for this section

import puppeteer from "puppeteer"

//launch in the default headless mode, which PDF generation has traditionally required
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://blog.apify.com');

//generate pdf
// await page.pdf({ path: 'page.pdf' });

//format pdf options
await page.pdf({
    path: 'formatted.pdf',
    format: 'A4',
    margin: { top: '40px', right: '20px', bottom: '40px', left: '20px' },
});

//close the browser
await browser.close()

🔑 Advanced Puppeteer techniques

As you become more comfortable with Puppeteer, you can explore advanced techniques to enhance your capabilities:

Handling dynamic content: Some web pages load content dynamically. Use the waitForSelector method to make sure the element you need is present on the page before taking a screenshot or generating a PDF.

await page.waitForSelector('.dynamic-element');
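waitForSelector rejects if the element does not show up in time, so in practice you may want to handle that case instead of letting the script crash. A minimal sketch, where screenshotWhenReady is a hypothetical helper and the 10 second default timeout is an arbitrary choice:

```javascript
// Wait for an element to become visible, then take a screenshot.
// Returns true on success and false when the element was not ready.
// "page" is assumed to be a Puppeteer page that has already loaded a site.
async function screenshotWhenReady(page, selector, path, timeoutMs = 10000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
  } catch (error) {
    // Reached, for example, when the element did not appear within the timeout.
    console.warn(`"${selector}" not ready after ${timeoutMs} ms: ${error.message}`);
    return false;
  }
  await page.screenshot({ path });
  return true;
}

// Usage: await screenshotWhenReady(page, '.dynamic-element', 'dynamic.png');
```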

Automating batch jobs: To automate capturing screenshots or generating PDFs for multiple web pages, create a loop that iterates through an array of URLs.

For this example, create a separate folder called images in your project. All the screenshots will be saved into it.

const urlArr = ['https://blog.apify.com', 'https://blog.apify.com/puppeteer-submit-forms', 'https://blog.apify.com/puppeteer-web-scraping-tutorial'];
for (let i = 0; i < urlArr.length; i++) {
    const site_url = urlArr[i];
    // Open URL in current page and wait until the network is idle
    await page.goto(site_url, {
        waitUntil: 'networkidle0'
    });
    // Capture screenshot
    await page.screenshot({ path: `images/screenshot_${i+1}.png`, fullPage: true });
}

The code above visits each URL in the array in turn, takes a full-page screenshot, and saves it to the images folder.
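Numbered file names work, but names derived from the URL are easier to match back to their pages later. One possible approach is sketched below. fileNameForUrl is a hypothetical helper, and replacing every run of non-alphanumeric characters with a dash is just one convention you could choose.

```javascript
// Hypothetical helper: derive a file-system-friendly name from a URL.
function fileNameForUrl(url, extension = 'png') {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse anything else into a dash
    .replace(/^-+|-+$/g, ''); // trim leading and trailing dashes
  return `${slug}.${extension}`;
}

// Usage inside the loop above:
// await page.screenshot({ path: `images/${fileNameForUrl(site_url)}`, fullPage: true });
```

Note that the query string is ignored here, so two URLs that differ only in their query parameters would map to the same file name.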

Dealing with authentication: If a webpage requires HTTP basic authentication, use Puppeteer's page.authenticate method before navigating to the page.

//set up your basic authentication credentials
const username = 'myUsername';
const password = 'password123';

// set the authentication credentials
await page.authenticate({
    username, password
});
// go to the website where you want to authenticate
await page.goto('https://website-url/auth-page');
//perform further actions on the page

Or, when the site uses a login form, use the page.type and page.click methods to fill in and submit the form.

await page.goto('https://warehouse-theme-metal.myshopify.com/account/login');

// Find the input fields by their selectors and type the credentials
await page.type('input[id*="customer"]', 'demo@username.com', {delay: 100});
await page.type('input[type=password]', 'demo_password', {delay: 100});
// click the login button
await page.click('.form__submit.button--full')

Troubleshooting and tips

Even the best developers encounter challenges. Here are some troubleshooting tips to help you overcome common issues:

Content not loading:

Make sure you wait for the necessary elements or page loads with waitForSelector or waitForNavigation before capturing.

Stale element reference:

If you interact with elements before taking a screenshot or generating a PDF, make sure those element handles are still attached to the page. After a navigation or re-render, query the elements again instead of reusing old handles.
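For pages that fail to load only now and then, retrying the step a few times is often enough. The sketch below is a generic retry wrapper, not a Puppeteer feature: withRetries is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```javascript
// Run an async action, retrying it a few times with a fixed delay
// before giving up and rethrowing the last error.
async function withRetries(action, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Pause before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage:
// await withRetries(() => page.goto('https://blog.apify.com', { waitUntil: 'networkidle2' }));
```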

Link to the complete code examples on GitHub

Screenshot.js

Pdf.js

Batch.js

Simple.js

Alternatives to Puppeteer

Great, you've unlocked Puppeteer's potential for taking screenshots and generating PDFs! You now know how to install Puppeteer, capture screenshots, configure options, and convert web pages to PDFs. But is Puppeteer the right choice for you? Check out Playwright vs. Puppeteer and Puppeteer vs. Selenium to find out about two handy alternatives.

❓ FAQ

Can Puppeteer capture screenshots of specific elements on a page?

Yes, Puppeteer allows you to target specific elements by using their selectors. You can then capture screenshots of these individual elements.
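As a rough sketch of what that can look like: page.$ looks up the first element that matches a selector, and the element handle it returns has its own screenshot method. screenshotElement is a hypothetical helper name, and the selector in the usage line is made up.

```javascript
// Take a screenshot of a single element instead of the whole page.
// "page" is assumed to be a Puppeteer page that has already loaded a site.
async function screenshotElement(page, selector, path) {
  const element = await page.$(selector); // resolves to null when nothing matches
  if (!element) {
    throw new Error(`No element matches selector: ${selector}`);
  }
  await element.screenshot({ path });
}

// Usage: await screenshotElement(page, 'header', 'header.png');
```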

How can I capture a screenshot of a dynamically loaded element?

Use the waitForSelector function to wait for the element to be fully loaded before capturing the screenshot.

Is it possible to generate a PDF from a protected web page that requires login?

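Yes. If the page is protected by HTTP basic authentication, use Puppeteer's page.authenticate method to provide credentials before navigating to it. If it uses a login form, fill in and submit the form with page.type and page.click first, then generate the PDF.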

Can Puppeteer generate PDFs from multiple web pages in a single batch job?

Absolutely! You can create a loop that iterates through an array of URLs, capturing screenshots or generating PDFs for each page.
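Here is a sketch of the PDF variant of such a batch job. savePdfs is a hypothetical helper, the pdfs folder name is an assumption (the folder has to exist before you run it), and failures are recorded rather than thrown so that one broken URL does not stop the whole batch.

```javascript
// Generate one PDF per URL and report which ones succeeded.
// "page" is assumed to be a Puppeteer page; "folder" must already exist.
async function savePdfs(page, urls, folder = 'pdfs') {
  const results = [];
  for (const [index, url] of urls.entries()) {
    const path = `${folder}/page_${index + 1}.pdf`;
    try {
      await page.goto(url, { waitUntil: 'networkidle0' });
      await page.pdf({ path, format: 'A4' });
      results.push({ url, path, ok: true });
    } catch (error) {
      // Record the failure and continue with the remaining URLs.
      results.push({ url, path, ok: false, error: error.message });
    }
  }
  return results;
}

// Usage: const results = await savePdfs(page, ['https://blog.apify.com']);
```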
