Apify for Apify

Posted on Apr 2, 2024 • Originally published at blog.apify.com on Nov 21, 2023

How to download and upload files in Puppeteer

#puppeteer #node #automation

Hey, we're Apify , the only full-stack web scraping and automation library. Check out some of our easy to use web scraping code templates if you want to get started on building your own Puppeteer scrapers.

Puppeteer is a Node.js library that allows you to interact with headless and headful Chrome browsers. It enables you to perform lots of tasks, such as navigating web pages, taking screenshots, generating PDFs, and handling filesboth downloads and uploads. Puppeteer essentially allows you to automate tasks that would typically require manual intervention in a web browser.

Why handle file downloads and uploads?

File downloads and uploads are common activities when automating web interactions. Whether you're scraping data from websites, testing file upload functionality, or automating document retrieval, Puppeteer's ability to download and upload files makes it an invaluable tool.

Setting up Puppeteer

To set up Puppeteer, you need to have Node.js installed on your computer. If you haven't installed it yet, you can follow this guide for a step-by-step procedure on how to do it.

Once Node.js is installed, it will also install npm for you. You can verify this by opening your command prompt (CMD) or your terminal and using this command:

node -v && npm -v

The output will look like the image below:

Node and NPM version installed on a computer

After that, you can go to your desired location on your computer to create a folder (either desktop or documents) for your project. You'll initialize a new Node.js project in the folder you created using npm init, which will bring up some prompts. Follow through with the prompts, and that will set you up.

Installing Node.js

In the image above, a folder called puppeteer-download-upload is created. The cd puppeteer-download-upload command is used to change your directory into the folder, and the npm init command is used to initialize Node.js into the folder. The prompts came up and were filled in accordingly.

This is so you can run Node.js operations within the project.

The next step is for you to install Puppeteer.

In the same project folder, open your terminal, change your directory, and run this command:

npm install puppeteer

📒Note: Anytime you're working on a new Puppeteer project, you'll have to perform these operations:

Create a new project folder

Initialize Node into the folder

Install Puppeteer

With Puppeteer successfully installed, you're ready to start automating.

Performing download operations with Puppeteer

To perform a download operation with Puppeteer, you'll need a method to trigger the download action, specify the path at which you want the file to be downloaded, and finally take the download action. After deciding on the method, specify the download path and then trigger the download action by navigating to the page or link and clicking the download link.

There are various methods and approaches you can use to perform download operations with Puppeteer. Here are three of them:

Intercepting network requests using page.setRequestInterception(true). You can use this to detect a request for a file download based on the content. If you want to learn more about this method, you should read about request interception in the Puppeteer documentation.

await page.setRequestInterception(true);page.on('request', (interceptedRequest) => { // Check the URL or content type to detect a download request if (interceptedRequest.url().endsWith('.pdf')) || interceptedRequest.url().endsWith('.jpg') { // Handle the download here... } interceptedRequest.continue();});// If the file download doesn't start right away, click a button to trigger itawait page.click('#downloadButton')

Browser Contexts and setDownloadBehavior. This is a more direct way to handle downloads using Puppeteer. In this case, when the download is triggered, the file will automatically be downloaded to the specified directory or path. The previous version used the page._client private API, but it was deprecated. Instead, you should create your own CDP sessions for access to the Chrome dev protocol directly, like so:

const browser = await puppeteer.launch({ headless: false });const page = await browser.newPage()const client = await page.target().createCDPSession()await client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath: '/path/to/save/downloads' });// If the file download doesn't start right away, click a button to trigger itawait page.click('#downloadButton')

Each of the methods listed above is suited to different download scenarios. Just choose the one that best fits the project or task you're working on.

Tips for handling errors

Handle timeout within your code to give more time for the download operation to be completed.
To ensure the file is downloaded successfully, you can use the [page.waitForResponse](<https://pptr.dev/api/puppeteer.page.waitforresponse>) and targetcreated conditions.
You can also check the specified directory manually to ensure the file exists and is of the expected size.

Uploading files with Puppeteer

To perform an upload operation with Puppeteer, you'll need a method that lets you perform the file selection option, specify the path at which you want the file to be selected from, and then finally take the upload action.

You can use the elementHandle.uploadFile(...path) method that allows you to upload a file by providing the path, or you can use the fileChooser method.

The FileChooser works when the file chooser has a dialog while elementHadle.uploadFile works directly with the file input element. The method you use depends on the scenario you're working with. If the webpage has a custom button or hides the original file input, FileChooser is advisable. If you're dealing with a standard file input, elementHandle.uploadFile is a better option.

//the code sample of FileChooserconst [fileChooser] = await Promise.all([page.waitForFileChooser(), page.click('#customUploadButton')]);await fileChooser.accept(['/path/file.jpg']);

//code sample for ElementHandle.uploadFileconst UploadElement = await page.$('input[type="file"]');await uploadElement.uploadFile('/path/file.jpg');

After deciding on the method, specify the path and then trigger the upload action (navigate to the page or link, click the upload button, and submit).

Best practices for secure file uploads

Check the file type required for upload and make sure you're uploading the correct file type.
Check and verify the file input and button selectors.
Monitor and handle network requests accordingly.

Examples of using upload and download

In this section, you'll be trying out the upload and download options available in Puppeteer.

1. Automating a file upload with Puppeteer

In the code below, you'll set up your Puppeteer project as explained earlier, create a new file called upload.js, import Puppeteer, then add a test pdf file in the root folder for the purpose of the example.

import puppeteer from "puppeteer";// function to handle timeout for every action to be completedfunction delay(time) { return new Promise(resolve => setTimeout(resolve, time));} const browser = await puppeteer.launch({ headless: false }); const page = await browser.newPage(); await page.goto('<https://easyupload.io/>'); await page.waitForSelector('input[type=file]'); const inputUploadHandle = await page.$('input[type=file]'); // path to the file you want to upload await inputUploadHandle.uploadFile('./testdoc.pdf'); await page.click('#upload'); // Introduce a timeout if necessary (in case of internet speed) await delay(20000); // Wait for a success message. await page.waitForSelector('.upload-success'); await browser.close();

2. Automating the download of the file you uploaded earlier

In the previous example, you uploaded a file to [easyupload.io](<http://easyupload.io>), after which you were given a download link. Copy the download link to the file, and replace it with the URL in page.goto('...'). Also, create a folder named downloads. This is where your file will be downloaded.

import puppeteer from 'puppeteer'import * as fs from 'fs';//function to handle timeout function delay(time) { return new Promise(resolve => setTimeout(resolve, time)); } const browser = await puppeteer.launch({ headless: false }); const page = await browser.newPage(); // Set download behavior const client = await page.target().createCDPSession() await client.send('Page.setDownloadBehavior', { behavior: 'allow', // the download path can be set to a folder in your project root downloadPath: './downloads' }); // Navigate to the download page. (change the download URL) await page.goto('<https://easyupload.io/x2na1r>'); // Download the file. await page.click('#hd1'); // Wait for the download to complete. Adjust this based on your network speed. await delay(10000);; //check the download folder to know if the downloaded file exists there if (fs.existsSync('./downloads/testdoc.pdf')) { console.log('file downloaded successfully!'); } else { console.log('Download failed.'); } await browser.close();

3. Another code sample for upload and download actions with Puppeteer

import puppeteer from 'puppeteer'//function to handle timeoutfunction delay(time) { return new Promise(resolve => setTimeout(resolve, time));} const browser = await puppeteer.launch({ headless: false }); const page = await browser.newPage(); const client = await page.target().createCDPSession(); await client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath: './downloads' }); await page.goto('<https://imgur.com/upload>'); const uploadSelector = '#file-input'; await page.waitForSelector(uploadSelector); const inputUploadHandle = await page.$(uploadSelector); await inputUploadHandle.uploadFile('./cap.jpeg'); //wait for the upload to be completed await delay(10000); // initiate the download process and click the download button const downloadLinkSelector = '.upload-download'; await page.waitForSelector(downloadLinkSelector); await page.click(downloadLinkSelector); //wait for the file download to await delay(10000); await browser.close();

You can find some more examples of downloading files in Puppeteer and submitting a form with an attachment in the Apify docs.