DEV Community


Splitting a PDF file into multiple files using AWS Lambda and Node.js

import fs from 'fs'
import AWS from 'aws-sdk';
import { PDFDocument } from 'pdf-lib'
const s3 = new AWS.S3();

export const handler = async (event) => {
    console.log("event", JSON.stringify(event))
    // The bucket name lives under s3.bucket.name in the S3 event record
    const bucket = event.Records[0].s3.bucket.name;
    // Object keys arrive URL-encoded in the event, with spaces sent as '+'
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    console.log("Key", { bucket, key })

    const params = {
        Bucket: bucket,
        Key: key
    };
    // Download the source PDF; pdf.Body holds the file contents
    const pdf = await s3.getObject(params).promise();

    const pdfDoc = await PDFDocument.load(pdf.Body);
    const pageCount = pdfDoc.getPageCount();
    const promises = [];
    for (let i = 0; i < pageCount; i++) {
        // Create a new "sub" document
        const subDocument = await PDFDocument.create();

        // copy the page at current index and add it to the new document
        const [copiedPage] = await subDocument.copyPages(pdfDoc, [i])
        subDocument.addPage(copiedPage);
        const pdfBytes = await subDocument.save();

        const splitFile = key.split('.pdf')[0] + '_' + i + '.pdf';
        console.log("SPLIT NAME", { splitFile })
        // Start the upload now and wait for all uploads together after the loop
        promises.push(s3.putObject({
            Bucket: bucket,
            Key: splitFile,
            Body: pdfBytes
        }).promise());
    }

    const response = await Promise.all(promises);
    return {
        message: 'PDF file split and saved to S3 successfully.'
    };
};


This AWS Lambda function, written in Node.js, splits a PDF file into multiple PDF files, one for each page of the original file.

The code first imports three libraries:

  • fs (File System): A built-in Node.js library for working with the local file system. It is imported here but not used.
  • AWS (Amazon Web Services) from 'aws-sdk': A library for working with Amazon Web Services. In this code, it is used to interact with Amazon S3 (Simple Storage Service), a cloud-based object storage service.
  • PDFDocument from 'pdf-lib': A library for working with PDF (Portable Document Format) files. In this code, it is used to load and manipulate PDF files.

Then, an instance of the S3 client is created using new AWS.S3(). This instance will be used to interact with the S3 service.

The main logic of the function is inside the handler function. This function will be called when the AWS Lambda is triggered.

The function starts by logging the event object, which is passed as an argument. This event object contains information about the event that triggered the function. In this case, it should contain information about the S3 object that was just created or updated.

Then, the name of the S3 bucket and the key of the object are extracted from the event object. The key represents the path of the object in the S3 bucket.
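As a minimal sketch of this extraction step, the snippet below reads both values from the first record of an S3 event notification. The helper name getBucketAndKey and the sample event are hypothetical and only illustrate the shape of the data. One detail worth knowing is that S3 delivers object keys URL-encoded, with spaces sent as '+', so the sketch decodes the key before using it.

```javascript
// Hypothetical helper: pulls the bucket name and object key out of the
// first record of an S3 event notification.
function getBucketAndKey(event) {
    const record = event.Records[0];
    const bucket = record.s3.bucket.name;
    // Keys are URL-encoded in the event, with spaces sent as '+'
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    return { bucket, key };
}

// Trimmed-down sample event containing only the fields read above
const sampleEvent = {
    Records: [
        {
            s3: {
                bucket: { name: 'my-bucket' },
                object: { key: 'reports/annual+report+%282023%29.pdf' }
            }
        }
    ]
};

console.log(getBucketAndKey(sampleEvent));
// { bucket: 'my-bucket', key: 'reports/annual report (2023).pdf' }
```

Only the first record is read here, which matches the function above; an event carrying several records would need a loop over event.Records.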

The function then uses the s3.getObject method to get the contents of the PDF file from the S3 bucket. The method is passed an object with two properties, Bucket and Key, which represent the name of the S3 bucket and the key of the object, respectively.

Once the contents of the PDF file are retrieved, the function uses the PDFDocument.load method to load the PDF document.

The function then creates an array of promises, where each promise represents a task to split one page of the PDF document into a separate file and save it to S3.

For each page in the PDF document, the function creates a new PDF document using the PDFDocument.create method. Then, it uses the copyPages method to copy the page from the original document and the addPage method to add the copied page to the new document. Finally, it uses the save method to serialize the new document to bytes. Those bytes are then uploaded to S3 using the s3.putObject method, under a key built from the original key plus the page index.
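The naming of the per-page files can be sketched on its own. The helper splitKeyFor below is hypothetical. It differs slightly from the handler, which uses key.split('.pdf')[0]: the sketch removes only a trailing ".pdf" (in any letter case), so a key that happens to contain ".pdf" earlier in its path is not cut short. The index is zero-based, as in the loop above.

```javascript
// Hypothetical helper: derives the output key for page `index` of `key`.
function splitKeyFor(key, index) {
    // Strip only a trailing ".pdf" (any letter case); leave the rest of the path alone
    const base = key.replace(/\.pdf$/i, '');
    return `${base}_${index}.pdf`;
}

console.log(splitKeyFor('invoices/march.pdf', 0)); // invoices/march_0.pdf
console.log(splitKeyFor('invoices/march.pdf', 2)); // invoices/march_2.pdf
```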

Once all the promises are fulfilled, the function returns a message indicating that the PDF file has been split and saved to S3 successfully.
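The fan-out pattern described here can be tried without AWS. In the sketch below, fakeUpload is a hypothetical stand-in for s3.putObject(...).promise(), and uploadAll is an illustrative name. It shows that Promise.all resolves only when every upload succeeds and rejects as soon as one fails, which means the success message is returned only if all pages were saved.

```javascript
// Hypothetical stand-in for s3.putObject(...).promise(): it records the key
// and resolves, or rejects when the key contains "fail".
const uploaded = [];
function fakeUpload({ Key }) {
    if (Key.includes('fail')) {
        return Promise.reject(new Error(`upload failed: ${Key}`));
    }
    uploaded.push(Key);
    return Promise.resolve({ Key });
}

// Start every upload first, then wait for all of them together.
async function uploadAll(keys, upload) {
    const promises = keys.map((Key) => upload({ Key }));
    return Promise.all(promises);
}

uploadAll(['doc_0.pdf', 'doc_1.pdf'], fakeUpload)
    .then((results) => console.log('uploaded', results.length))
    .then(() => uploadAll(['doc_2.pdf', 'fail_3.pdf'], fakeUpload))
    .catch((err) => console.log('rejected:', err.message));
```

If partial success should be reported instead of failing the whole invocation, Promise.allSettled is one alternative, since it waits for every promise and reports each outcome.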
