Generate a PDF in AWS Lambda with NodeJS and Puppeteer

Aki Rautio ・3 min read

Recently I needed to solve a problem that involved generating a PDF file based on database content. Since these PDFs are not generated very often, it doesn't make sense to keep a service running 24/7. Luckily, both Google (Functions) and AWS (Lambda) offer an event-driven service that runs only on request.

Originally I was planning to use Python and ReportLab for this project, but the connection to a PostgreSQL database ended up being too complex to configure. I had already done a small NodeJS project with a database connection, so I knew that it would work.

For NodeJS I still needed a package to generate the PDF, and I found the following options:

I ended up choosing Puppeteer for this project. It's a bit overkill for the current use case, but at the same time it is more future-proof because the layout is built on HTML and CSS.

To make my life easier, I'm using the serverless package to handle deployment to AWS Lambda and chrome-aws-lambda to help with deploying Puppeteer to AWS Lambda. The full list of required dependencies is the following:

"dependencies": {
  "chrome-aws-lambda": "1.18.1",
  "knex": "0.18.3",
  "pg": "7.11.0",
  "pg-hstore": "2.3.2",
  "pug": "2.0.4",
  "puppeteer-core": "1.18.1"
},
"devDependencies": {
  "serverless": "1.40.0",
  "serverless-apigw-binary": "0.4.4",
  "serverless-offline": "4.9.4"
}

Aside from the main requirements, I'm using knex, pg, and pg-hstore to handle the database connection and pug as a template engine. For local testing I'm using serverless-offline, and serverless-apigw-binary registers binary media types with API Gateway so that the PDF can be returned from Lambda.

Creating a lambda function

The process of creating a PDF goes as follows:

  1. Fetch the data that will be used to create the report (in my case from the database with knex).
  2. Create an HTML template that will be combined with the data (I'm using pug here).
  3. Load Puppeteer and open the HTML with it.
  4. Generate a PDF page with Puppeteer.
  5. Return the PDF as a base64 string.

'use strict'
const chromium = require('chrome-aws-lambda')
const pug = require('pug')
const fs = require('fs')
const path = require('path')

const knex = require('./src/db')

module.exports.pdf = async (event, context) => {
  // Assumed format of the path parameter: YYYY-MM, e.g. 2019-07
  const yearMonth = ((event || {}).pathParameters || {}).yearMonth || ''
  const year = yearMonth.length === 7 && yearMonth.substring(0, 4)
  const month = yearMonth.length === 7 && yearMonth.substring(5, 7)

  // Select a date (the Date constructor takes a zero-based month index)
  const selDate = new Date(year, month - 1)
  const filter = {
    month: selDate.toLocaleString('en', { month: 'long' }),
    year: selDate.getFullYear()
  }


  // 1. Load database data with Knex
  const result = await knex
    .select()
    .from('sales')
    .where({
      year: filter.year,
      month: selDate.getMonth() + 1
    })

  // 2. Create html
  const template = pug.compileFile('./src/template.pug')
  const html = template({ ...filter, result })

  // 3. Open puppeteer
  let browser = null
  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless
    })

    const page = await browser.newPage()
    await page.setContent(html)

    // 4. Create pdf file with puppeteer
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
    })

    // 5. Return PDF as base64 string
    const response = {
      headers: {
        'Content-Type': 'application/pdf',
        'Content-Disposition': 'attachment; filename=test.pdf'
      },
      statusCode: 200,
      body: pdf.toString('base64'),
      isBase64Encoded: true
    }
    // Returning from the async handler lets the finally block close the browser first
    return response
  } catch (error) {
    // Rethrow so that Lambda reports the invocation as failed
    throw error
  } finally {
    if (browser !== null) {
      await browser.close()
    }
  }
}
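
The handler reads a yearMonth path parameter and turns it into a date. As a minimal sketch, assuming the parameter has the form YYYY-MM (for example 2019-07), the parsing and validation could be pulled into a small function. The helper name parseYearMonth and the exact format are assumptions for illustration, not part of the original project:

```javascript
// Hypothetical helper: parse a "YYYY-MM" path parameter into the values the
// handler needs. Returns null when the input does not match the assumed form.
function parseYearMonth(yearMonth) {
  const match = /^(\d{4})-(0[1-9]|1[0-2])$/.exec(yearMonth || '')
  if (!match) {
    return null
  }
  const year = Number(match[1])
  const month = Number(match[2]) // 1-12, the range the sales query filters on
  // The Date constructor expects a zero-based month index.
  const date = new Date(year, month - 1)
  return {
    year,
    month,
    monthName: date.toLocaleString('en', { month: 'long' })
  }
}

console.log(parseYearMonth('2019-07'))
console.log(parseYearMonth('July')) // null, because the format does not match
```

Returning null for bad input would let the handler answer with a 400 response instead of querying with a meaningless date.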

Deployment to AWS lambda

As mentioned earlier, we are using Serverless for deployment so that the configuration stays light.

service:
  name: PDF

plugins:
  - serverless-offline
  - serverless-apigw-binary

provider:
  name: aws
  runtime: nodejs8.10
  region: eu-central-1
  stage: ${opt:stage, 'development'}
  environment:
    ENV: ${self:provider.stage}

custom:
  apigwBinary:
    types:
      - '*/*'

functions:
  pdf:
    handler: pdf.pdf
    events:
      - http:
          path: pdf
          method: get
          cors: true

The key here is that we enable */* for apigwBinary so that the PDF goes through in the correct format.
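
To illustrate why the binary setting matters: the handler returns the PDF bytes as a base64 string flagged with isBase64Encoded, and API Gateway is expected to decode that string back to binary when the media type is registered as binary. A minimal sketch of that response shape follows; the helper name pdfResponse and the sample bytes are made up for illustration:

```javascript
// Hypothetical helper: wrap a PDF buffer in the response shape used by the
// handler in this article.
function pdfResponse(pdfBuffer, filename) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/pdf',
      'Content-Disposition': `attachment; filename=${filename}`
    },
    // The body has to be a string, so the bytes are base64 encoded...
    body: pdfBuffer.toString('base64'),
    // ...and this flag marks the body as encoded binary.
    isBase64Encoded: true
  }
}

// Stand-in bytes; a real PDF buffer would come from page.pdf().
const sample = Buffer.from('%PDF-1.4 sample')
const response = pdfResponse(sample, 'test.pdf')
console.log(response.body)
```

If the client receives a text file full of base64 characters instead of a PDF, the binary media type configuration is the first thing to check.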

And with that we have everything needed to generate a PDF in AWS Lambda. In my experience, generating the PDF with 1024 MB of memory took something like 4000 ms, which would mean that the total price would be close to 1 euro per 20000 PDF generations after the free tier.
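
The estimate can be reproduced with a short calculation. The rates below are assumptions (on-demand prices in USD of 0.0000166667 per GB-second and 0.20 per million requests); they may differ by region and change over time, so treat this as a sketch and not a quote:

```javascript
// Assumed on-demand rates in USD; check the current AWS price list.
const PRICE_PER_GB_SECOND = 0.0000166667
const PRICE_PER_REQUEST = 0.2 / 1e6

// Cost of a number of invocations, ignoring the free tier.
function lambdaCost(invocations, memoryMb, durationMs) {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000)
  const perInvocation = gbSeconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST
  return invocations * perInvocation
}

console.log(lambdaCost(20000, 1024, 4000).toFixed(2)) // prints "1.34"
```

Under these assumed rates, 20000 generations come to about 1.34 USD, which is in the same ballpark as the figure above.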

If you want to try it out yourself, I have created a repository on GitHub.

Discussion

Esteban Gatjens

Hi,
Thanks for the article.
I had an issue where the PDF randomly came out empty. To solve it, setContent should wait for everything to be loaded.

await page.setContent(html, { waitUntil: ['load', 'domcontentloaded', 'networkidle0'] });
Aki Rautio Author

Thanks :) Very good point and interesting find. I haven't seen this when loading html as string but this very well can happen.

WaitUntil is very good to use and even necessary if page itself loads external content.
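
As a rough sketch of how the suggestion fits into the rendering step, the two Puppeteer calls could be wrapped in a helper that awaits setContent before asking for the PDF. The helper name htmlToPdf and the stand-in page object are assumptions for illustration; in the real function, page would come from browser.newPage():

```javascript
// Hypothetical helper: render HTML to a PDF buffer, waiting for the page to
// settle first. `page` is a Puppeteer Page (or anything with the same methods).
async function htmlToPdf(page, html) {
  // Without `await`, page.pdf() could run before the content is in place.
  await page.setContent(html, { waitUntil: ['load', 'networkidle0'] })
  return page.pdf({ format: 'A4', printBackground: true })
}

// Stand-in page object so the sketch runs without Chromium.
const calls = []
const fakePage = {
  setContent: async (html, options) => {
    calls.push('setContent')
  },
  pdf: async () => {
    calls.push('pdf')
    return Buffer.from('%PDF')
  }
}

htmlToPdf(fakePage, '<h1>Report</h1>').then(pdf => {
  console.log(calls.join(' -> '), pdf.length)
})
```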

Charanjit Singh

I have read about the named @page rule in CSS, but it is not working. Any idea why? I want to mix landscape and portrait pages.

Aki Rautio Author

Any chance you could share your CSS? I haven't tried exactly this kind of scenario, but it could be that Puppeteer has some limitations regarding CSS.

Charanjit Singh
    <style>
      @page :first {
          display: none;
      }
      @page { size : portrait }
      @page rotated { size : landscape }
      h3 { page : rotated }

      p, h3 {
        page-break-after: always;
      }
    </style>
    <p>First Page.</p>
    <h3>Hello world</h3>
    <p>Second Page.</p>
    <button>Print!</button>

Henrique de Castilhos

Hi! Thanks for the article, it helped me a lot. I didn't know about chrome-aws-lambda and it was frustrating me terribly: I couldn't get Puppeteer working on serverless at all, and now I finally have.

Aymon Fournier

Error: Chromium revision is not downloaded. Run "npm install", which doesn't work

Aki Rautio Author

Is this happening on AWS or when running locally?

Aymon Fournier

Locally. It works fine on Lambda. How do I run it locally?

Also, thank you very much for this code, it saved me 5 days of work.

Aki Rautio Author

The original package that we are using here suggests the following: github.com/alixaxel/chrome-aws-lam...

Saurabh Sharma

Hi,
This article was very helpful to me.
I want to store the PDF that you created in an S3 bucket, but it's in base64. Can you guide me on how to do that?

Aki Rautio Author

I haven't done this with a PDF, but I have another Lambda function that saves files this way. You could also pass the PDF variable straight to Body before turning it into base64, and that should work.

const AWS = require('aws-sdk')
const s3 = new AWS.S3()

// pdfBase64, BUCKET and callback come from the surrounding Lambda handler
s3.upload(
  {
    Body: Buffer.from(pdfBase64, 'base64'),
    Bucket: BUCKET,
    Key: 'path for the file'
  },
  (err, data) => {
    if (err) {
      callback(err, null)
    } else {
      callback(null, {
        statusCode: 200,
        body: true,
        isBase64Encoded: false
      })
    }
  }
)
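
To make the second suggestion concrete: decoding the base64 string gives back exactly the bytes that Puppeteer produced, so inside the same function the original buffer can be used as the upload body directly. A minimal sketch with stand-in data follows; the bucket name, key and sample bytes are placeholders, and the actual s3.upload call is left out so that it runs without AWS credentials:

```javascript
// Stand-in for the Buffer that page.pdf() resolves to.
const pdf = Buffer.from('%PDF-1.4 sample bytes')

// What the HTTP response carries, and what a separate uploader would decode.
const pdfBase64 = pdf.toString('base64')
const decoded = Buffer.from(pdfBase64, 'base64')

// Hypothetical upload parameters; the bucket and key are placeholders.
const params = {
  Bucket: 'my-example-bucket',
  Key: 'reports/test.pdf',
  Body: pdf, // same bytes as `decoded`, without the round trip
  ContentType: 'application/pdf'
}

console.log(decoded.equals(params.Body)) // true
```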

aviban

"errorMessage": "Cannot find module 'iltorb'"
