DEV Community


Generate a PDF in AWS Lambda with NodeJS, Webpack, Pug and Puppeteer

Zeljko Krsic ・ 6 min read

Last week the team I work in got a task: give users a way to export data to a PDF file by clicking a button. The problem was that the data we needed lives on a third-party backend, and there was no way to get any changes made there in a short time.

The fastest and easiest way was to gather all the necessary data directly on the frontend and generate the PDF from it. At the same time, we needed something we could reuse or extend for other purposes, and AWS Lambda seemed to be a perfect fit. Since I couldn't find a solution for the combination I needed, I decided to write an article about it.

My first idea was to write this in plain JavaScript (and I did that too), but since the Serverless Framework has a TypeScript template, I decided to go with it. It was a good decision: for the same code, the code size was 984.9 kB for the TS project versus 124.8 MB for the JS project.

I achieved this in a few steps:

  1. Set up computer and environment
  2. Set serverless.yml
  3. Get all necessary dependencies
  4. Write function and template
  5. Configure Webpack

1. Set up computer, environment and project structure

The first step was to set up my computer to deploy code to AWS, since this was the very first project I did that included Lambdas. I am using the Serverless Framework, but I will not cover setting it up here; I followed the instructions I found on Google, and it is pretty straightforward.

Once everything was set up and ready, it was time to write the configuration in the serverless file. Again, I am not showing the entire file, just the important parts, and most of this setup came from the Serverless template.

The project structure (project tree) is as follows:

aws-nodejs-ts-pdf
|- serverless.yml
|- package.json
|- webpack.config.js
|- layers
   |- chrome_aws_lambda.zip
|- functions
   |- pdf.ts
|- template
   |- pdfTemplate.pug
|- tsconfig.json

2. Set serverless.yml

Under provider, set region, stage and profile, and add IS_OFFLINE to environment. This last one is needed for local development.

provider:
  ...
  region: [YOUR_REGION_HERE]
  stage: [dev|prod]
  profile: [SERVERLESS_PROFILE]
  ...
  environment:
    IS_OFFLINE: ${opt:offline}
    ...

Add two more plugins to the plugins list, like this:

plugins:
  ...
  - serverless-offline
  - serverless-apigw-binary

Those plugins are needed for local development (serverless-offline) and for configuring AWS API Gateway to return binary content (serverless-apigw-binary), which is shown in the next snippet.

custom:
  ...
  serverless-offline:
    location: .webpack/service
  apigwBinary:
    types:
      - '*/*'

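We need to set the apigwBinary types; otherwise API Gateway does not decode the base64-encoded response body, and the client receives a base64 string instead of a PDF.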

Next I had to set up the function that is triggered when someone hits the URL; this goes in the functions block of the serverless.yml file. I'll call it generate-pdf. This block also references a layer; you can read more about layers after the next block of code.

functions:
  generate-pdf:
    handler: functions/pdf.generate
    layers:
      - { Ref: HeadlessChromeLambdaLayer }
    events:
      - http:
          method: get
          path: pdf/generate/{typeId}
          cors: true
      - http:
          method: get
          path: pdf/generate
          cors: true

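I was also using layers in this project, because headless Chromium is used here and, at some point, we will have more functions that could all use this same layer. Serverless exposes a layer named HeadlessChrome to CloudFormation as HeadlessChromeLambdaLayer (the TitleCase layer name followed by LambdaLayer), which is why the function above references it by that name. Instructions on how to get a Chrome binary for AWS can be found here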

layers:
  HeadlessChrome:
    name: HeadlessChrome
    compatibleRuntimes:
      - nodejs12.x
    description: Required for headless chrome
    package:
      artifact: layers/chrome_aws_lambda.zip

3. Get all necessary dependencies

After some research I ended up using the following dependencies in this project:

  "dependencies": {
    "@types/aws-lambda": "8.10.39",
    "@middy/core": "1.0.0-beta.2",
    "@middy/do-not-wait-for-empty-event-loop": "1.0.0-beta.2",
    "aws-lambda": "1.0.5",
    "chrome-aws-lambda": "2.0.2",
    "html-loader": "0.5.5",
    "html-webpack-plugin": "^3.2.0",
    "pug": "2.0.4",
    "pug-loader": "2.4.0",
    "puppeteer-core": "2.0.0",
    "source-map-support": "0.5.16"
  },
  "devDependencies": {
    "@types/node": "13.1.8",
    "@types/pug": "2.0.4",
    "copy-webpack-plugin": "5.1.1",
    "fork-ts-checker-webpack-plugin": "4.0.1",
    "puppeteer": "2.0.0",
    "serverless": "1.61.2",
    "serverless-apigw-binary": "0.4.4",
    "serverless-offline": "5.12.1",
    "serverless-webpack": "5.3.1",
    "ts-loader": "6.2.1",
    "typescript": "3.7.5",
    "webpack": "4.41.5",
    "webpack-node-externals": "1.7.2"
  }

Another important part of the package.json file is the scripts block, because it allows this app to run locally.

"scripts": {
    "start": "sls offline start --port 3004 --stage dev --basePath / --prefix dev --location .webpack/service --offline"
}

4. Write function and template

Our handler (or function) is located in the functions folder, and all the magic happens here. Basically, this handler extracts the path parameter from the event, gets the Pug template, renders it to HTML, launches a headless Chromium browser and makes a PDF from the generated page.

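To let this project work locally, we need to choose the executable path for Chromium: on AWS it comes from chrome-aws-lambda, while locally (IS_OFFLINE set) we pass null, so Puppeteer falls back to the Chromium that the puppeteer devDependency downloaded.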

const executablePath = process.env.IS_OFFLINE ? null : await chromium.executablePath;
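One caveat with the check above: environment variables always reach the function as strings, so any non-empty value of IS_OFFLINE, even "false", counts as truthy. A minimal sketch of a stricter check, assuming you want only "true" or "1" to mean offline (isOffline is a hypothetical helper, not part of the original project):

```typescript
// Hypothetical helper: decides whether we run under serverless-offline.
// Environment variables are strings, so a plain truthiness test would treat
// "false" as offline; here only "true" or "1" (any case, trimmed) count.
function isOffline(env: { [key: string]: string | undefined }): boolean {
  const raw = (env.IS_OFFLINE || "").trim().toLowerCase();
  return raw === "true" || raw === "1";
}
```

In the handler it would replace the plain truthiness test: const executablePath = isOffline(process.env) ? null : await chromium.executablePath;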

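After that we can actually start building our handler. First we check for a path parameter named typeId. The path parameter is declared in the functions block of the serverless.yml file (functions - generate-pdf - events - path: pdf/generate/{typeId}). Because the same function is also mapped to pdf/generate without the parameter, the handler falls back to an empty string when it is missing.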
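The path-parameter check can be pulled into a small function that is easy to test on its own. A sketch, where getTypeId, PathParamsEvent and the allowed-ID pattern are hypothetical (the article does not define what a valid typeId looks like). On the pdf/generate route API Gateway sends pathParameters as null, which is why the guard exists; Pug's #{} interpolation already escapes HTML, so the pattern is only an extra guard:

```typescript
// Only the part of the API Gateway event this sketch reads; the real
// APIGatewayEvent from @types/aws-lambda fits this shape.
interface PathParamsEvent {
  pathParameters: { [name: string]: string } | null;
}

// Hypothetical rule: 1 to 64 letters, digits, "-" or "_".
const TYPE_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// Returns the typeId path parameter, or "" when it is missing or not a valid ID.
function getTypeId(event: PathParamsEvent): string {
  const raw = event.pathParameters ? event.pathParameters.typeId : undefined;
  return raw !== undefined && TYPE_ID_PATTERN.test(raw) ? raw : "";
}
```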

One side note before the code: this is a simplified version of our final solution, and I am showing it with the GET method. In the real world we use POST and send the data in the request body. Using GET here makes it possible to try this directly from a browser.
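For the POST variant, the data would arrive in event.body instead of the path. A minimal sketch of reading it, assuming a JSON payload with a typeId field (PdfRequest and parsePdfRequest are hypothetical names; the real payload is not shown in this article). With "*/*" registered as a binary media type, API Gateway may hand even JSON bodies to the function base64-encoded, which is what the isBase64Encoded branch covers:

```typescript
// Hypothetical payload shape for the POST variant.
interface PdfRequest {
  typeId: string;
}

// Reads event.body (null when the request has no body) and decodes it first
// when API Gateway marked it as base64-encoded. Invalid JSON makes JSON.parse
// throw, so call this inside the handler's try block.
function parsePdfRequest(body: string | null, isBase64Encoded: boolean): PdfRequest {
  if (!body) {
    return { typeId: "" };
  }
  const json = isBase64Encoded ? Buffer.from(body, "base64").toString("utf8") : body;
  const data: unknown = JSON.parse(json);
  if (typeof data === "object" && data !== null) {
    const typeId = (data as { typeId?: unknown }).typeId;
    if (typeof typeId === "string") {
      return { typeId };
    }
  }
  // Missing or non-string typeId falls back to an empty id, like the GET route.
  return { typeId: "" };
}
```

In the handler this would be used as const { typeId } = parsePdfRequest(event.body, event.isBase64Encoded);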

Here is the handler:

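import { APIGatewayEvent } from "aws-lambda";
import middy from "@middy/core";
import doNotWaitForEmptyEventLoop from "@middy/do-not-wait-for-empty-event-loop";
import 'source-map-support/register';
import chromium from "chrome-aws-lambda";

const handler = async (event: APIGatewayEvent) => {

    // Locally (IS_OFFLINE set) pass null so Puppeteer uses its own Chromium;
    // on AWS use the Chromium binary shipped with chrome-aws-lambda (from the layer).
    const executablePath = process.env.IS_OFFLINE ? null : await chromium.executablePath;
    // pathParameters is missing on the pdf/generate route, so fall back to "".
    const typeId = event.pathParameters ? event.pathParameters.typeId : "";
    // pug-loader compiles the .pug file at build time; require returns the template function.
    const template = require("../template/pdfTemplate.pug");
    const htmlContent = template({typeId});

    let browser = null;

    try {
        browser = await chromium.puppeteer.launch({
            headless: true,
            args: chromium.args,
            defaultViewport: chromium.defaultViewport,
            executablePath
        });

        const page = await browser.newPage();

        await page.setContent(htmlContent);

        // page.pdf() resolves with a Buffer containing the PDF bytes.
        const pdfStream = await page.pdf({
            format: "A4",
            printBackground: true,
            margin: { top: "1.5cm", right: "1.5cm", bottom: "1.5cm", left: "1.5cm" }
        });

        return {
            statusCode: 200,
            isBase64Encoded: true,
            headers: {
                "Content-Type": "application/pdf",
            },
            // API Gateway turns the base64 string back into binary (see apigwBinary).
            body: pdfStream.toString("base64")
        };
    } catch (error) {
        console.error(error);
        return {
            statusCode: 500,
            // The body must be a string; returning the Error object itself does not serialize usefully.
            body: JSON.stringify({ message: error instanceof Error ? error.message : String(error) })
        };
    } finally {
        // Close the browser on success and on failure so Chromium is not left running.
        if (browser !== null) {
            await browser.close();
        }
    }
};

export const generate = middy(handler).use(doNotWaitForEmptyEventLoop());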
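The last line wraps the handler with middy and the do-not-wait-for-empty-event-loop middleware. By default, that middleware sets context.callbackWaitsForEmptyEventLoop to false before the handler runs, which tells Lambda not to wait for the Node.js event loop to be empty before sending the response. To show the idea without middy, here is a rough hand-written equivalent; withoutEmptyEventLoopWait, AsyncHandler and LambdaContextLike are hypothetical names, and this is an illustration, not a replacement for the middleware:

```typescript
// Only the Lambda context property this sketch touches.
interface LambdaContextLike {
  callbackWaitsForEmptyEventLoop: boolean;
}

type AsyncHandler<E, C, R> = (event: E, context: C) => Promise<R>;

// Returns a handler that first turns off waiting for the empty event loop,
// then delegates to the original handler with the same event and context.
function withoutEmptyEventLoopWait<E, C extends LambdaContextLike, R>(
  handler: AsyncHandler<E, C, R>
): AsyncHandler<E, C, R> {
  return async (event, context) => {
    context.callbackWaitsForEmptyEventLoop = false;
    return handler(event, context);
  };
}
```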

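Three lines in the handler are the important ones: require returns the template function that pug-loader compiled at build time, calling that function with typeId renders the HTML string, and page.setContent loads that HTML into the headless browser: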

    const template = require("../template/pdfTemplate.pug");
    const htmlContent = template({typeId});
    /*
    ... And later
    */
    await page.setContent(htmlContent);
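The require call above is untyped. If you prefer a typed import, a wildcard module declaration can describe what pug-loader produces: a function from template locals to an HTML string. A sketch, assuming a declaration file at a hypothetical path such as typings/pug.d.ts that tsconfig includes, and a tsconfig that compiles to CommonJS modules:

```typescript
// typings/pug.d.ts (hypothetical location; tsconfig "include" must cover it).
// pug-loader turns each .pug file into a CommonJS module that exports the
// compiled template function, which takes the template locals and returns HTML.
declare module "*.pug" {
  const template: (locals?: { [key: string]: unknown }) => string;
  export = template;
}
```

With that in place, the handler could use import template = require("../template/pdfTemplate.pug"); and template({ typeId }) would be type-checked as returning a string.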

The other parts of the handler are pretty straightforward: launch the browser, open a page, make the PDF and send a Base64-encoded response. One more important detail: when making a PDF, Puppeteer's page.pdf() returns a Promise that resolves with a Buffer holding the PDF. The body returned to API Gateway has to be a string, so the buffer is converted to a Base64 string (and isBase64Encoded is set to true).
This is achieved by putting:

body: pdfStream.toString("base64")

in the response.
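The same conversion can be isolated in a small function, which makes the response shape easy to check in a unit test without launching Chromium. A sketch, where toPdfResponse and BinaryProxyResponse are hypothetical names and only the response fields used in the handler are modelled:

```typescript
// The Lambda proxy response fields the handler returns for a PDF.
interface BinaryProxyResponse {
  statusCode: number;
  isBase64Encoded: boolean;
  headers: { [name: string]: string };
  body: string;
}

// Wraps the Buffer from page.pdf() in a proxy response: the bytes travel as a
// base64 string and isBase64Encoded tells API Gateway to decode them again.
function toPdfResponse(pdf: Buffer): BinaryProxyResponse {
  return {
    statusCode: 200,
    isBase64Encoded: true,
    headers: { "Content-Type": "application/pdf" },
    body: pdf.toString("base64"),
  };
}
```

For example, toPdfResponse(Buffer.from("%PDF-1.4")).body is "JVBERi0xLjQ=", and in the handler the try block could end with return toPdfResponse(pdfStream);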

The Pug template is a very simple one:

doctype html
html(lang='en')
    head
        meta(charset='UTF-8')
        title Our PDF Generator POC
        style.
            body {
                font-family: Helvetica;
            }
            h1 {
                font-size: 36px;
            }
            h2 {
                font-size: 16px;
            }
            .testClass {
                color: red;
                font-weight: bold;
            }
    body
        h1 PDF generator generated this file
        h2 This PDF is generated from HTML-template
        p Some text to print in PDF
        - var type = typeId
        #type
        if type
            p Request had an id = #{typeId}
        else
            p Request had an empty id.

        p(class='testClass') This line has some styles.

5. Configure Webpack

The biggest problem I had with this project was making the template work. Since the project worked locally the whole time, chasing the bug was... well, let's say it took a while. The first part of the problem was that Webpack didn't include the template in the bundle when deploying to AWS (locally everything worked perfectly). The second part was that once we told Webpack to bundle the template, its path on AWS was different from the path used locally. I don't know whether the Serverless Framework had anything to do with it (I haven't debugged it that precisely). I simply decided to use copy-webpack-plugin together with pug-loader.

const webpack = require("webpack");
const path = require("path");
const slsw = require("serverless-webpack");
const nodeExternals = require("webpack-node-externals");
const CopyWebpackPlugin = require("copy-webpack-plugin");

// all files with a `.ts` or `.tsx` extension will be handled by `ts-loader`
const ts = {
  test: /\.(tsx?)$/,
  loader: "ts-loader",
  exclude: [
    path.resolve(__dirname, "node_modules"),
    path.resolve(__dirname, ".serverless"),
    path.resolve(__dirname, ".webpack")
  ],
  options: {
    transpileOnly: true,
    experimentalWatchApi: true
  }
};

// all files with a `.pug` extension will be handled by `pug-loader`
const pug = {
  test: /\.pug$/,
  use: ["pug-loader"]
};

// Webpack configs
const config = {
  context: __dirname,
  mode: slsw.lib.webpack.isLocal ? "development" : "production",
  entry: slsw.lib.entries,
  devtool: slsw.lib.webpack.isLocal
    ? "cheap-module-eval-source-map"
    : "source-map",
  resolve: {
    extensions: [".mjs", ".json", ".ts", ".pug"],
    symlinks: false,
    cacheWithContext: false
  },
  output: {
    libraryTarget: "commonjs",
    path: path.join(__dirname, ".webpack"),
    filename: "[name].js"
  },
  target: "node",
  externals: [nodeExternals()],
  module: {
    rules: [ts, pug]
  },
  plugins: [
    new webpack.EnvironmentPlugin({
      NODE_ENV: "development"
    }),
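    // Copy the Pug templates to .webpack/service/template so they also
    // exist next to the bundled handler in the deployed package.
    new CopyWebpackPlugin([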
      {
        from: "./template",
        to: path.join(__dirname, ".webpack/service/template")
      }
    ])
  ]
};

module.exports = config;

So, that's it.

I have created a repository on GitHub for everyone who wants to take a closer look.

Discussion (2)

Max Donchenko

Thanks for the article! Useful!

1. "difference in code size was 984,9 kB (TS project) vs. 124,8 MB (JS project) for the same code"

Why so?

2. Sorry if I'm wrong, but you haven't mentioned how you managed to deploy it to AWS. Did you put everything in a bucket? If so, how are the bucket and the Lambda kept in sync?

Zeljko Krsic Author

Thank you for your comment, and sorry you waited this long for my response; I simply hadn't noticed it until now.

  1. I am not sure why the difference in bundle size is so big. I haven't investigated it, as I was short on time.
  2. I used the Serverless Framework to deploy the package to AWS. Serverless put it into a bucket and updated all the parts used in the project. In this case, besides the Lambda function, an API Gateway and layers are deployed.