“Evil cannot create anything new, they can only corrupt and ruin what good forces have invented or made.” - JRR Tolkien.
Preface
Wouldn't it be great to generate PDF files using the full power of HTML and CSS, without relying on overly complex drivers that depend on a whole bunch of C libraries?
And to support all the latest features of HTML5 and CSS3 while you're at it?
Well, we have great news. There is a library called Puppeteer that takes a relatively new Chrome feature (headless mode) and makes it accessible through a Node.js API.
Essentially, Puppeteer launches a Chromium browser instance in headless mode (without actually opening a window) and lets us control the browser through a set of API commands: parse websites, retrieve images, generate PDFs as if you were opening an HTML file in the latest browser version, and so on.
We could build a Docker image with Puppeteer and deploy it on ECS or Heroku, but creating a stable and optimized image can be quite challenging...
IMHO, running it on AWS Lambda is much simpler in terms of development speed, debugging and monitoring. Besides, serverless is a nice fit for a POC (you pay only for what you use).
Repo -> End Result
You can see the complete working example in this repo
Generate PDF document via Puppeteer running on AWS Lambda
This repo contains a serverless application that takes an HTML template and returns a PDF in the form of a binary.
Requirements
How to Run
- Clone this repo: git clone https://github.com/zahaar/generate-pdf-lambda
- Import the cURL request into Insomnia (Postman is not recommended, as it can't visualize PDFs).
- Run make api-local to start a local API GW.
- Send the cURL request via Insomnia.
You can also invoke the Lambda directly, bypassing API GW, by supplying an example event in a file and running make invokation-local. The response will be a base64-encoded PDF binary.
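Since a direct invocation hands back base64 text rather than a file, here is a minimal sketch for turning that response into PDF bytes. It assumes the response has the shape the final handler returns ({ statusCode, headers, isBase64Encoded, body }); decodePdfResponse and response.json are hypothetical names used only for illustration.

```javascript
// Sketch: turn the response of a direct invocation into PDF bytes.
// Assumes the final handler's response shape: { statusCode, headers, isBase64Encoded, body }.
function decodePdfResponse(response) {
  if (!response.isBase64Encoded) {
    throw new Error('Expected a base64-encoded body')
  }
  const pdf = Buffer.from(response.body, 'base64')
  // Every PDF file starts with the "%PDF-" magic bytes.
  if (pdf.subarray(0, 5).toString('latin1') !== '%PDF-') {
    throw new Error('Decoded body does not look like a PDF')
  }
  return pdf
}

// Hypothetical usage, assuming the JSON response was saved to response.json:
// const fs = require('fs')
// const response = JSON.parse(fs.readFileSync('response.json', 'utf8'))
// fs.writeFileSync('out.pdf', decodePdfResponse(response))
```

Checking the magic bytes is a cheap way to catch the common mistake of decoding an error message instead of a document.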
How to Deploy
A configured AWS CLI v2 is a must (AWS Console account and API keys).
- make deploy
- Fetch the URL Value from the AWS SAM deploy output, and change the URL in Insomnia from localhost to that value. Execution result in…
Requirements and Prerequisites
1. Set Up a Local AWS SAM Template with the Chrome Lambda Layer
In this step we complete the local SAM execution setup. Once this is done, we will have a solid reference point.
The final version of this step can be fetched from the 1_local-setup branch.
We can create a basic SAM template by running sam init or by following a guide, but our end goal is a more complete structure like this:
├── Makefile
├── VERSION -- for VERSION tracking, helpful for CI
├── envs.json -- to separate env variables for local execution (if necessary)
├── events
│ └── api-gw-event.json -- an example API GW event for local execution
├── src
│ └── app.js -- main source code file
└── template.yaml -- AWS SAM configuration template
app.js contains simple code that returns the same event.body it receives from the example event:
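exports.handler = async (event) => {
  // Echo the request body back unchanged (step 1 only verifies the wiring)
  const response = {
    statusCode: 200,
    body: event.body,
  }
  return response
}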
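To try the echo handler outside SAM, here is a small sketch that calls it with a trimmed-down API GW proxy event. The event fields below are an assumption standing in for events/api-gw-event.json; a real proxy event carries many more fields (requestContext, multiValueHeaders, and so on).

```javascript
// The step-1 echo handler, called directly with a hypothetical, trimmed-down event.
const handler = async (event) => {
  const response = {
    statusCode: 200,
    body: event.body,
  }
  return response
}

// Stand-in for events/api-gw-event.json (only the fields this handler touches matter).
const sampleEvent = {
  httpMethod: 'POST',
  path: '/pdf',
  headers: { 'Content-Type': 'text/html' },
  isBase64Encoded: false,
  body: '<h1>Hello PDF</h1>',
}

handler(sampleEvent).then((res) => console.log(res.statusCode, res.body))
// prints: 200 <h1>Hello PDF</h1>
```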
while template.yaml has a resource configuration for the API GW service:
...
  ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: Staging
      BinaryMediaTypes:
        - application~1pdf # Note the support for the binary PDF media type ("~1" stands for "/")
...
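The "~1" in application~1pdf is how SAM expects a "/" to be written in BinaryMediaTypes. It follows the same escaping convention as JSON Pointer (RFC 6901), where "~" becomes "~0" and "/" becomes "~1". The helpers below are an illustrative sketch of that convention, not part of SAM.

```javascript
// JSON Pointer style escaping, as used for media types in SAM's BinaryMediaTypes.
// Hypothetical helpers for illustration only.
function toSamMediaType(mime) {
  // Escape "~" first so the "~" introduced for "/" is not escaped again.
  return mime.replace(/~/g, '~0').replace(/\//g, '~1')
}

function fromSamMediaType(escaped) {
  // Unescape in the reverse order: "~1" first, then "~0".
  return escaped.replace(/~1/g, '/').replace(/~0/g, '~')
}

console.log(toSamMediaType('application/pdf')) // application~1pdf
console.log(fromSamMediaType('image~1png'))    // image/png
```

The order of the replacements matters: doing them the other way round would turn a literal "~01" into "/" instead of "~1".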
and for the Lambda itself, which, given our goal, is called PdfFunction.
Take note of the Layer used in this config. By using the chrome-aws-lambda Layer, we essentially remove the need to add puppeteer and Chrome as package.json dependencies and to get them working in the execution environment AWS uses for Lambdas, a step that can be quite challenging.
...
  PdfFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Invoked by API GW on POST /pdf
      CodeUri: src/
      Handler: app.handler
      Runtime: nodejs12.x
      Timeout: 15
      MemorySize: 3008
      Layers:
        - !Sub 'arn:aws:lambda:${AWS::Region}:764866452798:layer:chrome-aws-lambda:22'
      Environment:
        Variables:
          EXAMPLE_ENV: 'CHANGE_THIS'
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /pdf
            Method: post
            RestApiId:
              Ref: ApiGatewayApi
...
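On Lambda, the contents of a Node.js Layer are extracted under /opt, and /opt/nodejs/node_modules is on the module search path, which is why require('chrome-aws-lambda') works without a package.json entry. Outside Lambda (plain unit tests, for example) the module is usually missing. The sketch below is a hypothetical helper, not from the repo, that returns null instead of crashing at import time.

```javascript
// Try to load a module that is normally provided by a Lambda Layer.
// Returns null when the module cannot be found (e.g. the Layer is not attached).
function loadOptional(moduleName) {
  try {
    return require(moduleName)
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') {
      return null
    }
    // Other errors (e.g. a syntax error inside the module) are re-thrown.
    throw err
  }
}

const chromium = loadOptional('chrome-aws-lambda')
if (!chromium) {
  console.warn('chrome-aws-lambda not found: is the Layer attached?')
}
```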
To verify that everything is set up correctly, let's invoke the function locally with the example event:
make invokation-local
The output should be essentially the same as the execution logs you would see in CloudWatch:
...
...
Mounting /Users/wparker/Dev/scheduled-website-screenshot-app/.aws-sam/build/PdfFunction as /var/task:ro,delegated inside runtime container
START RequestId: e4d7743d-5be2-4735-84c8-9d5160d9a750 Version: $LATEST
...
2. Configure Puppeteer in Lambda; Supply Template HTML
The next step is to program app.js to start puppeteer, consume HTML from an API GW event, and return a base64-encoded response that API GW decodes before sending it back to the client.
The final version of this step can be fetched from the 2_generate-pdf branch.
We need to change the Lambda handler code to something like this file (it is too long to be displayed here).
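Since the full handler is not reproduced here, the following is a minimal sketch of its overall flow, not the repo's file. It assumes the browser comes from an injected launchBrowser function; in Lambda that function would wrap chromium.puppeteer.launch({...}) with the arguments shown in the takeaways below. createPdfHandler and launchBrowser are hypothetical names, and injecting the launcher is a design choice that keeps the sketch testable without Chromium.

```javascript
// Sketch of the handler flow: read HTML, render it, return a base64 PDF.
function createPdfHandler(launchBrowser) {
  return async (event) => {
    // API GW may deliver the body base64-encoded; decode it if so.
    const html = event.isBase64Encoded
      ? Buffer.from(event.body, 'base64').toString('utf8')
      : event.body

    let browser = null
    try {
      browser = await launchBrowser()
      const page = await browser.newPage()
      await page.setContent(html, { waitUntil: 'networkidle0' })
      const pdf = await page.pdf({ format: 'a4' })
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/pdf' },
        isBase64Encoded: true,
        // Newer Puppeteer versions return a Uint8Array, so wrap it in a Buffer first.
        body: Buffer.from(pdf).toString('base64'),
      }
    } finally {
      // Always close Chromium, otherwise the process can linger in a warm container.
      if (browser !== null) {
        await browser.close()
      }
    }
  }
}
```

Closing the browser in finally means it is shut down even when setContent or pdf throws.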
Key takeaways are:
- The browser launch arguments in this example, taken from chrome-aws-lambda, are set specifically for AWS Lambda compatibility.
...
const chromium = require('chrome-aws-lambda') // resolved from the chrome-aws-lambda Layer
...
browser = await chromium.puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath,
  headless: chromium.headless,
  ignoreHTTPSErrors: true,
})
- The output format is set to mimic an A4 document.
...
await page.setViewport({
  width: 1080,
  height: 1600,
  deviceScaleFactor: 1,
  isLandscape: true, // affects the emulated viewport only; page.pdf() prints portrait unless landscape: true is passed
})
pdf = await page.pdf({
  format: 'a4',
  margin: {
    top: '0px',
    right: '0px',
    bottom: '0px',
    left: '0px',
  },
})
...
- The response headers are set for PDF file transfer. The isBase64Encoded flag is set to true to inform API GW that it needs to decode the file.
...
...
var response = {
statusCode: 200,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'GET, POST',
'Content-type': 'application/pdf',
'Content-Disposition': 'attachment; filename="foo.pdf"',
},
isBase64Encoded: true,
body: pdf.toString('base64'),
}
...
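Coming back to the A4 takeaway: a quick back-of-the-envelope conversion helps compare the 1080 x 1600 viewport with the physical A4 page. This assumes ISO A4 (210 mm x 297 mm) and the CSS definition 1in = 96px = 25.4mm; Puppeteer's own A4 preset is listed as 8.27in x 11.7in, which works out to roughly the same pixel size.

```javascript
// Convert ISO A4 dimensions to CSS pixels (1in = 96px = 25.4mm).
const MM_PER_INCH = 25.4
const CSS_PX_PER_INCH = 96

function mmToCssPx(mm) {
  return (mm / MM_PER_INCH) * CSS_PX_PER_INCH
}

const a4 = { widthMm: 210, heightMm: 297 }
const widthPx = mmToCssPx(a4.widthMm)
const heightPx = mmToCssPx(a4.heightMm)

console.log(widthPx.toFixed(1), heightPx.toFixed(1)) // 793.7 1122.5
```

So an A4 page is roughly 794 CSS px wide, noticeably narrower than the 1080 px viewport, which is worth keeping in mind when designing the HTML template.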
To test this code, we need an HTML template. We will use this open-source one for demonstration.
The document is sent as the request body with 'Content-Type: text/html'.
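Please note the 'Accept: application/pdf' header; this is important, because API GW only converts the base64 body back into binary when the request's Accept header matches one of the configured BinaryMediaTypes.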
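To make the role of the Accept header concrete, here is a simplified model of the check, based on my reading of the API Gateway documentation for binary responses from Lambda proxy integrations. It is not API GW's actual code: it only considers the first media type in Accept (which, per the docs, is the only one API GW honours) and treats "*/*" in BinaryMediaTypes as matching everything. apiGwReturnsBinary is a hypothetical name.

```javascript
// Simplified model: does API GW turn a base64 body (isBase64Encoded: true) into binary?
function apiGwReturnsBinary(acceptHeader, binaryMediaTypes) {
  if (!acceptHeader) return false
  // Only the first media type counts; drop parameters such as ";q=0.9".
  const first = acceptHeader.split(',')[0].split(';')[0].trim().toLowerCase()
  return binaryMediaTypes.some((type) => {
    const t = type.toLowerCase()
    return t === '*/*' || t === first
  })
}

const binaryTypes = ['application/pdf'] // written as application~1pdf in template.yaml
console.log(apiGwReturnsBinary('application/pdf', binaryTypes)) // true
console.log(apiGwReturnsBinary('*/*', binaryTypes))             // false
```

In the second case the client would receive the base64 text instead of a PDF, which is exactly what happens when the Accept header is left at a generic default.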
The complete cURL request is in this file.
To test the result, let's start SAM in local start-api mode (akin to a server, as opposed to a one-time invocation):
make api-local
Import the cURL request into Insomnia.
Result
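If you prefer scripting the request instead of using Insomnia, here is a hypothetical Node.js client sketch. It assumes sam local start-api is listening on its default http://127.0.0.1:3000, uses the /pdf path from template.yaml, and needs Node.js 18+ for the global fetch; buildPdfRequest and savePdf are illustrative names.

```javascript
// Build and send the same request Insomnia sends: HTML in, PDF out.
const fs = require('fs')

function buildPdfRequest(html, baseUrl = 'http://127.0.0.1:3000') {
  return {
    // Strip a trailing slash so stage URLs like .../Staging/ also work.
    url: baseUrl.replace(/\/$/, '') + '/pdf',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'text/html',
        Accept: 'application/pdf', // needed so API GW returns binary
      },
      body: html,
    },
  }
}

async function savePdf(html, outFile, baseUrl) {
  const { url, options } = buildPdfRequest(html, baseUrl)
  const res = await fetch(url, options)
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`)
  fs.writeFileSync(outFile, Buffer.from(await res.arrayBuffer()))
}

// Usage (with `make api-local` running):
// savePdf('<h1>Hello</h1>', 'out.pdf').catch(console.error)
```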
3. Deploy Lambda + API GW via SAM
Refer to the How to Deploy step in README.md on the main branch.
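For reference, a deployed REST API is reachable at API GW's default endpoint format, https://{restApiId}.execute-api.{region}.amazonaws.com/{stageName}/{path}. With StageName: Staging and Path: /pdf from template.yaml, the URL from the deploy output should look like the one built below. The id and region are placeholders; always use the actual value from the SAM deploy output.

```javascript
// Default REST API invoke URL for the stage and path defined in template.yaml.
function buildInvokeUrl(restApiId, region, stageName = 'Staging', path = '/pdf') {
  return `https://${restApiId}.execute-api.${region}.amazonaws.com/${stageName}${path}`
}

console.log(buildInvokeUrl('abc123defg', 'eu-central-1'))
// https://abc123defg.execute-api.eu-central-1.amazonaws.com/Staging/pdf
```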
Tips
1. Insomnia vs Postman
Instead of playing tricks with Postman's PDF visualization, I highly recommend switching to Insomnia.
Insomnia Visualized Response | Postman Visualized Response |
---|---|
2. Error: Error building docker image: pull access denied for
Works fine here. You shouldn't need credentials for Public ECR (you can use auth for specific cases), but if you just want to consume it, remove the existing credentials:
docker logout public.ecr.aws
and then try the build again.
That said, if you still want to make use of the…