hyper

Originally published at Medium

Capturing full page screenshots with puppeteer and Architect (arc.codes)

⚡ Let's build a serverless app that opens any site we provide in the URL, takes a full-page screenshot of it, and returns the image to our browser! Sound like fun? ⚡

This turned out to be a little more challenging than I originally thought, but I got it done! ☺️

In this tutorial, we will walk through the steps to create a serverless endpoint that takes a url as a query parameter and uses puppeteer to launch a browser. The browser navigates to the given URL and captures a screenshot of the full page.

Setting up Architect

Architect is a framework for building serverless functions on top of AWS Lambda. It puts a clean boundary between the function you write and the AWS infrastructure it runs on.

Check out the quickstart guide at https://arc.codes/docs/en/guides/get-started/quickstart, then install Architect globally:

npm install --global @architect/architect

Create a new folder called screenshoter and add an empty app.arc file to it:

mkdir screenshoter
cd screenshoter
touch app.arc

Modify your app.arc file to define an app with a single endpoint:

@app
screenshoter

@http
/
  method get
  src src

Save the file, then run:

arc init

This will create a new src folder in your project directory with an index.js file in it.

You can run a local sandbox and test out your new serverless function by running the command:

arc sandbox

Point a browser to http://localhost:3333 and you should see the Architect demo page.

Setting up the NodeJS project

In your terminal, change into the src directory and run npm init -y. This will initialize your src folder as an npm project.

cd src
npm init -y

Each endpoint in Architect is a separate Lambda function, so any dependencies it uses need to reside in the same folder as its index.js file. By initializing the folder as an npm project, you create a package.json file which will contain your project manifest. There is more information at arc.codes.

Let's install the dependencies we will need in our project:

Installing puppeteer for lambda

We need to install some special dependencies for puppeteer to run in AWS Lambda:

npm install puppeteer-core
npm install puppeteer-extra
npm install chrome-aws-lambda
npm install puppeteer-extra-plugin-stealth
npm install puppeteer-full-page-screenshot
npm install -D puppeteer

These modules will allow us to create a browser on AWS Lambda and capture a full-page screenshot. The next thing we need is an image tool to convert the image into a base64 string.

Installing Jimp

npm install jimp

Jimp is a NodeJS package that allows you to manipulate images and then write them either to disk or to a buffer.

Creating our handler function

The easiest way to do this is to remove the current index.js and create a new index.js file.

rm index.js
touch index.js

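Then let's create our handler function:

require('puppeteer-core')
const chromium = require('chrome-aws-lambda')
const { addExtra } = require('puppeteer-extra')
// Wrap the puppeteer instance from chrome-aws-lambda so it accepts plugins
const puppeteer = addExtra(chromium.puppeteer)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
// Used in the "Get the screenshot" step; assumed to be the package's default export
const fullPageScreenshot = require('puppeteer-full-page-screenshot').default
const Jimp = require('jimp')

puppeteer.use(StealthPlugin())

exports.handler = async function(req) {

}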

Get the url query parameter

We need to get the url parameter from the request's queryStringParameters:

...
exports.handler = async function(req) {
  const { url } = req.queryStringParameters
  ...
}
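
One thing to watch for: when a request arrives without a query string, req.queryStringParameters may be null or missing, and the destructuring above would then throw. Here is a minimal sketch of a guard, assuming a hypothetical helper called getTargetUrl (it is not part of Architect or of the handler in this post) that uses Node's built-in URL class to accept only http and https addresses:

```javascript
// Hypothetical helper: returns a validated URL string, or null when the
// request carries no usable `url` query parameter.
function getTargetUrl(req) {
  // Fall back to an empty object when there is no query string at all.
  const params = req.queryStringParameters || {}
  if (!params.url) return null
  try {
    const parsed = new URL(params.url)
    // Only let the headless browser visit ordinary web pages.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null
    return parsed.href
  } catch (err) {
    // new URL() throws a TypeError on malformed input.
    return null
  }
}
```

With a helper like this, the handler could return a 400 response when the result is null instead of launching a browser.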

Create the puppeteer browser

...
exports.handler = async function(req) {
  ...

  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless
  })

  ...

}

Create a new page (like a browser tab)

...
exports.handler = async function(req) {
  ...

  const page = await browser.newPage()
  page.setDefaultNavigationTimeout(0) 

  ...

}

We set the navigation timeout to zero, which disables it, so a navigation can take as long as it needs.
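
With the timeout disabled, a page that never finishes loading keeps the function running until Lambda itself stops it. If you would rather fail sooner, here is a rough sketch of the alternative, assuming a hypothetical navigate helper and an illustrative 25 second budget (neither is from the original handler). The timeout and waitUntil options are standard page.goto options in puppeteer:

```javascript
// Hypothetical helper: navigate with a bounded timeout instead of 0.
// `page` is a puppeteer Page; `timeoutMs` is an illustrative budget.
async function navigate(page, url, timeoutMs = 25000) {
  // 'networkidle2' waits until there have been no more than two network
  // connections for at least 500 ms, so late-loading assets get a chance.
  return page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs })
}
```

If the page takes longer than the budget, page.goto rejects and the handler can turn that into an error response.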

Go to the url

...
exports.handler = async function(req) {
  ...

  await page.goto(url)

  ...
}

Get the screenshot

...
exports.handler = async function(req) {
  ...

  const img = await fullPageScreenshot(page)

  ...
}

Convert to base64

...
exports.handler = async function(req) {
  ...

  const base64 = (await Jimp.read(img.bitmap).then(
    i => i.getBufferAsync(Jimp.AUTO))).toString('base64')

  ...
}
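
The .toString('base64') at the end is plain Node: getBufferAsync resolves to a Buffer, and any Buffer can be encoded as a base64 string and decoded back. A small standalone sketch of that round trip, using the eight-byte PNG file signature as stand-in data rather than a real screenshot:

```javascript
// Every PNG file starts with this fixed 8-byte signature.
const bytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])

// Encode the binary data as text that is safe to put in a response body.
const encoded = bytes.toString('base64')
console.log(encoded) // iVBORw0KGgo=

// Decoding gives back exactly the same bytes.
const decoded = Buffer.from(encoded, 'base64')
console.log(decoded.equals(bytes)) // true
```

This is also why a base64-encoded PNG always begins with iVBORw0KGgo, which is a handy check when debugging the response.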

Close the browser

...
exports.handler = async function(req) {
  ...

  await browser.close()

}
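
If any earlier step throws, the close call above is never reached and the browser process is left running. One way to guard against that is a try/finally wrapper. This is only a sketch, and withBrowser is a hypothetical helper name rather than anything provided by puppeteer:

```javascript
// Hypothetical helper: runs `work(browser)` and always closes the browser,
// even when `work` throws. `launch` is any async function that resolves to
// an object with a close() method, such as () => puppeteer.launch(options).
async function withBrowser(launch, work) {
  const browser = await launch()
  try {
    return await work(browser)
  } finally {
    // Runs on success and on failure alike.
    await browser.close()
  }
}
```

In the handler, the page, navigation, and screenshot steps would go inside the work callback.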

Return a Response Object

...

exports.handler = async function(req) {
  ...

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'image/png'
    },
    body: base64
  }
}
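
Depending on your Architect version and API Gateway setup, a base64 body may also need the isBase64Encoded flag so that it is decoded back into binary before it reaches the browser. Treat that as an assumption to verify against the Architect docs for your version. A minimal sketch, with pngResponse as a hypothetical helper name:

```javascript
// Hypothetical helper: wraps a base64-encoded PNG in a response object.
function pngResponse(base64) {
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/png' },
    // Assumption: marks the body as base64 so the gateway decodes it to
    // binary. Check whether your Architect version needs this flag.
    isBase64Encoded: true,
    body: base64
  }
}
```

The handler would then end with return pngResponse(base64).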

Run in the sandbox

cd ..
arc sandbox

Deploy to AWS

arc deploy

Debug errors in logs

arc logs src

Summary

This post shows the power of AWS Lambda and how easy tools like Architect (arc.codes) make it to get up and running, even when that means running a browser in the cloud! It also shows how to use Jimp to convert an image to base64 so it can be sent in an HTTP response. Finally, it shows the power of puppeteer: you can do just about anything you can do in a browser with puppeteer!
