Prerequisites
Before we dive into the details, make sure you have the following in place:
An AWS account with access to AWS Lambda and other necessary services.
A Node.js project where you can deploy your Lambda function.
Familiarity with Puppeteer and basic AWS Lambda concepts.
Setting up the Environment
The first step is to set up your development environment. In your Node.js project, you'll need to install the **chrome-aws-lambda** package. This package contains a headless Chromium binary optimized for AWS Lambda environments.
npm install chrome-aws-lambda
Once the package is installed, you can create a Lambda function to run your Puppeteer code.
Creating the AWS Lambda Function
For this example, we'll create a Lambda function that navigates to a webpage and returns its title. Here's a sample Lambda function:
const chromium = require('chrome-aws-lambda');

exports.handler = async (event, context) => {
  let result = null;
  let browser = null;

  try {
    // Launch the Chromium build bundled with chrome-aws-lambda.
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
      ignoreHTTPSErrors: true,
    });

    // Open a page, navigate to the requested URL, and read its title.
    const page = await browser.newPage();
    await page.goto(event.url || 'https://example.com');
    result = await page.title();
  } catch (error) {
    // Errors are only logged, so result stays null when something fails.
    console.error(error);
  } finally {
    // Always close the browser, even when an error occurred.
    if (browser !== null) {
      await browser.close();
    }
  }

  return result;
};
Let's break down the key parts of this Lambda function:
We import the chrome-aws-lambda package, which provides an optimized Chromium binary for AWS Lambda.
Inside the Lambda handler, we launch a Puppeteer browser using the chromium.puppeteer.launch method. We pass various options, such as command-line arguments and the path to the Chromium binary.
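We create a new page, navigate to a URL (you can pass it as the url property of the event; otherwise https://example.com is used), and retrieve the page title.
If any errors occur during the process, we log them to the console, and the result stays null.
In the finally block, we close the browser whether or not an error occurred.
Finally, we return the result, which is the page title in this example.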
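Because the URL comes straight from the event, it can be worth validating it before handing it to page.goto. The following is a minimal sketch of one way to do that, not part of the original function: resolveTargetUrl and DEFAULT_URL are hypothetical names introduced here, and restricting navigation to http and https is an assumption about what the function should allow.

```javascript
// Hypothetical helper, not part of chrome-aws-lambda or Puppeteer.
// Assumption: only http(s) URLs should be allowed as navigation targets.
const DEFAULT_URL = 'https://example.com';

function resolveTargetUrl(event) {
  // Fall back to the default when the event carries no usable url.
  const candidate =
    event && typeof event.url === 'string' ? event.url.trim() : '';
  if (candidate === '') {
    return DEFAULT_URL;
  }

  // The built-in URL class throws a TypeError on malformed input.
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch (err) {
    throw new Error(`Invalid URL in event: ${candidate}`);
  }

  // Reject schemes such as file: or chrome: that a browser would still open.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}
```

Inside the handler, the navigation call would then read something like await page.goto(resolveTargetUrl(event)). An invalid URL throws inside the try block, so it is logged like any other error.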
Deploying the Lambda Function
To deploy the Lambda function, you can use the AWS Management Console, AWS CLI, or a tool like the Serverless Framework. Here, we'll use the AWS CLI as an example:
Ensure you have the AWS CLI configured with the necessary permissions.
Create a deployment package by packaging your Node.js code and its dependencies. In your project directory, run the following command:
zip -r function.zip node_modules/ your-function.js
Replace your-function.js with the name of your Lambda function file.
Create the Lambda function using the AWS CLI. Replace <ROLE_ARN> with the ARN of an IAM role that gives Lambda permission to access other AWS resources.
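aws lambda create-function --function-name YourFunctionName \
  --zip-file fileb://function.zip \
  --handler your-function.handler \
  --runtime nodejs14.x \
  --timeout 30 \
  --memory-size 1024 \
  --role <ROLE_ARN>
The --timeout and --memory-size values here are only a starting point. Lambda's defaults (3 seconds and 128 MB) are usually too low to launch Chromium.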
Invoke your Lambda function using the AWS CLI or any other method you prefer:
aws lambda invoke --function-name YourFunctionName output.txt
Replace YourFunctionName with the name of your Lambda function. You should see the title of the webpage in the output.txt file.
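The output file holds the handler's return value encoded as JSON, so a title arrives as a quoted string and a failed run of the function above arrives as null. As a rough sketch, a small Node.js script could interpret that file. parseInvokeOutput and readInvokeOutput are hypothetical names, and the errorMessage check assumes the JSON error object Lambda writes when a handler throws instead of catching the error.

```javascript
const fs = require('fs');

// Hypothetical helper: interpret the payload written by `aws lambda invoke`.
function parseInvokeOutput(raw) {
  const payload = JSON.parse(raw);

  // The sample handler returns null when it caught and logged an error.
  if (payload === null) {
    return { ok: false, reason: 'handler returned null; check the function logs' };
  }

  // Assumption: an uncaught error surfaces as an object with errorMessage.
  if (typeof payload === 'object' && payload.errorMessage) {
    return { ok: false, reason: payload.errorMessage };
  }

  return { ok: true, title: String(payload) };
}

// Convenience wrapper that reads the file produced by the CLI.
function readInvokeOutput(file = 'output.txt') {
  return parseInvokeOutput(fs.readFileSync(file, 'utf8'));
}
```

Calling readInvokeOutput() from the directory where you ran the invoke command returns an object with either the title or a reason for the failure.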
Conclusion
Running Puppeteer in an AWS Lambda function can be a powerful way to automate browser tasks on serverless infrastructure. With chrome-aws-lambda, you get a Chromium binary optimized for the Lambda environment, which can help reduce cold start time and improve the overall performance of your functions. This is useful for use cases such as web scraping, automated testing, and generating PDFs from web pages. Keep in mind that AWS Lambda has hard limits, including a maximum execution time of 15 minutes and a maximum of 10,240 MB of memory, so design your functions accordingly.
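One way to design around the execution time limit is to derive the navigation timeout from the time the invocation has left, so that page.goto gives up before Lambda terminates the function. The sketch below relies on context.getRemainingTimeInMillis(), which the Node.js Lambda runtime provides. navigationTimeoutMs, the 2-second reserve, and the 30-second cap are illustrative assumptions, not values from this article.

```javascript
// Hypothetical helper: pick a page.goto timeout that fits the remaining time.
// reserveMs is kept back for closing the browser and returning a response.
function navigationTimeoutMs(context, reserveMs = 2000, maxMs = 30000) {
  const remaining = context.getRemainingTimeInMillis();
  const budget = remaining - reserveMs;

  if (budget <= 0) {
    throw new Error(`Only ${remaining} ms left; not enough time to navigate`);
  }

  // Never wait longer than maxMs, even if the function has more time left.
  return Math.min(budget, maxMs);
}

// Inside the handler this could be used roughly as:
//   await page.goto(url, { timeout: navigationTimeoutMs(context) });
```

With this in place, a slow page produces a Puppeteer timeout error that the handler can log, instead of the whole invocation being cut off by Lambda.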
Top comments (2)
Hi, nice post!
When running Puppeteer (Chromium) in Lambda, I found it useful to use container images. Using a stripped-down version of Chromium in the provided runtime means that you trust binaries prepared by some person on the Internet, which is a no-go in many organizations.
Even with container images, the Lambda runtime is constrained. In my specific case, I ended up with a Step Function that spins off an ECS Fargate task to run Puppeteer.
Cheers!
Hi,
Thank you for sharing your insights on running Puppeteer in Lambda. It's great to hear that you found container images to be a useful approach.
You bring up a valid point about the trustworthiness of binaries, and it's something that many organizations are concerned about. Using container images can certainly help mitigate that concern by providing more control over the environment.