Christoph Grabo

Posted on Mar 16, 2021 • Originally published at markentier.tech on Mar 16, 2021

Rust/Wasm on AWS Lambda@Edge

#rust #webassembly #lambda #aws

tl;dr: Check out the repo and give it a spin for yourself.

I sat down yet again and took some of my learnings into a repository, so we have a starting point.

As of today (January 2021) AWS only offers two languages for Lambda@Edge: Python and Node.js. Both are available in relatively decent versions, too.

Since I have never bothered with Python too much in my life, but have my fair share of experience with JavaScript (JS) and node, I stuck to the latter as the trampoline environment for the setup. Also I know that it comes with WebAssembly support out of the box, which wouldn't be the case for Python anyway.

So what is it gonna be anyway?

As you've maybe guessed the project will be some Rust, which gets compiled down to WebAssembly. This artefact then can be loaded and executed in the node environment.

A word about performance: I have not evaluated a "JS vs Wasm" comparison, and it's also not part of this exercise. There have been people and article in the past vouching for one side or another, all with their own benchmarks. So I won't bother you with that and advise you to take your own measurements.

WebAssembly will not beat all JavaScript code, especially very fine-tuned one. V8 (the underlying JavaScript engine for both Chrome and Node.js) is a very performant beast and comes with just in time (optimizing) compilation for further boosts.

The Rust code in Wasm clothing can give you probably certain garantuees you miss from JavaScript, but again you have to evaluate if the benefits are worth the effort.

Potentially you might also consider to switch to Python as your runtime instead. At least that language should have real integers as far as I know. 😉

No doubt you can build and deliver very fast and also safe programs with Rust/WebAssembly. Especially if you need some specific algorithms and computations where JS/node might not be the best and you would resort to some (probably C-based) native libraries anyway.

There are only a few issues with that:

You have not full control of the execution environment of your edge functions. Sure, you can introspect with a test function what you're dealing with, but how sure will you really be, that the environment on CloudFront does provide the exact same system and system dependencies as your local development environment (or the non-edge Lambda environment for that matter)? AWS has a track record of not providing you with reproducible local environments. In fact, it looks like they get away with it even further, since the announcement for containerization support for regular AWS Lambda. People, who know me, also know that I'm not a big fan of big docker images, but I'm afraid that's what we will see now happening there. I hope AWS promotes good containerization guidelines to prevent that waste-of-storage mess. Furthermore I really don't want to see docker on the edge for that reason. One can just hope, right?
You work in a very constrained environment. Check the current limits for Lambda@Edge: the zipped code can use up to 50MB on the origin side and only 1MB max if it shall be deployed for the viewer side. Of course, this is usually still plenty of spaces for most use cases, packaging up plain JS results in very small archives. But once you take the first issue into consideration, then this could actually become another problem for you.

The size restriction can be mitigated for JS-only code pretty easily by bundling the code with WebPack, Parcel, or Rollup. General advise is anyway to never deploy the unoptimized package especially when you want to push it to the edge. The node_modules folder grows very big, can still have quite some bloat even after an npm prune --production, because it only looks at packages, but not the content of them. Yep, my minimalism brain kicked in again.

The system dependency problem can only be solved by using solely pure JavaScript libraries and packages. That might work for a while, but eventually some use case might demand a non-JS solution (either a native library or some executable).

For example let's say you want to build an image transformation function and want to use sharp, a very well-known package in the JS ecosystem, then you already end up with around 37 MiB of data in your node_modules folder alone. Zipped up it's still around 13 MiB. That might be enough for you to run it as a trigger on the origin side of your CloudFront distribution; it's just about showing you how quickly a node project can grow.

If size and dependency management are not an issue

Maybe you love Rust (or any frontend language capable of being compiled to WebAssembly).
Maybe you love WebAssembly.
Maybe you do not have good JavaScript/Node.js expertise in-house.
Maybe you want to build your product with better safety.
Maybe it should be more robust, too.
Maybe you want to show AWS, that we need more than just Python and Node.js on the edge.
Maybe you have some other valid reason to escape that limiting cage.

Whatever your reasons are, I hear you.

AWS is improving on one side, but also loosing it on another. When it comes to CDNs (Content Delivery Networks) and Edge Computing, the competition is now sprinting ahead of AWS.

I cannot say a lot about Fastly's offering, it's mostly behind some beta doors, and mere mortals like myself are not allowed to peek. They have their fastlylabs, but that's for experimentation, not the final offering. So I don't even bother to check it out.

I can tell a bit more about Cloudflare though, because their edge computing offering is available and affordable to small businesses and individuals (even free in most cases). Check out Workers, really do! I have already played around with Workers and KV storage, it's a quite pleasant experience. I might write about a setup for them in the future as well.

Let's get started

GitHub repository to follow along:

https://github.com/asaaki/rust-wasm-on-lambda-edge

$ tree -L 1 -a -I .git # output truncated for brevity

.
├── .github - GitHub Actions workflow and dependabot
├── .gitignore
├── Makefile - very convenient make targets
├── README.md
├── fixtures - simple test fixtures and script
├── node - Node.js wrapper
├── rust - Big Business Here!
└── <and some more …>

Ingredients

Makefile for project level management
TypeScript (TS) for the Node.js part
Type definitions for AWS Lambda
Rollup as the bundler
Rust for the, well, Rust part
wasm-pack for WebAssembly building
zip to package up the function for upload
Example fixtures and code to have a very quick and dirty request test
GitHub Actions workflow for continuous integration (CI) purposes

On your machine you need to install Rust, node, wasm-pack, and zip, if not present yet. The workflow for GitHub Actions has that already sorted out for you.

This article won't give you steps to get your local development environment set up, please use a search engine of your choice and look up how to do it.

Node.js wrapper

I adopted a rollup based approach, since it's quite easy to get configured and also something we use at work. I always found webpack a little bit too cumbersome, and parcel is just yet another new kid on the block. I'm pretty sure you can adjust the project to your needs. All we need here is: compile TS to JS and bundle up everything into a single JS file. In the past I found the WebAssembly dependency management very tricky, in the end I used a plain "move the .wasm file into the final bundle" approach, which just works fine, because I did not want to inline the WebAssembly code (as most plugins try). Maybe you have a smarter solution for that, please open a pull-request in the repo. Just keep in mind: wasm-bindgen creates already a pretty decent module loader, so there is no need to work around that, but I fail to get any of these bundlers to move the wasm files along with it into the bundle output directory.

I use TypeScript here, because it gives you some nice support during development. Also the aws-lambda type definitions were useful to create the Rust structs and adjust the serialization. (AWS is actually very strict about the JSON you pass around, "something": null for absent data does not work, either it is a completely required field with a strict type like a string, or it should not be in the JSON at all).

In general for any bigger frontend or node backend project I would recommend to use TS nowadays. While not every of your dependencies might come with type definitions, at least within your own codebase you can enforce strict rules and type checks.

To make the node wrapper as slim as possible, we pass the event and context data directly into the WebAssembly and return whatever it returns.

Btw if you do return a specific struct instead of a generic JsValue, then TS checks will also kick in and use the auto-generated type definitions from the wasm-bindgen process.

For a quick baseline test and project I did not go down that road yet, as it would require to replicate all the TS specific stuff in the Rust code (wasm-bindgen cannot do everything fully automated yet). This is a great opportunity to create a utility crate for that, basically like rs2ts but in reverse direction. I wish aws-lambda-events already had those CloudFront events in it, but sadly they don't.

Yet also be aware of certain type limitations, read on to learn more about them.

Rust business logic

The Rust code is also nothing special so far.

// src/lib.rs

#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

mod types;

use std::panic;
use types::cloudfront as cf;
use wasm_bindgen::{intern, prelude::*};
use web_sys::console;

type JsValueResult = Result<JsValue, JsValue>;

#[wasm_bindgen(start, final)]
pub fn start() {
    panic::set_hook(Box::new(console_error_panic_hook::hook));
    console::log_1(&intern("(wasm module start)").into());
}

#[wasm_bindgen(final)]
pub async fn handler(event: JsValue, _context: JsValue) -> JsValueResult {
    console::log_1(&intern("(wasm handler request call)").into());
    let request = cf::Event::request_from_event(event)?;

    // TODO: Fancy biz logic here ...

    request.to_js()
}

Note: The displayed code might not be up-to-date with the repository version.

There is one function (start) which is triggered when the Wasm module gets loaded. You can use it to set up some internal state if needed. We only used it here for configuring the panic handler; whenever an unrecoverable error happens, it gets logged via console.error, helps immensly with debugging. And as we do console logging anyway, there shouldn't be any significant overhead for that part. The compilation output will probably a bit bigger because it needs to store more information for the better panic output.

The other—probably way more interesting—function is handler, which takes the inputs from the JS side, does … a lot of nothing, and returns a request JSON blob for CloudFront to deal with.

Currently the machinery reads the arbitrary JsValue and tries to deserialize it into a struct, so we can deal with it in code. This is definitely not the most efficient way of doing it, but the conversions in and out really avoid some current existing pain points.

For example wasm-bindgen has not a great story around Rust enums, for now only very simple C-style enums are allowed. Meaning: for our CloudFront (CF) event data, which can be more strictly typed into either a CF request or response event, this does not play well with Rust's richer enums, as we cannot convince wasm-bindgen to use them. There is an open issue around this topic, but it was created just recently and thus no work has been done yet. Similarly Rust's Vec is also not fully supported yet (see issue 111), which might be the even bigger issue for some of us.

Workarounds can be a lot of Options and serialization skips, as I do internally anyway.

Some transformation overhead can be addressed by using serde-wasm-bindgen, but in my example repo I'll use it only for the input side (deserialization). On serialization a collection like HashMap or BTreeMap gets turned into an ES2015 Map, which is unfortunated as well, because they cannot be JSON stringified.

As you can see, currently there are trade-offs to be made in all corners, but that shouldn't stop us to explore further.

In the current state of the project I have provided pretty strict structs and enum for the CloudFront event data, it even surpasses now the TypeScript interfaces, which makes my point from the previous section pretty obsolete now. I still wish it was easier to autogenerate Rust types from TS definitions. The only good thing about CloudFront related data is, that it won't change that much … if at all. Some APIs in AWS have been stable for years now, so a "write once, use forever" approach might be sufficient.

Performance

Read about my poor man's performance testing of AWS Lambda@Egde functions on my blog. The tl;dr is that it's equally fast compared to a regular node.js setup.

Conclusion

This is all great news: you can run WebAssembly on AWS Lambda@Edge without a noticable performance penalty. Now write your Rust code and run it on the edge.

Of course I do hope that in the future this will become more native. There's a lot of development happening in the WebAssembly space.

DEV Community