Among all the new features and services that AWS announced during re:Invent 2020, my favorites were definitely the AWS Lambda updates. And there were many! For example, execution time is no longer rounded up to the nearest 100 ms for billing; you are now billed per millisecond. On top of that, AWS raised Lambda's maximum memory to 10 GB and, correspondingly, its CPU capacity to up to 6 vCPUs. But today, I want to dig deeper into something even more exciting for me: AWS Lambda no longer requires you to package your code and dependencies into a zip file. Instead, you can now package them as a Docker container image of up to 10 GB in size.
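To make the billing change concrete, here is a small sketch that compares the billed duration and the resulting GB-seconds under the old 100 ms rounding and the new 1 ms granularity. It deliberately leaves out the price per GB-second (which varies by region and architecture); the memory size and duration are illustrative values, not measurements.

```python
import math


def billed_ms(duration_ms: float, granularity_ms: int) -> int:
    """Round an execution duration up to the billing granularity."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def gb_seconds(memory_mb: int, duration_ms: float, granularity_ms: int) -> float:
    """Billable compute: configured memory (GB) times billed duration (seconds)."""
    return (memory_mb / 1024) * billed_ms(duration_ms, granularity_ms) / 1000


# Illustrative invocation: 1024 MB of memory, 120 ms of actual execution time
old = gb_seconds(1024, 120, granularity_ms=100)  # rounded up to 200 ms
new = gb_seconds(1024, 120, granularity_ms=1)    # billed as 120 ms
print(f"old: {old} GB-s, new: {new} GB-s")
```

For this hypothetical short-running function, the billable compute drops from 0.2 to 0.12 GB-seconds, a 40% reduction; the shorter your functions, the more the finer granularity matters.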
Personally, I consider this a game-changer for many serverless use cases. And here's why.
Until recently, the only way of creating a serverless function on AWS was to select a specific language and runtime (e.g., Python 3.8), make sure that all your custom dependencies were installed inside your project directory (or add the site-packages from a Python virtual environment to your zip package), and finally compress all of that into a zip package. If your zip file was bigger than 50 MB, you also had to upload it to S3 and reference it in your function definition. All of that is doable, and many developers (myself included) came up with their own ways of making it easier, such as using Lambda layers, reusing site-packages from a virtual environment, and writing shell scripts for deployment.
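For comparison, the zip workflow described above boils down to something like the following sketch: compress the project directory and check whether the archive exceeds the direct-upload limit (assumed here to be 50 MiB) and therefore has to go through S3. The directory layout and file names are hypothetical; dependency installation is not shown.

```python
import os
import tempfile
import zipfile
from pathlib import Path

# Assumed limit for uploading a zip package directly; larger packages go via S3
DIRECT_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def build_zip(project_dir: str, zip_path: str) -> int:
    """Zip every file under project_dir (paths relative to it); return the zip size."""
    root = Path(project_dir)
    target = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside project_dir
            if path.is_file() and path.resolve() != target:
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return os.path.getsize(zip_path)


def needs_s3_upload(zip_size: int) -> bool:
    """True if the package is too large to upload directly and must go through S3."""
    return zip_size > DIRECT_UPLOAD_LIMIT_BYTES


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"  # hypothetical project directory
        project.mkdir()
        (project / "etl.py").write_text("def handler(event, context):\n    return {}\n")
        size = build_zip(str(project), str(Path(tmp) / "function.zip"))
        print(size, needs_s3_upload(size))
```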
On the surface, not much seems to change: instead of zipping your code, you now define your dependencies inside a Dockerfile. But there is more to it. Defining your runtime environment in a container image gives you far more control than a predefined runtime with zipped dependencies.
A zip file with a predefined runtime environment has its limits: what if you would like to use a specific Python environment that has been reviewed by your company's security team? Or what if you need an additional OS-level package? With container image support, you can do both, since you are free to choose the base image and whatever packages you install in it. This makes serverless accessible to a wider audience and makes developing FaaS (Function as a Service) workloads much easier.
In theory, it's even possible to create custom images for other programming languages, although this requires implementing a custom runtime and is more involved.
The interface of AWS Lambda now looks as follows:
**Note:** At the time of writing, only Linux-based container images are supported.
Let's build a simple ETL example. Here is a project structure that we will use:
My requirements.txt contains only: pandas==1.1.0.
The actual code, demonstrated below, is just a simple ETL example counting exam scores of Harry Potter's characters, but you can use it as a scaffold for your use case:
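The script itself uses pandas and isn't reproduced in this text. As a stand-in, here is a dependency-free sketch with the same extract/transform/load shape. The scores are made up, and the entry point name `handler` is hypothetical; an AWS Python base image would reference it as `CMD ["etl.handler"]` (module name, then function name).

```python
import json
from collections import defaultdict

# Hypothetical sample data, not the article's actual input
EXAM_RESULTS = [
    {"character": "Hermione", "subject": "Potions", "score": 100},
    {"character": "Hermione", "subject": "Charms", "score": 98},
    {"character": "Harry", "subject": "Potions", "score": 72},
    {"character": "Harry", "subject": "Charms", "score": 81},
    {"character": "Ron", "subject": "Potions", "score": 61},
    {"character": "Ron", "subject": "Charms", "score": 70},
]


def extract(event):
    """Extract: take records from the event if provided, else the sample data."""
    return event.get("records", EXAM_RESULTS)


def transform(records):
    """Transform: total score per character, highest first."""
    totals = defaultdict(int)
    for row in records:
        totals[row["character"]] += row["score"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def load(totals):
    """Load: only serializes here; a real job might write to S3 or a database."""
    return json.dumps(totals)


def handler(event, context):
    """Function that Lambda invokes for each event."""
    return {"statusCode": 200, "body": load(transform(extract(event)))}


if __name__ == "__main__":
    # Lets the same file also run as a plain script, e.g. CMD ["python", "etl.py"]
    print(handler({}, None)["body"])
```

Keeping the business logic in plain functions and the Lambda entry point thin is what later makes it easy to run the same file outside Lambda.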
Now to the fun part: the Dockerfile that will define all our code dependencies so that we don't need to zip our code!💪🏻
Usually, a Dockerfile for Python 3.8 would start with FROM python:3.8 to use the official Python image from Docker Hub. However, to make it usable with AWS Lambda, your image must include a runtime interface client that talks to the Lambda Runtime API. To make this easier, AWS provides many base images that we can use, such as the one defined in line 3 of the Dockerfile presented above. You can find all AWS Lambda base images in the public ECR repository as well as on Docker Hub:
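The AWS base images ship a runtime interface client, so you never write this yourself. For intuition about what it does, here is a minimal standard-library sketch of the client loop: it long-polls the Runtime API's `/2018-06-01/runtime/invocation/next` endpoint, calls a handler, and posts the result to the `.../response` endpoint (or an error to `.../error`). It omits the context object, tracing headers, and initialization-error reporting, so treat it as an illustration, not a replacement for the official client.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def process_next_invocation(runtime_api: str, handler) -> str:
    """Fetch one event from the Lambda Runtime API, run handler, report the outcome.

    runtime_api is "host:port"; inside Lambda it comes from the
    AWS_LAMBDA_RUNTIME_API environment variable.
    """
    base = f"http://{runtime_api}/{API_VERSION}/runtime/invocation"
    # Blocks until Lambda has an event; the request id comes back as a header
    with urllib.request.urlopen(f"{base}/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())
    try:
        # This sketch passes None instead of a real context object
        url, payload = f"{base}/{request_id}/response", handler(event, None)
    except Exception as exc:  # report handler failures instead of crashing the loop
        url = f"{base}/{request_id}/error"
        payload = {"errorMessage": str(exc), "errorType": type(exc).__name__}
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), method="POST")
    urllib.request.urlopen(req).close()
    return request_id


def run_forever(handler) -> None:
    """Serve invocations one after another, as a runtime client does."""
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:
        process_next_invocation(runtime_api, handler)
```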
The best part of developing your Lambda functions with a container image is dev/prod parity. You can easily test your code locally with Docker before deploying it to AWS, in a containerized environment that matches the one you will be using in production. This is possible thanks to the Lambda Runtime Interface Emulator (RIE), a lightweight web server that AWS has open-sourced. The emulator is already baked into all AWS Lambda base images (amazon/aws-lambda-*) that you can find on Docker Hub or in the public ECR repository.
Run the following commands from the project directory that contains the Dockerfile:
Then, in a new terminal window, run:
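This step is typically done with curl. As a Python alternative, here is a sketch that POSTs a JSON event to the emulator's invocation endpoint. It assumes the container was started with its port 8080 published on local port 9000 (for example, `docker run -p 9000:8080 <image>`), which is the setup shown in the AWS documentation; adjust `RIE_URL` if you mapped a different port.

```python
import json
import urllib.request

# Emulator endpoint, assuming the container's port 8080 is published on 9000
RIE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_invocation_request(payload: dict, url: str = RIE_URL) -> urllib.request.Request:
    """POST request carrying the JSON event, like `curl -XPOST <url> -d '{}'`."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_local(payload: dict, url: str = RIE_URL):
    """Send the event to the emulator and decode the function's JSON response."""
    with urllib.request.urlopen(build_invocation_request(payload, url), timeout=30) as resp:
        return json.loads(resp.read())
```

With the container running, `invoke_local({})` returns whatever your handler returns for an empty event.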
Here is what I'm getting as output:
Local execution looks good. Let's deploy it to AWS.
We can now run the following commands to create an ECR repository and push our container image to ECR:
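The exact commands aren't reproduced in this text. The sketch below only assembles the standard AWS CLI and Docker steps, mainly to show how the ECR image URI is derived from the account ID, region, repository, and tag. The account ID is the usual AWS placeholder, the repository name is hypothetical, and it assumes the image was already built locally as `<repo>:<tag>`.

```python
from typing import List


def ecr_registry(account_id: str, region: str) -> str:
    """Host name of the private ECR registry for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def push_commands(account_id: str, region: str, repo: str, tag: str = "latest") -> List[str]:
    """CLI steps to create an ECR repository and push a locally built image to it."""
    registry = ecr_registry(account_id, region)
    image_uri = f"{registry}/{repo}:{tag}"
    return [
        f"aws ecr create-repository --repository-name {repo} --region {region}",
        # The ECR auth token is piped straight into docker login
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker tag {repo}:{tag} {image_uri}",
        f"docker push {image_uri}",
    ]


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID; "lambda-etl" a hypothetical repo name
    for command in push_commands("123456789012", "eu-central-1", "lambda-etl"):
        print(command)
```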
Now that our image is deployed, we can use it in our Lambda function:
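If you prefer code over the console, the same step can be scripted with boto3's Lambda `create_function`, using `PackageType="Image"` and `Code={"ImageUri": ...}`. In this sketch, the function name, role ARN, memory size, and timeout are hypothetical choices; the role must already exist and be assumable by Lambda. Note that no `Runtime` or `Handler` is passed, because the image defines both.

```python
from typing import Any, Dict


def create_function_kwargs(name: str, image_uri: str, role_arn: str,
                           memory_mb: int = 1024, timeout_s: int = 60) -> Dict[str, Any]:
    """Arguments for boto3's Lambda create_function with a container image."""
    return {
        "FunctionName": name,
        "PackageType": "Image",           # instead of the default "Zip"
        "Code": {"ImageUri": image_uri},  # no Runtime/Handler: the image defines them
        "Role": role_arn,
        "MemorySize": memory_mb,
        "Timeout": timeout_s,
    }


def deploy(name: str, image_uri: str, role_arn: str) -> str:
    """Create the function in AWS and return its ARN (requires boto3 and credentials)."""
    import boto3  # imported lazily so the kwargs builder works without boto3

    client = boto3.client("lambda")
    response = client.create_function(**create_function_kwargs(name, image_uri, role_arn))
    return response["FunctionArn"]
```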
Note that we didn't have to select a runtime environment, since it's already defined in our container image. We tested the function from the AWS Management Console and got the same result as when we tested it locally.
By now, you may be convinced that running containerized workloads on AWS Lambda has many advantages, and you may want to use it much more extensively. However, I encourage you to think ahead about observability and to approach serverless workloads with an architect's foresight.
Imagine that you migrated several data pipelines from a container orchestration solution to AWS Lambda. How do you know which of those pipelines failed and why? Sure, AWS offers native support for logging and alerting via Amazon CloudWatch. Still, to be completely honest, the AWS monitoring and observability services require some extra work: you have to set up proper alerting, configure log groups, and enable tracing with X-Ray. Then, you also need to decide which metrics to track and build CloudWatch dashboards to visualize that data.
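To give a sense of that extra work, here is a sketch of just one piece: a CloudWatch alarm on each function's built-in `Errors` metric, created via boto3's `put_metric_alarm`. The SNS topic ARN, alarm naming scheme, period, and threshold are hypothetical choices; log retention, X-Ray tracing, and dashboards would each need similar code.

```python
from typing import Any, Dict, Iterable


def lambda_error_alarm(function_name: str, sns_topic_arn: str,
                       period_s: int = 300, threshold: int = 1) -> Dict[str, Any]:
    """Arguments for CloudWatch put_metric_alarm: alert when a function reports errors."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations should not raise an alarm
        "AlarmActions": [sns_topic_arn],     # e.g. an SNS topic that emails the team
    }


def create_alarms(function_names: Iterable[str], sns_topic_arn: str) -> None:
    """Create one error alarm per function (requires boto3 and credentials)."""
    import boto3  # lazy import: only needed when actually calling AWS

    cloudwatch = boto3.client("cloudwatch")
    for name in function_names:
        cloudwatch.put_metric_alarm(**lambda_error_alarm(name, sns_topic_arn))
```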
You can considerably improve the developer experience by using tools such as Dashbird, which allows you to easily add observability to your existing serverless workloads without any changes to your code or infrastructure. All you need to do is create an IAM role that grants Dashbird cross-account permission to communicate with your AWS resources. Once that's configured, you can immediately start enjoying all the benefits of the platform, such as automated alert notifications, visualizations of your metrics, and actionable insights based on the AWS Well-Architected Framework to improve performance, save costs, and enhance the security of your cloud resources.
Actionable insights gathered by leveraging Dashbird.
When using a container image rather than a zip package for your serverless function deployments, you'll get the following benefits:
- Support for any programming language you want (as long as you use a base image that implements the Lambda Runtime API),
- Ability to easily work with additional dependencies that can be baked into a container image such as additional Python modules or config files,
- Flexibility and independence from any platform: you can easily move the same jobs to a K8s cluster or any other platform that supports containerized applications. In our example, we would only have to change the base image back to python:3.8 and the entry point command to CMD ["python", "etl.py"] within the Dockerfile (assuming etl.py also runs its ETL logic when executed as a script).
- Much more control over your packaged environment: with a traditional Lambda runtime, you get whatever AWS provides. With a container image, by contrast, you have many options to customize the environment to your needs. For example, you may want a more lightweight Python image for performance and cost optimization, or an image that your security team has approved to meet your company's specific compliance requirements.
- Your code can run anywhere: a containerized application minimizes surprises when you move your code from your local machine to the development, testing, or production environment, because every stage runs the same image with the same dependencies.
- Run event-driven containers: while container orchestration platforms such as Kubernetes are great, some use cases are better served by a simple FaaS, for instance, when you want your code to run every time a new file arrives in S3 or whenever somebody makes a request to your API. AWS Lambda is perfect for such use cases.
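As an example of the event-driven case, here is a sketch of a handler triggered by S3 "object created" notifications. It reads the bucket name and object key from the standard S3 event structure (`Records[].s3.bucket.name` and `Records[].s3.object.key`) and URL-decodes the key, since S3 delivers keys URL-encoded. The bucket and key in the usage note are hypothetical, and the `print` stands in for the actual processing.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def new_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded, e.g. a space becomes "+"
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def handler(event, context):
    """Entry point: process each newly created object."""
    objects = new_objects(event)
    for bucket, key in objects:
        print(f"processing s3://{bucket}/{key}")  # placeholder for the real ETL step
    return {"processed": len(objects)}
```

For an event whose single record points at bucket `raw-data` and key `exams/2020+results.csv`, `new_objects` yields `("raw-data", "exams/2020 results.csv")`.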
I'm quite happy about all the new AWS Lambda features. As a huge proponent of containerized applications, I prefer that option over zipping the code for a serverless deployment. These days, developing self-contained microservices is easier than ever, thanks to the many platforms and services for running containers at scale. And if you want to ensure observability and enterprise-grade monitoring of your serverless containers, Dashbird is a great option to consider: https://dashbird.io/.