JoLo

Posted on Mar 31, 2020

My pain with Serverless and AWS Lambda

#aws #lambda #python #serverless

Just recently, I got to work with Serverless on AWS Lambda. It's a great technology, and I love the idea of not having to manage and provision the underlying servers. I program a lot in Python, and luckily AWS Lambda comes with a Python runtime. The Serverless Framework is an excellent way to start; at least that's what I thought...
Here is my story of Serverless development with AWS Lambda and Python, and some of my pain.

Python and Serverless Framework

You have probably heard the term Serverless: a technology where you don't have to manage servers and their underlying infrastructure. Still, they are there; you just care about servers less. I recently had a task where I needed to deploy a deep learning function built with TensorFlow and Keras.
Our data scientist passed me his predict function, and I needed to productionize it. So far, so good.
I thought of using the Serverless Framework to deploy this function on AWS.
While setting things up with virtualenv, I encountered my first problem. After googling, I found out that the library currently supported Python 3.7, while I had Python 3.8 installed. Hmm... okay, no problem, I will use Docker instead.

cd /path/to/my/project
docker run -it --name tensorflow -v $(pwd):/root/home imagehash bash

The Docker image comes with pip out of the box.

# Installing Tensorflow
pip install tensorflow==1.15

# Installing Keras
pip install Keras==2.2.5

Notice that I don't run virtualenv here because I am now in a fresh Docker container with zero packages, so I removed it from the setup. I also know that there is a workaround for using TensorFlow with Python 3.8, but we should stick to a stable version, so put the pinned versions above in the requirements.txt 😉

Deploying

Now was the time to deploy it on AWS. That means we need to create a handler. Luckily, the Serverless Framework comes with a Python template:

# Python
sls create --template aws-python3 --name name-of-service --path path/to/store
# Note: sls is short for serverless, both are possible

This command generates a handler.py for your function and a serverless.yml for the Serverless setup. After removing all the comments and changing the Python runtime, it looks as follows:

service: aws-python

provider:
  name: aws
  runtime: python3.8

functions:
  hello:                    # function name used by sls and on AWS
    handler: handler.hello  # the key "handler" tells sls where to find the code:
                            # "handler" before the dot is the Python file (handler.py),
                            # "hello" after the dot is the function inside that file

I still have some trouble understanding the functions section, its keys, and how to call the function. I hope I got it right in the comments above. When I started with the Serverless Framework, I had something like

functions:
  handler:
    handler: handler.handler # Brainf*ck 🤪
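To make the mapping a bit more concrete, here is a minimal sketch of what a handler.py for the first configuration could look like. The response shape is only loosely modeled on what the template generates, and split_handler is a hypothetical helper of mine that illustrates how a value like handler.hello breaks down into file and function. It is not how the Serverless Framework or Lambda resolves handlers internally.

```python
import json


def hello(event, context):
    """Entry point that Lambda calls; `event` is the invocation payload."""
    body = {"message": "Hello from Lambda", "input": event}
    return {"statusCode": 200, "body": json.dumps(body)}


def split_handler(handler_setting):
    """Hypothetical helper: split a `handler:` value into (file, function)."""
    module_name, function_name = handler_setting.rsplit(".", 1)
    return module_name, function_name


print(split_handler("handler.hello"))    # ('handler', 'hello')
print(split_handler("handler.handler"))  # ('handler', 'handler')
```

Seen this way, handler.handler is simply a function called handler inside handler.py, registered under a function name that also happens to be handler.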

Anyhow, now I am ready to deploy by running the command

# Requirement: AWS credentials must be configured, e.g. via the AWS CLI
sls deploy

and invoke the function together with its logs

sls invoke -f handler --log
# Output
[ERROR] Runtime.ImportModuleError: Unable to import module 'tensorflow'

Hmm, it seems like AWS is not running pip install -r requirements.txt, which is weird because that file is also deployed. That's annoying, but there is a plugin for that.

sls plugin install -n serverless-python-requirements

It creates a package.json and a node_modules folder because the Serverless Framework is written in JavaScript and runs on Node.js.
Okay, now we should be ready to go 💪

sls deploy
# Error
An error occurred: UploadLambdaFunction - Unzipped size must be smaller than 262144000 bytes (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: c3b94dc7-6a06-11e9-8823-bb373647997a).

What??😱
262144000 bytes, yep, that's the error you get. In theory, you could upload 0 bytes (I don't know how that is possible 😅); I guess that's why the message is in bytes. In other words, the unzipped function must not exceed 262 MB (that is 250 MiB, the limit AWS documents as 250 MB). Well, my handler.py is 4 kB, and the whole project, including the Serverless Framework and its plugin, is 5 MB.
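If you wonder where that odd number comes from, a quick calculation helps. This is just arithmetic on the value from the error message; the variable names are mine.

```python
LIMIT_BYTES = 262_144_000  # number taken from the error message above

limit_mib = LIMIT_BYTES / 1024**2  # binary megabytes (MiB)
limit_mb = LIMIT_BYTES / 1000**2   # decimal megabytes (MB)

print(limit_mib)  # 250.0
print(limit_mb)   # 262.144
```

So the limit is exactly 250 when counted in units of 1024 × 1024 bytes, and roughly 262 when counted in units of one million bytes.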

What causes this number?

It must be the dependencies caused by the requirements.txt. The plugin does its job and installs the whole TensorFlow and Keras libraries! And guess what, TensorFlow alone is already ~500 MB 😱.
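Instead of guessing, you can measure which packages are heavy. The following is a small sketch, not part of the Serverless tooling: directory_size and largest_entries are hypothetical helpers that sum up file sizes below a folder such as your site-packages directory.

```python
import os


def directory_size(path):
    """Return the total size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_entries(site_packages, top=5):
    """List the `top` biggest top-level entries as (name, size in MB) pairs."""
    sizes = []
    for entry in os.listdir(site_packages):
        entry_path = os.path.join(site_packages, entry)
        if os.path.isdir(entry_path):
            size = directory_size(entry_path)
        elif os.path.isfile(entry_path):
            size = os.path.getsize(entry_path)
        else:
            size = 0  # e.g. a broken symlink
        sizes.append((entry, size / 1000**2))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]
```

Calling largest_entries with the path of the folder that pip installed into (inside the Docker container, for example) should show quickly whether TensorFlow is the main culprit.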

Alright, the serverless-python-requirements plugin can deal with Lambda's size limitations:

custom:
  pythonRequirements:
    zip: true

And at the top of handler.py, I had to paste the following code snippet in order to unzip the requirements:

try:
    # Importing this module unzips the requirements
    import unzip_requirements
except ImportError:
    pass
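To get a feeling for what such an import has to do behind the scenes, here is a rough sketch of the general idea: extract an archive into a writable folder and put that folder on sys.path. This is my own illustration with a hypothetical function name, not the plugin's actual code.

```python
import os
import sys
import zipfile


def unzip_to_path(zip_path, target_dir):
    """Extract `zip_path` into `target_dir` once and make it importable."""
    if not os.path.isdir(target_dir):
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

The catch is that the extracted files need disk space at runtime, so zipping shrinks the upload but not what the function needs once it runs.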

The plugin also creates an unzip_requirements.py. However, the Lambda gets even bigger...

custom:
  pythonRequirements:
    zip: true
    slim: true # newly added

And guess what, it still fails 🙈
It is still too big...
After googling, I found an excellent, comprehensive blog post and applied its changes (without dockerizePip though)

custom:
  pythonRequirements:
    dockerizePip: non-linux
    zip: true
    slim: true
    noDeploy:
      - boto3
      - botocore
      - docutils
      - jmespath
      - pip
      - python-dateutil
      - s3transfer
      - setuptools
      - six
      - tensorboard
package:
  exclude:
    - node_modules/**
    - model/**
    - .vscode/**

Et voilà, suddenly we have a zip file of 150 MB. Whenever we install a package with pip, we also install its dependencies. And often, we unknowingly install heavy packages such as NumPy or SciPy. With noDeploy, we omit specific packages from the deployment. The list above contains the standard packages that are already built into Lambda, plus TensorBoard.
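If you want to see which dependencies a package drags in, the standard library can tell you. A small sketch, assuming Python 3.8 or newer (for importlib.metadata) and that the package is installed in the current environment; direct_dependencies is a hypothetical helper name.

```python
from importlib import metadata


def direct_dependencies(package_name):
    """Return the requirement strings declared by an installed package."""
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return []  # package is not installed in this environment
    return requirements or []


# Prints nothing if Keras is not installed where this runs
for requirement in direct_dependencies("Keras"):
    print(requirement)
```

This only lists the direct requirements; each of them can pull in further packages, which is how the deployment grows without you noticing.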

Running the Function

To invoke the function, we need to run

sls invoke --function name-of-function

And of course it fails 😭

{
  "errorMessage": "[Errno 28] No space left on device",
  "errorType": "OSError",
    ..
}

Locally, I ran it on my Mac with an 8-core Intel Core i9 and 16 GB of RAM. On AWS, I used the maximum memory of 3008 MB.
Still, Lambda was not able to run my data science function.
In the end, I put it into a Docker container and shipped that one function to AWS.
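In hindsight, Errno 28 is the operating system's "no space left on device" error, so it points at storage rather than memory. A hedged way to investigate would be to log the free space of the temporary directory from inside the function. The sketch below assumes that the unzipped requirements end up in that directory, which the error message itself does not confirm; free_megabytes is a hypothetical helper.

```python
import shutil
import tempfile


def free_megabytes(path):
    """Return the free space in MB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1000**2


def handler(event, context):
    # tempfile.gettempdir() usually resolves to /tmp on Linux
    tmp_dir = tempfile.gettempdir()
    free_mb = free_megabytes(tmp_dir)
    print(f"free space in {tmp_dir}: {free_mb:.0f} MB")
    return {"tmp_free_mb": free_mb}
```

Comparing that number with the unpacked size of the dependencies would show whether the storage limit, and not the 3008 MB of memory, is what stops the function.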

Conclusion

AWS has changed computing and application development with Lambda. Serverless was born. However, it's not ideal for every use case. You borrow computational power with little influence on capacity or resources. That is sufficient in many cases, but when it comes to processing an image or using a vast library, you might want to reconsider your usage of Serverless.

I'm not saying I won't use it. In fact, I have many success stories to tell, and I love Serverless 💓 It is just not as easy as everyone claims. You still need to know what's behind the handler, the size of the Lambda, and the computational power you need. You need to configure things in many places, such as the YAML file(s), the AWS region (because not every region is the same), and the dependencies of your dependencies. Maybe it really depends on the language: I had less pain with Node than with Python. Regardless, in the beginning, it is a lot of trial and error, which is time-consuming and can be frustrating.

Nevertheless, Serverless is my first choice when developing a service, and I think after many ups and downs, you will have a better feeling for when to use it and when not.

Top comments (1)

Mark Tse • Aug 26 '20

Curious: have you tried redirecting any outputs from your code into /tmp? Lambda offers up to 512 MB of storage on /tmp: docs.aws.amazon.com/lambda/latest/...

Mounting an EFS might also be an option, although it would be much cheaper to stream directly to S3.