
Crude Python tree shaking for squeezing into AWS Lambda package size limits

I have been working on a service, deployed to AWS Lambda, that manages a number of Machine Learning workloads. One of the issues I encountered deploying it is that our dependencies on popular ML-related Python modules such as pandas and scikit-learn push us past the maximum package size imposed by AWS: Lambda enforces a 250 MB hard limit on the unzipped package, and scikit-learn alone is 100 MB unzipped. Deploying yields:

ServerlessError: An error occurred: Resource handler returned message: "Unzipped size must be smaller than 262144000 bytes (Service: Lambda, Status Code: 400)"
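To see how close the artefact is to that ceiling without round-tripping through a deploy, the unzipped size can be checked locally. A quick sketch, assuming the packaged artefact lands at ./.serverless/data-science-api.zip (the name used later in this post):

# The last line of `unzip -l` reports the total uncompressed size in bytes;
# it needs to stay under 262144000 to deploy
unzip -l ./.serverless/data-science-api.zip | tail -1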

There don't seem to be many options for slimming down these dependencies. I was hoping to find subtree splits containing just the submodules relevant to our project, but came up short. I had already configured the serverless-python-requirements plugin to slim down packages:

custom:
  pythonRequirements:
    slim: true
    strip: false

Note: the strip: false option was required to prevent an error at runtime, which manifested as:

Runtime.ImportModuleError: Unable to import module 'handler': /var/task/scipy/linalg/_fblas.cpython-38-x86_64-linux-gnu.so: ELF load command address/offset not properly aligned

One option to reduce the footprint of dependencies is to implement some form of tree shaking: remove the chunks of code we aren't actually using. The tricky part is identifying which parts those are. In our case, we already had 100% test coverage, so we could run a coverage report against our dependency folder (instead of our src folder) to find out which parts of our dependencies were actually exercised:

poetry run pytest --cov-report=html \
  --cov=/path/to/site-packages

Running this report indicates about 6% "coverage" of our dependencies:

[Screenshot: HTML coverage report showing ~6% total coverage across site-packages]
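To turn that report into a machine-readable list of dead files, coverage.py can also emit JSON. A minimal sketch, assuming --cov-report=json is added to the pytest invocation above (which writes coverage.json) and that jq is installed; unused_files.txt is just a name chosen here for illustration:

# Pull every path whose coverage summary sits at 0% out of
# coverage.py's JSON report, one per line
jq -r '.files
  | to_entries[]
  | select(.value.summary.percent_covered == 0)
  | .key' coverage.json > unused_files.txt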

From here, it was a simple matter of taking a snapshot of the files with 0% coverage and removing them from our artefact. To accomplish this, I found the serverless-scriptable-plugin was required, since the files in question are added to the artefact by the serverless-python-requirements plugin during packaging.

serverless.yml:

plugins:
  - serverless-python-requirements
  - serverless-scriptable-plugin
...
custom:
  pythonRequirements:
    slim: true
    strip: false
  scriptable:
    hooks:
      after:package:createDeploymentArtifacts:
        - ./shake.sh
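One gotcha worth flagging: the hook invokes ./shake.sh directly, so the script below needs its executable bit set:

chmod +x shake.sh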

shake.sh:

#!/bin/sh
# Files reported at 0% coverage, to be stripped from the artefact
filelist="_distutils_hack/__init__.py
_distutils_hack/override.py
...
wheel/util.py
wheel/vendored/packaging/_typing.py
wheel/vendored/packaging/tags.py
wheel/wheelfile.py"

for file in $filelist
do
  zip --delete ./.serverless/data-science-api.zip "$file"
done
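Since every zip --delete invocation rewrites the whole archive, deleting tens of thousands of files one at a time is slow. A possible speed-up, sketched here assuming none of the paths contain whitespace, is to batch the names through xargs:

# xargs splits $filelist on whitespace and hands the names to zip in
# large batches, so the archive is only rewritten a handful of times
echo "$filelist" | xargs zip --delete ./.serverless/data-science-api.zip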

In the end, 28,000 unnecessary files were removed and the artefact shrank from 307 MB to 236 MB, unblocking deployment.
