DEV Community

Sam


Running Notebooks with arbitrary dependencies in an AWS lambda

Goals:

  • Run notebook files in a Lambda.
  • Allow them to install their own dependencies.

In this instance, I've used the Serverless Framework, but the problems solved here likely apply to other frameworks. After trying a number of approaches, the following seemed to work within the constraints of Lambda's read-only file system:

  • Create a dedicated workspace in /tmp.
  • Copy the notebook into the workspace, along with a script that creates a virtual environment and executes the notebook.
  • Run the script as a subprocess and allow it to run to completion.

Starting with the serverless.yml file, note that "IPYTHONDIR" must point somewhere in /tmp, since a Lambda's file system is read-only everywhere else:

service: nb-exec
frameworkVersion: '3'
provider:
  name: aws

functions:
  hello:
    handler: handler.hello
    environment:
      IPYTHONDIR: /tmp/ipythondir

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: true

package:
  patterns:
    - "!.venv/**"
    - "!node_modules/**"
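
The read-only constraint can also be checked at the top of the handler, so a misconfigured path fails early with a clear message. The following is a minimal sketch, not part of the original setup: the helper name ensure_writable_dir is hypothetical, and it assumes the environment variable from serverless.yml is visible to the handler process.

```python
import os
import tempfile


def ensure_writable_dir(env_var: str, default: str) -> str:
    """Return the directory named by env_var, creating it if needed.

    Falls back to `default` when the variable is unset (hypothetical helper).
    Raises RuntimeError if the directory cannot be created or written to.
    """
    path = os.environ.get(env_var, default)
    try:
        os.makedirs(path, exist_ok=True)
        # Creating a temporary file is a direct test of write access.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError as exc:
        raise RuntimeError(f"{env_var}={path!r} is not writable") from exc
    # Child processes started later inherit this value.
    os.environ[env_var] = path
    return path
```

Calling ensure_writable_dir("IPYTHONDIR", "/tmp/ipythondir") before starting the notebook run would then either return the directory or raise immediately.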

Our requirements.txt file lists the packages used to execute the notebook files:

nbconvert==7.9.2
ipython==8.16.1
ipykernel==6.25.2

Next, inside our handler:

import os
import shutil
import subprocess
import uuid


def hello(event, context):
    # Give each invocation its own workspace; /tmp is the only writable location.
    unique_id = str(uuid.uuid4())
    workspace_path = os.path.join(os.path.abspath(os.sep), "tmp", f"workspace_{unique_id}")
    os.makedirs(workspace_path, exist_ok=True)

    # The script and notebook ship with the deployment package, which is the
    # handler's working directory.
    shutil.copy("execute.sh", workspace_path)

    notebook_dir_path = os.path.join(workspace_path, "notebook")
    os.makedirs(notebook_dir_path, exist_ok=True)

    shutil.copy("example.ipynb", notebook_dir_path)

    # Run the script from inside the workspace and fail the invocation if it fails.
    execute_script_path = os.path.join(workspace_path, "execute.sh")
    subprocess.run(["bash", execute_script_path], cwd=workspace_path, check=True)
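
Because /tmp survives between warm invocations and has limited space, it may be worth removing each workspace once the run has finished. Here is a rough sketch of one way to do that, assuming a flat workspace layout rather than the notebook subdirectory used above. The names temporary_workspace and run_command are hypothetical.

```python
import contextlib
import shutil
import subprocess
import tempfile
from pathlib import Path


@contextlib.contextmanager
def temporary_workspace(files, base_dir=None):
    """Yield a fresh workspace containing copies of `files`, then delete it.

    base_dir is the parent directory (for example "/tmp" on Lambda); None
    means the platform default temporary directory.
    """
    workspace = Path(tempfile.mkdtemp(prefix="workspace_", dir=base_dir))
    try:
        for src in files:
            shutil.copy(src, workspace)
        yield workspace
    finally:
        # Remove the workspace even if the body of the with-block raised.
        shutil.rmtree(workspace, ignore_errors=True)


def run_command(command, workspace, timeout=None):
    """Run `command` inside `workspace`, capturing stdout and stderr as text."""
    return subprocess.run(
        command,
        cwd=workspace,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Possible use inside a handler (results must be read before the block exits):
# with temporary_workspace(["execute.sh", "example.ipynb"], base_dir="/tmp") as ws:
#     result = run_command(["bash", "execute.sh"], ws)
```

Capturing the output makes it possible to log or return whatever the script printed, and the context manager keeps cleanup in one place.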

And finally the execute.sh file:

#!/usr/bin/env bash
# Stop at the first failing command so errors are not silently ignored.
set -e

# Make sure dependencies can be picked up from the deployment directory, as well as the
# built-in AWS runtime dependencies.
export PYTHONPATH="$LAMBDA_TASK_ROOT:$LAMBDA_RUNTIME_DIR"

# Create a virtual environment that inherits these dependencies.
python3 -m venv .venv --system-site-packages
source .venv/bin/activate

# Execute the notebook; the result is written next to it as example.nbconvert.ipynb.
python3 -m nbconvert --to notebook --execute ./notebook/example.ipynb
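
When nbconvert is run with --to notebook and no output option, it writes the executed copy next to the input as example.nbconvert.ipynb. That file is plain JSON, so the handler can read the cell outputs back without extra dependencies. The sketch below assumes the nbformat v4 layout; the function names are hypothetical.

```python
import json


def _as_text(value):
    # nbformat stores multi-line text either as one string or a list of lines.
    return "".join(value) if isinstance(value, list) else value


def collect_outputs(notebook: dict) -> list:
    """Return one text string per code cell in an nbformat v4 notebook dict."""
    results = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        parts = []
        for output in cell.get("outputs", []):
            kind = output.get("output_type")
            if kind == "stream":
                parts.append(_as_text(output.get("text", "")))
            elif kind in ("execute_result", "display_data"):
                data = output.get("data", {})
                parts.append(_as_text(data.get("text/plain", "")))
            elif kind == "error":
                parts.append(f"{output.get('ename')}: {output.get('evalue')}")
        results.append("".join(parts))
    return results


def load_outputs(path: str) -> list:
    """Read an executed notebook from disk and collect its code cell outputs."""
    with open(path, encoding="utf-8") as handle:
        return collect_outputs(json.load(handle))
```

By default nbconvert stops at the first failing cell, so error entries would only be present if the notebook was run with --allow-errors.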

One additional, unsolved problem is the following error when installing dependencies from within a cell using the ! shell escape:

!pip install pandas
Error: out of pty devices

But replacing this with the following seems to work fine:

import subprocess
subprocess.run(["pip", "install", "pandas"])
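
The ! shell escape appears to need a pseudo-terminal, which would explain why a plain subprocess call avoids the error. A slightly more defensive variant, sketched here with hypothetical helper names, invokes pip through the interpreter that is running the notebook, so the package lands in the same virtual environment even if another pip comes first on the PATH.

```python
import subprocess
import sys


def pip_install_command(*packages: str) -> list:
    """Build a pip command bound to the interpreter running this code."""
    if not packages:
        raise ValueError("at least one package name is required")
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Install packages, raising CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(*packages), check=True)
```

A notebook cell could then call pip_install("pandas") before importing the package.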

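Note: running untrusted code in a Lambda environment is not secure. Execution environments are reused between invocations, so a notebook may be able to read files left in /tmp by earlier invocations, and it runs with the function's IAM permissions, which may give it access to other AWS resources.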
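
One partial mitigation is to start the notebook process without the credential variables that the Lambda runtime places in the environment. This is only a sketch under that assumption, with a hypothetical helper name. It narrows the exposure but is not a sandbox, and it does not cover other credential sources or files already present in /tmp.

```python
import os

# Environment variables through which the Lambda runtime exposes the
# function's role credentials.
CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def scrubbed_environment(source=None) -> dict:
    """Return a copy of the environment without the AWS credential variables.

    `source` defaults to os.environ; the original mapping is left untouched.
    """
    env = dict(os.environ if source is None else source)
    for name in CREDENTIAL_VARS:
        env.pop(name, None)
    return env
```

The result can be passed as the env argument of subprocess.run when launching execute.sh.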
