A/B testing doesn't offer much unless you get some useful data out of it, and I needed a way to calculate how statistically significant the results of an A/B test were. Python is one of the obvious choices for this kind of work. But Preferr is written in Ruby and has a Rails-based GraphQL API, so I couldn't just shove in some Python and have it work.
And then a little voice in the back of my head whispered, "Hey remember all those podcasts and articles about serverless? You should try it."
Getting started with Serverless
The setup
As someone who had no idea where to start with serverless, I googled serverless and, what do you know, serverless.com popped up. With a name like that, surely it was the right place to start.
I signed up, followed some setup instructions, and then got to work setting up Python and a virtual environment.
Note: If you're running macOS and you installed Python 3 via Homebrew like I did, be warned that it might screw up OpenSSL and really ruin your day. To unruin your day, you can run `brew reinstall openssl@1.1`, or check this Stack Overflow thread for all sorts of other possible ways to solve the problem.
Generating a new serverless project
Serverless comes with a very nice CLI that lets you quickly generate boilerplate from a ton of different templates.
I used the python3 template:

```bash
serverless create --template aws-python3 --name my-special-service --path serverless-project
```
That generated 3 files: `.gitignore`, `handler.py`, and `serverless.yml`.
Each of those comes with some generated code that can be used to take the function for a little test run.
Installing Python libraries
Among the many Serverless guides and tutorials is one about managing python packages. I referenced it frequently.
Note: Their tutorial uses `virtualenv` for Python, but I used `pipenv`.
I needed the `numpy` and `scipy` packages, so I started a pipenv shell and installed them.

```bash
pipenv shell
pipenv install numpy
pipenv install scipy
```
That generated a `Pipfile` and `Pipfile.lock` in the project, and I was now free to use those packages in `handler.py`.
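Before wiring anything into Serverless, it's worth a quick sanity check that the two scipy functions do what I want. This is just an illustrative snippet with made-up numbers, not part of Preferr's code:

```python
# Sanity check of the two scipy tests on a made-up 2x2 contingency table
# (rows: variants A and B; columns: conversions and non-conversions).
from scipy.stats import chi2_contingency, fisher_exact

table = [[30, 970],   # variant A: 30 conversions out of 1,000 impressions
         [45, 955]]   # variant B: 45 conversions out of 1,000 impressions

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value)  # chi-squared statistic and its p-value

odds_ratio, p_value = fisher_exact(table)
print(odds_ratio, p_value)  # Fisher's exact test, handy for small sample sizes
```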
The handler.py file
The `handler.py` file in a Serverless project is where the logic for your function(s) lives.
The generated output from `serverless create` is this:
```python
import json


def hello(event, context):
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response

    # Use this code if you don't use the http event with the LAMBDA-PROXY
    # integration
    """
    return {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "event": event
    }
    """
```
Obviously, this file needed to be modified to do some real work.
Here's what the end result of `handler.py` looks like for Preferr:
```python
try:
    # unzip_requirements is generated by serverless-python-requirements when
    # zip: true is set; it inflates the zipped dependencies at runtime.
    import unzip_requirements
except ImportError:
    pass

import json
# from serverless_sdk import tag_event
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats import fisher_exact


def chiValue(event, context):
    # The POST body is a JSON string with a "contingencyTable" key.
    contingency_table = json.loads(event["body"])
    # chi2_contingency returns (chi2, p-value, degrees of freedom, expected frequencies)
    statistic = chi2_contingency(contingency_table['contingencyTable'])

    headers = {
        "Access-Control-Allow-Origin": "*",
    }

    body = {
        "chi2": statistic[0],
        "pValue": statistic[1],
        "input": event
    }

    response = {
        "statusCode": 200,
        "headers": headers,
        "body": json.dumps(body)
    }

    return response


def fishersExact(event, context):
    contingency_table = json.loads(event["body"])
    # fisher_exact returns (odds ratio, p-value) for a 2x2 table
    statistic = fisher_exact(contingency_table['contingencyTable'])

    headers = {
        "Access-Control-Allow-Origin": "*",
    }

    body = {
        "pValue": statistic[1],
        "input": event
    }

    response = {
        "statusCode": 200,
        "headers": headers,
        "body": json.dumps(body)
    }

    return response
```
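For reference, with the Lambda-proxy integration API Gateway passes the raw POST body as a JSON string under `event["body"]`, and these handlers expect that JSON to contain a `contingencyTable` key. Here's a quick local sketch of a call, with made-up numbers:

```python
# Hypothetical local call to chiValue, mimicking the Lambda-proxy event shape.
import json
from handler import chiValue

event = {"body": json.dumps({"contingencyTable": [[30, 970], [45, 955]]})}
response = chiValue(event, None)

result = json.loads(response["body"])
print(result["chi2"], result["pValue"])
```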
The most critical parts of this file are at the top with the imports. To understand why, I'll take you to the `serverless.yml` file.
The serverless.yml file
`serverless.yml` is the configuration file that tells Serverless how your function operates within Serverless itself and how it should be deployed to AWS Lambda. It establishes what cloud provider you're using, what Serverless plugins are needed, what functions can be run, and a lot more. Here's a full reference for the options in this file when using `aws` as your provider: serverless.yml Reference
I spent the majority of my time tweaking this file to get it right. I deployed at least a hundred times over the course of a few days as I debugged and tried different settings.
These are the settings that ended up being the most important for me.
```yaml
plugins:
  - serverless-python-requirements
```
This plugin automatically bundles the Python libraries found in `requirements.txt` or `Pipfile` for use in the deployed function(s).
To install it, run `npm init` in your project and follow the directions to create a `package.json` file. After that, run `npm install --save serverless-python-requirements`.
```yaml
custom:
  pythonRequirements:
    dockerizePip: non-linux
    zip: true
```
This tells the `serverless-python-requirements` plugin how it should bundle up dependencies like `numpy` and `scipy`.

`dockerizePip` uses Docker to package these dependencies, and it's essential to getting things working on AWS Lambda. To quote the Serverless docs:

"Docker packaging is essential if you need to build native packages that are part of your dependencies like Psycopg2, NumPy, Pandas, etc."
```yaml
functions:
  chiValue:
    handler: handler.chiValue
    events:
      - http:
          path: /chi-value
          method: post
  fishersExact:
    handler: handler.fishersExact
    events:
      - http:
          path: /fishers-exact
          method: post
```
And this defines the functions that will be given HTTP endpoints for you to call. Serverless will give you a URL that you can find on your dashboard.
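To make that concrete, a request to one of these endpoints could look something like the sketch below. The URL is a placeholder (Serverless prints the real endpoint after a deploy), and this assumes the third-party `requests` library is installed:

```python
# Rough sketch of POSTing a contingency table to the deployed chi-value endpoint.
# The URL below is a placeholder; use the endpoint printed by serverless deploy.
import json
import requests

url = "https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/chi-value"
payload = {"contingencyTable": [[30, 970], [45, 955]]}

resp = requests.post(url, data=json.dumps(payload))
print(resp.status_code)
print(resp.json()["pValue"])
```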
Here's how Preferr's `serverless.yml` file ended up:
```yaml
service: preferr-stats-chisquared
app: preferr-stats
org: ellitt

provider:
  name: aws
  runtime: python3.6
  stage: prod
  region: us-east-1

plugins:
  - serverless-python-requirements

package:
  individually: true
  exclude:
    - venv/**
    - .vscode/**

custom:
  pythonRequirements:
    dockerizePip: non-linux
    zip: true

functions:
  chiValue:
    handler: handler.chiValue
    events:
      - http:
          path: /chi-value
          method: post
  fishersExact:
    handler: handler.fishersExact
    events:
      - http:
          path: /fishers-exact
          method: post
```
An Important Modification
The guide that I followed from Serverless got me about 80% of the way towards having a usable Lambda function on AWS, but I did have to dig and figure out why my deploys kept getting rejected.
Adding `zip: true` to `pythonRequirements`

`zip: true` makes the deployed package much smaller and keeps Lambda from rejecting it for being over the deployment size limit. `numpy` and `scipy` are both large libraries, so they need to be compressed in order to fit.

`zip: true` also requires a change to `handler.py`:
```python
try:
    import unzip_requirements
except ImportError:
    pass
```
This unzips the compressed libraries so that they can be used in `handler.py`.
Deployment
After spending a few days sorting all of this out, I deployed the function and gave it a try.
```bash
serverless deploy
```
And it deployed successfully! Now, I had the function on AWS Lambda via Serverless, but it wasn't worth much if I wasn't making any requests to it.
I'll cover making those requests in the next post.