Introduction
In the previous article, "Why Choose Sagemaker Despite Having a Local Server with RTX3080?", I explained the benefits of using Sagemaker even when you already own a local training server.
In this article, I will first walk through a simple example of training and deploying a model locally with Sagemaker.
Then, I will draw on my experience with an LSTM futures trading project to explain best practices for real-time endpoints and batch-transform endpoints.
Finally, based on the same project, I will explain when to choose a Sagemaker instance, Fargate, or EC2 for deployment.
Sagemaker Exec - Training and Deploying Models Locally
0.0 Prerequisites:
Before starting local development, please install the following:
- Nvidia CUDA (https://developer.nvidia.com/cuda-downloads)
- Nvidia-container-toolkit (https://github.com/NVIDIA/nvidia-container-toolkit)
- Docker (https://docs.docker.com/engine/install/)
1.0 Build the Local Development Docker Image
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
ARG REGISTRY=quay.io
ARG OWNER=jupyter
ARG BASE_CONTAINER=$REGISTRY/$OWNER/scipy-notebook
FROM $BASE_CONTAINER
USER root
LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    gnupg
RUN install -m 0755 -d /etc/apt/keyrings
# The build already runs as root, so sudo is unnecessary here
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
RUN chmod a+r /etc/apt/keyrings/docker.gpg
RUN echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    tee /etc/apt/sources.list.d/docker.list > /dev/null
RUN apt-get update
RUN apt-get install -y \
    docker-ce \
    docker-ce-cli \
    containerd.io \
    docker-buildx-plugin \
    docker-compose-plugin
# Fix: https://github.com/hadolint/hadolint/wiki/DL4006
# Fix: https://github.com/koalaman/shellcheck/wiki/SC3014
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Install Tensorflow with pip
RUN pip install --no-cache-dir tensorflow[and-cuda] && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
# Install sagemaker-python-sdk with pip
RUN pip install --no-cache-dir 'sagemaker[local]' --upgrade
1.1 Use the jupyter/tensorflow-notebook development environment (https://github.com/jupyter/docker-stacks/blob/main/images/tensorflow-notebook/Dockerfile)
1.2 Modify the jupyter/tensorflow-notebook image to install docker and sagemaker[local] inside the image
docker build -t sagemaker/local:0.1 .
1.3 Create the local development image
sudo docker run --privileged --name jupyter.sagemaker.001 --gpus all -e GRANT_SUDO=yes --user root --network host -it -v /home/jovyan/work:/home/jovyan/work -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp -v /sagemaker:/sagemaker sagemaker/local:0.1 >> /home/jovyan/work/log/sagemaker_local_$(date +%Y%m%d_%H%M%S).log 2>&1
1.4 Start the local development image
1.5 -v /home/jovyan/work, the default working path of jupyter/tensorflow-notebook
1.6 -v /var/run/docker.sock, lets the notebook launch Sagemaker's training & inference containers on the host Docker daemon
1.7 -v /tmp, the temporary file path used by Sagemaker
1.8 Go to 127.0.0.1:8888
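Once Jupyter is up, you can verify that the container actually sees the GPU. A minimal check; the exact output depends on your driver and CUDA setup:
import tensorflow as tf
# Expect at least one entry like PhysicalDevice(name='/physical_device:GPU:0', ...)
print(tf.config.list_physical_devices('GPU'))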
2.0 Sagemaker Local Training of Models
import os
# Replace the placeholder strings with your own region, credentials, and role ARN
os.environ['AWS_DEFAULT_REGION'] = 'AWS_DEFAULT_REGION'
os.environ['AWS_ACCESS_KEY_ID'] = 'AWS_ACCESS_KEY_ID'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'AWS_SECRET_ACCESS_KEY'
os.environ['AWS_ROLE'] = 'AWS_ROLE'
os.environ['INSTANCE_TYPE'] = 'local_gpu'  # use 'local' on CPU-only machines
2.1 Set the AWS IAM credentials and INSTANCE_TYPE
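Optionally, you can also create an explicit local-mode session and pass it to the estimator below. This is a minimal sketch assuming sagemaker[local] is installed; local_code: True keeps the source code on disk instead of uploading it to S3:
from sagemaker.local import LocalSession
# Local-mode session; pass it to an estimator via sagemaker_session=...
sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}
If you skip this step, instance_type='local_gpu' alone is enough to trigger local mode.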
import keras
import numpy as np
from keras.datasets import fashion_mnist

# Save the Fashion-MNIST splits locally so Sagemaker can read them as file:// channels
(x_train, y_train), (x_val, y_val) = fashion_mnist.load_data()
os.makedirs("./data", exist_ok=True)
np.savez('./data/training', image=x_train, label=y_train)
np.savez('./data/validation', image=x_val, label=y_val)
2.2 Download datasets (training set and validation set)
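Before training, a quick sanity check on the saved channel files never hurts; the shapes below are the standard Fashion-MNIST splits:
import numpy as np
ds = np.load('./data/training.npz')
print(ds['image'].shape, ds['label'].shape)  # (60000, 28, 28) (60000,)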
from sagemaker.tensorflow import TensorFlow

# Both channels point at ./data, which holds training.npz and validation.npz
training = 'file://data'
validation = 'file://data'
output = 'file:///tmp'

tf_estimator = TensorFlow(entry_point='fmnist.py',
                          source_dir='./src',
                          role=os.environ['AWS_ROLE'],
                          instance_count=1,
                          instance_type=os.environ['INSTANCE_TYPE'],
                          framework_version='2.11',
                          py_version='py39',
                          hyperparameters={'epochs': 10},
                          output_path=output)

tf_estimator.fit({'training': training, 'validation': validation})
2.3 Download fmnist.py and model.py to ./src (https://github.com/PacktPublishing/Learn-Amazon-SageMaker-second-edition/tree/main/Chapter%2007/tf)
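For orientation, a Sagemaker script-mode entry point follows the structure sketched below. This is a simplified sketch, not the actual fmnist.py from the linked repository; the SM_* environment variables are injected by Sagemaker, and hyperparameters arrive as command-line arguments:
import argparse
import os

import numpy as np
import tensorflow as tf

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    # parse_known_args tolerates extra arguments such as --model_dir
    args, _ = parser.parse_known_args()

    # Sagemaker mounts each input channel and exposes its path via SM_CHANNEL_*
    train = np.load(os.path.join(os.environ['SM_CHANNEL_TRAINING'], 'training.npz'))
    x_train = train['image'].reshape(-1, 28, 28, 1) / 255.0
    y_train = train['label']

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=args.epochs)

    # TF Serving expects a numbered version directory under SM_MODEL_DIR
    model.save(os.path.join(os.environ['SM_MODEL_DIR'], '1'))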
2.4 Start local training. Sagemaker launches the image 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.11-gpu-py39.
3.0 Sagemaker Local Deployment of Models
import os
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    entry_point='inference.py',
    source_dir='./src',
    role=os.environ['AWS_ROLE'],
    model_data=f'{output}/model.tar.gz',
    framework_version='2.11'
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type=os.environ['INSTANCE_TYPE'],
)
3.1 Download inference.py to ./src (https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/test/resources/examples/test1/inference.py)
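The Tensorflow-serving container looks for optional input_handler and output_handler functions in inference.py. The sketch below captures the shape of the linked example, simplified to a JSON passthrough:
def input_handler(data, context):
    # Convert the incoming request into the JSON body TF Serving expects
    if context.request_content_type == 'application/json':
        return data.read().decode('utf-8')
    raise ValueError('Unsupported content type: {}'.format(context.request_content_type))

def output_handler(response, context):
    # Relay TF Serving's JSON response back to the caller
    return response.content, context.accept_header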
3.2 Launch the Tensorflow-serving container. Sagemaker uses the image 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11-gpu
4.0 Invoke the Tensorflow-Serving:8080 interface
import random
import matplotlib.pyplot as plt

# Draw random validation images, plot them, and send them to the endpoint
num_samples = 10
indices = random.sample(range(x_val.shape[0] - 1), num_samples)
images = x_val[indices] / 255
labels = y_val[indices]

for i in range(num_samples):
    plt.subplot(1, num_samples, i + 1)
    plt.imshow(images[i].reshape(28, 28), cmap='gray')
    plt.title(labels[i])
    plt.axis('off')
plt.show()

payload = images.reshape(num_samples, 28, 28, 1)
response = predictor.predict(payload)
prediction = np.array(response['predictions'])
predicted_label = prediction.argmax(axis=1)
print('Predicted labels are: {}'.format(predicted_label))
4.2 Run the model
print('About to delete the endpoint')
predictor.delete_endpoint(predictor.endpoint_name)
4.3 Shut down the Tensorflow-serving container
5.0 External Invocation of Tensorflow-serving:8080 interface
5.1 Target the real-time endpoint (http://YOUR-SAGEMAKER-DOMAIN:8080/invocations)
5.2 Send a POST request with Body -> raw and the input JSON data, as in the sketch below
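For example, from any machine that can reach the host; a minimal sketch where the domain is a placeholder and the all-zeros image is a dummy payload:
import json

import numpy as np
import requests

payload = {'instances': np.zeros((1, 28, 28, 1)).tolist()}  # dummy image
resp = requests.post(
    'http://YOUR-SAGEMAKER-DOMAIN:8080/invocations',
    headers={'Content-Type': 'application/json'},
    data=json.dumps(payload),
)
print(resp.json()['predictions'])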
Conclusion of Sagemaker Exec
This is a simple example demonstrating the process of training and deploying models locally using Sagemaker. As mentioned earlier, since Sagemaker does not fully support local development, it is necessary to modify the jupyter/tensorflow-notebook
image. Additionally, a more complex inference.py
is required for local model deployment.
However, I still recommend using Sagemaker for local development because it provides pre-built resources and clean code. Moreover, Sagemaker has preconfigured workflows for training and deploying model images, so we do not need to deeply understand the project structure and internal operations to complete the training and deployment of models.
When to use real-time endpoints and batch-transform endpoints
The choice of endpoint depends not only on cost but also on business logic: response time, invocation frequency, dataset size, model update frequency, error tolerance, and so on. I will present two practical use cases to explain where real-time endpoints and batch-transform endpoints fit best.
- SageMaker batch transform is designed to perform batch inference at scale and is cost-effective.
- SageMaker real-time endpoints aim to provide a robust live hosting option for your ML use cases.
From "Getting Started with Amazon SageMaker Studio", Chapter 7
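In SDK terms, the two options look like this. A minimal sketch reusing the TensorFlowModel from section 3.0; the instance types are illustrative and the S3 bucket is hypothetical:
# Real-time endpoint: a long-running container answering synchronous requests
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
)

# Batch transform: a transient job that scores a whole dataset, then shuts down
transformer = model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',
    output_path='s3://my-bucket/predictions/',  # hypothetical bucket
)
transformer.transform('s3://my-bucket/batch-input/', content_type='application/json')
transformer.wait()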
Here are two examples of trading strategies:
1. Diana's medium-term quarterly trading strategy
The multi-asset portfolio includes US stocks, overseas stocks, US coupon bonds, overseas high-yield bonds, and 3-month bills. Every 3 months, the LSTM-all-weather-portfolio model is used for asset rebalancing. The model also runs once a day, 15 minutes before market close, to check the risk of each position and whether the portfolio meets the 5% annualized return target.
2. Alice's intraday futures trading strategy
Trading only S&P 500 index and Nasdaq index futures, with holding periods of roughly 30 to 360 minutes. The LSTM-Pure-Alpha-Future model uses 20-second snapshot data to produce buy and exit signals, which are stored for daily performance analysis of the model.
Diana's Medium-Term Quarterly Trading Strategy
- Assets: Stocks, Bonds, Bills
- Instrument Pool: US stocks, Overseas stocks, US coupon bonds, Overseas high-yield bonds, 3-month bills
- Trading Frequency: 5 trades per quarter
- Response Time: Delayed responses are acceptable; predictions are needed only 15 minutes before market close
- Model: LSTM-all-weather-portfolio
- Model Update Frequency: Low. Update the model only if it achieves a 5% annualized return
- Recommended Solution: Batch-transform endpoint

If the dataset is large and the response can be delayed, the batch-transform endpoint should be used.
Alice's Intraday Futures Trading Strategy
- Assets: Index Futures
- Instrument Pool: S&P 500 index futures, Nasdaq index futures
- Trading Frequency: 5 trades per day
- Response Time: Real-time
- Model: LSTM-Pure-Alpha-Future
- Model Update Frequency: High. The buy and exit signals are continuously optimized
- Recommended Solution: Real-time endpoint

If the dataset is small and responses need to be fast, the real-time endpoint should be used.
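For a strategy like Alice's, a trading process would call the live endpoint on every snapshot. A hedged sketch using boto3; the endpoint name and feature vector are hypothetical:
import json

import boto3

runtime = boto3.client('sagemaker-runtime')
features = [[0.5] * 20]  # hypothetical features built from a 20-second snapshot
resp = runtime.invoke_endpoint(
    EndpointName='lstm-pure-alpha-future',  # hypothetical endpoint name
    ContentType='application/json',
    Body=json.dumps({'instances': features}),
)
print(json.loads(resp['Body'].read())['predictions'])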
Even though Sagemaker provides various deployment benefits, why do I still use EC2?
In my current role at a financial technology company, I am always excited by innovative products, and AWS's offerings keep delivering surprising solutions. If I were building a personal music brand, I would reach for AWS products such as DeepComposer, Fargate, Amplify, and Lambda.
However, the cost of migrating to the cloud is high, and there is no strong incentive to move existing hardware resources there. Here are the use cases that explain why I choose EC2:
1. Custom Python financial engineering library
Although I prefer to use existing frameworks and libraries, some requirements call for a custom Python financial engineering library, such as developing high-dividend investment strategies or macro cross-market analysis, so I manage my own Docker images. The pre-built images provided by Sagemaker cannot fully meet these needs, while EC2 offers more freedom to structure the production environment.
2. Team development and custom CI/CD workflow
Although Sagemaker allows models to be trained and deployed quickly, it does not fully meet our development needs. We have an independent team responsible for researching trading strategies and developing deep-learning trading models, and because we run a custom CI/CD workflow, it is not a good fit to build our architecture around Sagemaker.
3. Pursuit of controlled fixed costs
Although Sagemaker and Fargate allow instances to be created quickly, their costs scale with usage. I therefore prefer EC2's fixed, predictable cost and scale up manually when resources run short.
Conclusion
Sagemaker is a remarkable product. For startups looking to launch new products, AWS's cloud solutions are the preferred choice, and even mature enterprises can optimize their workflows with AWS services. In summary, I highly recommend incorporating Sagemaker into the development process.