Companies that adopt a cloud-native approach realize its value by marrying elastic compute capacity with their workloads. But for companies born before the cloud, in many cases even before generalized computers, that chose to build on-premises, how can they realize the value of newly launched cloud services?
AWS proposes a series of tenets to guide our discussion of a world-class hybrid machine learning experience:
- Seamless management experience
- Tunable latency
- Fast time-to-value
- Low cost
- End in the cloud
This document follows the machine learning lifecycle, from development to training and deployment.
What is Hybrid?
We look at hybrid architectures as having a minimum of two compute environments, what we will call “primary” and “secondary” environments. Generally speaking, we see the primary environment as where the workload begins, and the secondary environment is where the workload ends.
What “Hybrid” is not
While either your primary or secondary environment may be, in fact, another cloud provider, we prefer not to explicitly discuss those patterns here. That’s because the technical details and capabilities around specific services, features, and global deployment options are not necessarily the same across all cloud providers, and we consider that type of analysis outside the scope for this document.
Hybrid patterns for development
Development refers to the phase in machine learning when customers are iteratively building models. This may or may not include exploratory data analysis and deep learning model development.
Generally speaking, there are two major options for hybrid development:
1- Laptop and desktop personal computers.
2- Self-managed local servers utilizing specialized GPUs, colocations, self-managed racks, or corporate data centers.
Customers can develop in one or both of these compute environments, and below we describe hybrid patterns for model development using both of them.
Develop on personal computers, to train and host in the cloud
Customers can use local development environments, such as PyCharm or Jupyter installations on their laptops or personal computers, connect to the cloud via AWS Identity and Access Management (IAM) permissions, and interface with AWS service APIs through the AWS CLI or an AWS SDK (for example, boto3).
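As a sketch of that workflow, the snippet below uploads a local dataset to S3 from a laptop with boto3. The profile, bucket, and key names are hypothetical placeholders, and it assumes credentials have already been set up with `aws configure`.

```python
# Sketch: connecting a local development environment to AWS.
# The profile and bucket names below are hypothetical.

def s3_uri(bucket: str, key: str) -> str:
    """Build the S3 URI a SageMaker training job expects for input data."""
    return f"s3://{bucket}/{key}"

def upload_training_data(local_path: str, bucket: str, key: str) -> str:
    """Upload a local file to S3 and return its S3 URI."""
    import boto3  # imported lazily so the sketch can be read without boto3 installed
    session = boto3.Session(profile_name="default")  # assumes `aws configure` was run
    session.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)
```

From there, the returned URI can be passed to a SageMaker estimator or any other AWS service that reads from S3.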
Benefits:
1- You have full control of your IDE in this scenario; you just have to open your computer to get started.
2- You can easily manage what sits in your S3 bucket versus what runs on your local laptop.
3- You iteratively write a few lines of code, test them locally, and land in the cloud only to scale, track, and deploy.
Drawbacks:
1- Inability to scale beyond the compute resources of your laptop.
2- Lack of access to GUI-centric features such as SageMaker Autopilot, Data Wrangler, and Pipelines.
3- If your laptop dies and you didn't back up externally, your work is gone.
4- Onboarding non-super-user employees becomes increasingly difficult as software, OS, and hardware versions change; in some worst-case scenarios, highly valued employees wait multiple months for access to Python or pandas.
When to move:
If you find yourself spending a significant portion of your time managing local compute environments, it’s time to move to the cloud.
Develop on local servers, to train and host in the cloud
Benefits:
1- Ability to capitalize on previous investments in local compute.
2- Simplifies the infrastructure setup required to meet some regulatory requirements, such as those for specialized industries (health, gaming, financial).
3- Lower price per unit of compute in some environments, including some advanced GPUs.
4- Ideal for stable, non-bursting workloads that can be forecast with high precision six or more months out.
Drawbacks:
1- A fundamental challenge in dynamically matching provisioned compute to the needs of your business, leaving teams frequently stuck in either over-utilization or under-utilization of local compute resources.
2- Expensive GPUs can take months to ship, which leads to a larger total cost of ownership. New ideas and features can take longer to launch because of the extra development effort.
When to move
1- When you spend more time managing your local development than you do working on new data science projects.
2- When the multiple months it takes to procure, wait for, and provision additional compute resources leave your development teams sitting idle.
Hybrid patterns for training
Usually, a hybrid pattern for training comes down to one of two paths.
1- Either you are training locally, and you want to deploy in the cloud.
2- Or you have all of your data sitting on local resources, and you want to select from that to move into the cloud to train.
Training locally, to deploy in the cloud
1- First, if you are training locally then you will need to acquire the compute capacity to train a model.
2- After your model is trained, there are two common approaches for packaging and hosting it in the cloud.
- Docker – using a Dockerfile, you can build a custom image that holds your inference script, model artifact, and packages. Register this image in Amazon Elastic Container Registry (ECR), and point to it from your SageMaker estimator.
- Another option is using the pre-built containers available through the SageMaker Python SDK, also known as the AWS Deep Learning Containers. Bring your inference script and custom packages, upload your model artifact to Amazon S3, and import an estimator for your framework of choice. Define the framework version you need in the estimator, or install dependencies directly with a requirements.txt file or a custom bash script.
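As a hedged sketch of the second option, the snippet below gathers the arguments a pre-built PyTorch container would need to host a locally trained model. The S3 path, entry point, role ARN, and framework version are hypothetical, and the deploy step requires the SageMaker Python SDK plus AWS credentials to actually run.

```python
# Sketch: hosting a locally trained model with a pre-built SageMaker container.
# All names below (bucket, script, role, version) are hypothetical placeholders.

def model_config(model_data: str, framework_version: str = "2.1") -> dict:
    """Collect the arguments a SageMaker framework Model needs."""
    return {
        "model_data": model_data,          # s3://... path to your model.tar.gz
        "entry_point": "inference.py",     # your inference script
        "framework_version": framework_version,
    }

def deploy(config: dict, role_arn: str):
    """Create the model and deploy it behind a real-time endpoint."""
    from sagemaker.pytorch import PyTorchModel  # lazy: needs the SageMaker Python SDK
    model = PyTorchModel(role=role_arn, **config)  # may also need py_version, depending on SDK version
    return model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```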
How to monitor your model in the cloud
A key feature for hosting is Model Monitor: the ability to detect when data, bias, feature attribution, or model quality drift beyond thresholds, and to trigger a re-training pipeline in response.
1- Upload your training data to an Amazon S3 bucket, and use our pre-built image to learn the upper and lower bounds of your training data.
2- You will receive a JSON file with the statistically recommended upper and lower bounds for each feature. You can modify these thresholds.
3- After confirming your thresholds, schedule monitoring jobs in your production environment. These jobs run automatically, comparing your captured inference requests in Amazon S3 with your thresholds.
4- You will receive CloudWatch alerts when your inference data is outside of your pre-determined thresholds, and you can use those alerts to trigger a re-training pipeline.
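Conceptually, the threshold logic described above looks like the sketch below: learn per-feature bounds from training data, then flag inference values that fall outside them. This is an illustration of the idea only; Model Monitor computes its baselines for you as a managed job, and the three-sigma rule here is an assumption for the sketch.

```python
# Conceptual sketch of baseline-and-threshold monitoring.
# The k=3 (three standard deviations) choice is an assumption, adjustable
# just like the thresholds in the JSON baseline file.
from statistics import mean, stdev

def learn_bounds(values, k=3.0):
    """Learn lower/upper thresholds for one feature from training data."""
    m, s = mean(values), stdev(values)
    return (m - k * s, m + k * s)

def violates(value, bounds):
    """True when a captured inference value falls outside the learned bounds."""
    lower, upper = bounds
    return value < lower or value > upper
```

In production, a violation like this is what surfaces as a CloudWatch alert and can kick off your re-training pipeline.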
How to handle retraining/retuning
1- It is easy to run a retraining or retuning job in the cloud, without the overhead of provisioning, scheduling, and managing physical resources around the job.
2- SageMaker makes training and tuning jobs easy to manage because all you need to bring is your training script and dataset.
3- Follow best practices for training on SageMaker, ensuring your new dataset is loaded into an Amazon S3 bucket or other supported data source.
How to serve thousands of models in the cloud at a low cost
Another key feature of hosting models in SageMaker is multi-model endpoints.
1- Define your inference script, ensuring your framework is supported by SageMaker multi-model endpoints.
2- Create the multi-model endpoint, pointing to your model artifacts in Amazon S3.
3- Invoke the endpoint with the name of the model you want to use; SageMaker loads that artifact into the endpoint on demand.
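A minimal sketch of invoking a multi-model endpoint follows; the endpoint and model names are hypothetical. The `TargetModel` parameter names the artifact under the endpoint's S3 prefix, which SageMaker loads and caches on first use.

```python
# Sketch: invoking a SageMaker multi-model endpoint.
# Endpoint and model names are hypothetical placeholders.

def target_model(model_name: str) -> str:
    """Multi-model endpoints address artifacts by S3 key, e.g. 'churn-v3.tar.gz'."""
    return f"{model_name}.tar.gz"

def predict(endpoint: str, model_name: str, payload: bytes) -> bytes:
    import boto3  # lazy: actually calling this requires AWS credentials
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint,
        TargetModel=target_model(model_name),  # loaded on demand, cached on the instance
        ContentType="text/csv",
        Body=payload,
    )
    return resp["Body"].read()
```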
Benefits:
1- You have more control over your training environment.
2- The cloud not only provides greater flexibility, but can also improve a firm's security posture by freeing up resources from physical security, patching, and procurement.
Drawbacks:
1- Not taking advantage of cost savings from Spot Instances.
2- Not using pre-built Docker images, potentially wasting engineering effort developing these from scratch.
3- Not using advanced distributed training toolkits or custom ML training hardware.
4- Not using pre-built tuning packages; you need to build or buy your own.
5- Not using the debugger, profiler, feature store, and other training benefits.
When to move:
1- When the cost of developing your local training platform exceeds its value.
2- When the time it takes to provision additional compute resources is far outstripped by the demand for training from your data science teams or business needs.
Storing data locally, to train and deploy in the cloud
Schedule data transfer jobs with AWS DataSync
1- AWS DataSync is a data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services, as well as between AWS storage services.
2- Using AWS DataSync you can easily move petabytes of data from your local on-premises servers up to the AWS cloud.
3- AWS DataSync connects to your local NFS resources, looks for any changes, and handles populating your cloud environment.
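A sketch of this flow with boto3 is below. The location ARNs are hypothetical and assume the NFS and S3 locations were already created (via the console or the `create_location_*` APIs); the transfer options shown mirror the "only copy what changed" behavior described above.

```python
# Sketch: scheduling a DataSync transfer from an on-premises NFS share to S3.
# Location ARNs are hypothetical placeholders.

def task_options(verify: bool = True) -> dict:
    """Transfer options: verify data integrity and only copy changed files."""
    return {
        "VerifyMode": "POINT_IN_TIME_CONSISTENT" if verify else "NONE",
        "TransferMode": "CHANGED",  # DataSync copies only files that changed
    }

def start_transfer(source_arn: str, dest_arn: str) -> str:
    import boto3  # lazy: actually calling this requires AWS credentials
    ds = boto3.client("datasync")
    task = ds.create_task(
        SourceLocationArn=source_arn,
        DestinationLocationArn=dest_arn,
        Options=task_options(),
    )
    return ds.start_task_execution(TaskArn=task["TaskArn"])["TaskExecutionArn"]
```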
Migrating from Local HDFS
You might wholly embrace HDFS as your center and move toward hosting it within a managed service, Amazon EMR (Elastic MapReduce).
If you are interested in learning how to migrate from local HDFS clusters to Amazon EMR, please see this migration guide: https://d1.awsstatic.com/whitepapers/amazon_emr_migration_guide.pdf
1- Use Amazon S3 Intelligent-Tiering for objects over 128 KB.
2- Use multiple AWS accounts, and connect them with AWS Organizations.
3- Set billing alerts.
4- Enable SSO with your current Active Directory provider.
5- Turn on SageMaker Studio.
Benefits:
1- This is a fast way to realize the value of your locally stored datasets, particularly during a cloud migration.
2- Training and developing in the cloud gives you access to fully managed features within Amazon SageMaker and the entire AWS Cloud.
3- You can offload your on-premises resources more easily by leveraging capabilities in the cloud.
4- This frees your teams from procuring, provisioning, securing, and patching local compute resources, enabling them to scale dynamically with the needs of your business.
5- Generally speaking, you can deploy more models, more quickly, by training and hosting in the cloud.
Drawbacks:
1- Expending more resources storing data locally than potentially necessary.
2- If you intend to train your ML models locally, anticipate a high volume of node drop-outs in your data centers: one large job can easily consume 1 TB of RAM, while another may need less memory but execute for days.
3- Cost mitigation is important here: be aware of any data duplication across environments, and take action to reduce costs aggressively.
When to move
1- When the cost of managing, securing, and storing your data on-premises exceeds the cost of archiving and storing it in the cloud.
Develop in the cloud while connecting to data hosted on-premises
Data Wrangler & Snowflake
- Data Wrangler enables customers to browse and access data stores across Amazon S3, Amazon Athena, Amazon Redshift, and third-party data warehouses like Snowflake.
- This hybrid ML pattern provides customers the ability to develop in the cloud while accessing data stored on-premises, as organizations develop their migration plans.
Train in the cloud, to deploy ML models on-premises
If you are deploying on-premises, you need to develop and host your own local web server. We strongly recommend that you decouple hosting your model artifact from your application.
1- Amazon SageMaker lets you specify any model framework, version, or output artifact you need.
2- You'll find all model artifacts wrapped as tar.gz archives after training jobs, as this compressed format saves on job time and data transfer costs.
3- If you are using your own image, you will need to own updating that image as the underlying software, such as TensorFlow or PyTorch, undergoes potentially major changes over time.
4- Lastly, keep in mind that it is an unequivocal best practice to decouple hosting your ML model from hosting your application.
A key step in your innovation flywheel: once you host your ML model on dedicated resources, separate from your application, pushing out better models becomes much simpler.
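The tar.gz convention above can be sketched in a few lines. File names here are hypothetical; an on-premises server would run the unpack step before loading the model into its web server.

```python
# Sketch: packaging a trained model the way SageMaker emits it - a
# model.tar.gz containing the artifact. File names are hypothetical.
import tarfile, os

def package_model(artifact_path: str, out_path: str = "model.tar.gz") -> str:
    """Wrap a trained model artifact in a gzipped tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(artifact_path, arcname=os.path.basename(artifact_path))
    return out_path

def unpack_model(tar_path: str, dest: str = "model") -> list:
    """Unpack the archive on the serving host and list its contents."""
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)
        return tar.getnames()
```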
Benefits:
1- Can use SageMaker Neo to compile your model for a target device.
2- Feels like you have more control up front.
3- Take advantage of cloud-based training and tuning features, such as Spot training, the debugger, model and data parallelism, and Bayesian optimization.
4- Can enable your organization to progress on its cloud migration plan as application development moves to the cloud.
Drawbacks:
1- Need to develop, manage, maintain, and respond to operational issues with locally managed web servers.
2- Own the burden of building and maintaining up-to-date versions of model software frameworks, such as TensorFlow or PyTorch.
3- Bigger risk of tightly coupling compute for your model with compute for your application, making it more challenging to deploy new models and new features to your application over time.
4- Need to develop your own data-drift detection and model monitoring capabilities.
5- Not taking advantage of cloud-based features for global deployment; see the next section for more details.
6- Need to develop your own application monitoring pipeline that extracts key metrics, business details, and model responses to share with business and data science stakeholders.
When to move
1- When your ability to deploy new applications on-premises is hindered by your need to procure, provision, and manage local infrastructure.
2- When your model is losing accuracy and/or performance over time, due to your inability to quickly retrain and redeploy.
3- When the cost of monitoring, updating, maintaining, and troubleshooting your on-premises deployment outweighs the benefit of keeping it local.
Monitor ML models deployed on-premises with SageMaker Edge Manager
- SageMaker Edge Manager makes it easy for customers to manage ML models deployed on Windows, Linux, or ARM-based compute environments.
- Install the Edge Manager agent onto the CPU of your intended device, and use AWS IoT Core or another transfer method to download the model to the device and execute local inferencing.
- Edge Manager simplifies the monitoring and updating of these models by bringing the control plane up to the cloud.
- You can bring your own monitoring algorithm to the service and trigger retraining pipelines as necessary, using the service to redeploy that model back down to the local device.
Hybrid patterns for deployment
In this pattern, we focus mostly on hosting the model in the cloud while interacting with applications that may be hosted on-premises. Serving models from the cloud to applications on-premises can empower your data scientists, and we also look at patterns for hosting ML models via Lambda@Edge, AWS Outposts, Local Zones, and Wavelength.
Serve models in the cloud to applications hosted on-premises
The most common use case for a hybrid pattern like this is enterprise migrations.
Benefits:
- You deploy ML models directly to your application consumers.
- Can use custom hardware for ultra-low response times with AWS Inferentia.
- Can serve thousands of models at low cost with multi-model endpoints.
- Can deploy complex feature transforms with inference pipelines.
- Can use built-in autoscaling and Model Monitor.
- Can easily develop retraining and tuning pipelines.
Drawbacks:
- Risk of your local application infrastructure maintenance hindering the speed of your model development.
When to move: When your ability to deploy new applications on-premises is hindered by your need to procure, provision, and manage local infrastructure.
Host ML Models with Lambda at Edge to applications on-premises
1- This pattern takes advantage of a key capability of the AWS global network – the content delivery network known as Amazon CloudFront.
2- Once you set your Lambda function to trigger from CloudFront events, the service replicates that function across all available Regions and points of presence. Replication can take up to eight minutes before the function becomes available everywhere.
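A minimal sketch of such a handler is below. The model here is a stub standing in for a small serialized artifact; loading it outside the handler means it is reused across invocations within the same execution environment, which is the usual pattern for keeping Lambda inference latency down.

```python
# Sketch of a minimal Lambda-style inference handler. The "model" is a
# hypothetical stub; in practice you would deserialize a small artifact here.
import json

def load_model():
    """Stand-in for loading a small serialized model at cold start."""
    return lambda features: sum(features)  # stub "model": sums the features

MODEL = load_model()  # loaded once per execution environment, not per request

def handler(event, context):
    """Parse the request body, run the model, and return a JSON response."""
    features = json.loads(event["body"])["features"]
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": MODEL(features)}),
    }
```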
Benefits:
1- Can use CloudFront, opening you up to serving from hundreds of points of presence around the world and saving you from having to manage them.
2- Works nicely with Docker images built for SageMaker, because Lambda functions can be created from container images stored in Amazon ECR.
Drawbacks:
- Can't use GPUs, so you may introduce some latency in cases where customers would be better served by an ML model on Inferentia hosted in a nearby AWS Region.
- Lambda has a hard limit on function memory: 10,240 MB (10 GB). For many “classical” ML models, such as XGBoost or linear regressors, 10 GB is more than sufficient. For some complex deep learning models, however, especially those with tens to hundreds of billions of parameters, 10 GB falls woefully short.
When to move
1- When you need more advanced drift and monitoring capabilities.
2- When you want to introduce complex feature transforms, such as with inference pipelines on SageMaker.
3- When you want to serve thousands of models per use case.
Training with a 3rd party SaaS provider to host in the cloud
1- Ensure your provider allows the export of proprietary software frameworks, such as jars, bundles, or images. Follow the steps to create a Dockerfile using that software framework, push it into the Elastic Container Registry (ECR), and host on SageMaker.
Control plane patterns for hybrid ML
- One such common control plane is Kubeflow in conjunction with EKS Anywhere. EKS Anywhere is currently in private preview, anticipated to come online in 2021.
- SageMaker offers a native approach for workflow orchestration, known as SageMaker Pipelines. SageMaker Pipelines is ideal for advanced SageMaker users, especially those who are already onboarded to the IDE SageMaker Studio.
Auxiliary services for hybrid ML patterns
1- Order AWS Outposts and Amazon will ship, install, and manage these resources for you. You can connect to these resources however you prefer, and manage them from the cloud.
2- You can deploy ML models via ECS to serve inference with ultra-low latency within your data centers, using AWS Outposts. You can also use ECS for model training, integrating with SageMaker in the cloud and ECS on Outposts.
3- Outposts help solve cases where customers want to build applications in countries where there is not currently an AWS Region, or for regulations that have strict data residency requirements, like online gambling and sports betting.
- AWS Inferentia makes it easy to access custom hardware for ML inferencing.
- The majority of Alexa operates in a hybrid ML pattern, hosting models on AWS Inferentia and serving hundreds of millions of Alexa-enabled devices around the world. Using AWS Inferentia, Alexa was able to reduce its cost of hosting by 25%.
- You can use SageMaker’s managed deep learning containers to train your ML models, compile them for Inferentia with Neo, host on the cloud, and develop your retrain and tune pipeline as usual.
AWS Direct Connect
Ability to establish a private connection between your on-premises resources and AWS. Remember to establish a redundant link, as physical connections do fail!
Hybrid ML Use Cases
1- Enterprise migrations
2- Mobile application development
3- AI-enhanced media and content creation
4- Autonomous vehicles
In this document, we explored hybrid ML patterns across the entire ML lifecycle. We looked at developing locally while training and deploying in the cloud, discussed patterns for training locally to deploy in the cloud, and even covered hosting ML models in the cloud to serve applications on-premises.