selvakumar palanisamy

Machine Learning - AWS ML

Machine Learning

There is an increasing focus on data and machine learning these days. Machine learning helps us make informed decisions: with data and scientific methods, we can derive more intelligent and accurate forecasts.

Machine learning gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. AWS ML is a robust, cloud-based service that makes it easy for developers of all skill levels to use machine learning technology.

Steps to build an ML model.

1. Gather the dataset.

2. Perform data analysis to produce a low-dimensional representation of the higher-dimensional dataset, using unsupervised machine learning techniques such as PCA and SOM.

3. Clean and curate the data to get the pre-processed dataset.

4. Split the pre-processed dataset into two parts:
   i) Training dataset (80% of the pre-processed dataset)
   ii) Test dataset (20% of the pre-processed dataset)

5. Apply machine learning algorithms to the training dataset to build the model, then validate and test it.

6. Derive the predicted values from the model.

7. Evaluate the model's performance using a confusion matrix and other ML metrics.

8. Iterate through this process until the predictions are sufficiently accurate.

9. Deploy the model for use in production (enterprise level).
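
Steps 4 and 7 above can be sketched in a few lines of plain Python. This is only an illustration, not the approach any AWS service uses internally: the helper names (`train_test_split`, `confusion_matrix`, `accuracy`) and the toy labels are assumptions made up for this example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts

def accuracy(cm):
    """Fraction of predictions that matched the actual label."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0

# Step 4: an 80/20 split of 100 hypothetical rows.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20

# Step 7: evaluate made-up predictions against made-up labels.
cm = confusion_matrix(actual=[1, 0, 1, 1, 0], predicted=[1, 0, 0, 1, 1])
print(cm, accuracy(cm))
```

A fixed seed keeps the split reproducible between iterations (step 8), so that changes in accuracy come from the model rather than from a different shuffle.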

AWS ML

Amazon ML supports three types of ML models.

Binary classification

Multi-class classification

Regression

All three come under supervised learning: binary classification, multi-class classification, and regression each learn from labeled examples.

Binary classification 
Predict values that can only have two categories such as true or false.

Multi-class classification
Predict values that belong to limited, predefined categories.

Regression (squared loss function + SGD)
Predict a numeric value.
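
To make "squared loss function + SGD" concrete, here is a minimal sketch of stochastic gradient descent fitting a one-feature linear model by minimizing squared error. It is a teaching example only; the function name, learning rate, epoch count, and toy data are assumptions, and it does not reproduce Amazon ML's actual implementation.

```python
import random

def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=1000, seed=0):
    """Fit y ~ w * x + b by minimizing squared loss with SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b

# Toy data generated from y = 2x + 1, so the fit should approach w=2, b=1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Each update uses a single sample rather than the whole dataset, which is what distinguishes SGD from batch gradient descent.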

The table below provides details about the ML algorithms.

[image: table of ML algorithm details]

AWS ML STACK

Amazon SageMaker is a machine learning platform that abstracts away much of the software development work otherwise needed to build, train, and deploy models.

SageMaker supports frameworks like TensorFlow, PyTorch, Apache MXNet, Chainer, Keras, Gluon, Horovod, Scikit-learn, and Deep Graph Library.

Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning that helps developers set up an end-to-end ML solution.

Simple Machine learning Pipeline

An automated ML pipeline can be created using AWS CodePipeline and SageMaker.

Build: Compile and build your custom application source code and output a Docker image.

Training: Run training on the Docker image through the SageMaker API and output the model artefacts to an S3 bucket.

SageMaker Jupyter notebook to build, train, and deploy the model

Let's see how to build a SageMaker Jupyter notebook to solve the common bike-sharing demand prediction problem.

Problem: You are provided with hourly rental data spanning two years. The training set comprises the first 19 days of each month, while the test set runs from the 20th to the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
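
The day-of-month rule in the problem statement can be expressed as a small helper. This is a sketch under assumptions: the field names `datetime` and `count`, the timestamp format, and the sample rows are invented for illustration and may not match the actual data files.

```python
from datetime import datetime

def split_by_day_of_month(records, last_training_day=19):
    """Put days 1..last_training_day in train and the rest of the month in test."""
    train, test = [], []
    for record in records:
        timestamp = datetime.strptime(record["datetime"], "%Y-%m-%d %H:%M:%S")
        if timestamp.day <= last_training_day:
            train.append(record)
        else:
            test.append(record)
    return train, test

# Made-up rows; test rows have no count because that is what must be predicted.
records = [
    {"datetime": "2011-01-01 00:00:00", "count": 16},
    {"datetime": "2011-01-19 23:00:00", "count": 10},
    {"datetime": "2011-01-20 00:00:00", "count": None},
    {"datetime": "2011-02-28 12:00:00", "count": None},
]
train_rows, test_rows = split_by_day_of_month(records)
print(len(train_rows), len(test_rows))  # 2 2
```

Because the split is by calendar day rather than random, the test hours always fall after the training hours of the same month.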

First Import Libraries

import numpy as np
import pandas as pd
import boto3
import re
import sagemaker
from sagemaker import get_execution_role

Upload Data to S3

# Specify your bucket name
bucket_name = 'selva-ml-sagemaker'

training_folder = r'bikerental/training/'
validation_folder = r'bikerental/validation/'
test_folder = r'bikerental/test/'

s3_model_output_location = r's3://{0}/bikerental/model'.format(bucket_name)
s3_training_file_location = r's3://{0}/{1}'.format(bucket_name,training_folder)
s3_validation_file_location = r's3://{0}/{1}'.format(bucket_name,validation_folder)
s3_test_file_location = r's3://{0}/{1}'.format(bucket_name,test_folder)

def write_to_s3(filename, bucket, key):
    with open(filename,'rb') as f: # Read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_fileobj(f)

write_to_s3('bike_train.csv', 
            bucket_name,
            training_folder + 'bike_train.csv')

write_to_s3('bike_validation.csv',
            bucket_name,
            validation_folder + 'bike_validation.csv')

write_to_s3('bike_test.csv',
            bucket_name,
            test_folder + 'bike_test.csv')
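
For CSV input, the SageMaker built-in XGBoost algorithm expects training files with no header row and with the target variable in the first column. The post does not show how `bike_train.csv` and `bike_validation.csv` were prepared, so the following is only a sketch of one way to produce that layout; the helper name, the column names (`hour`, `temp`, `count`), and the sample values are hypothetical.

```python
import csv
import io

def to_xgboost_csv(rows, target, features):
    """Serialize dict rows as header-less CSV with the target in column one."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[target]] + [row[name] for name in features])
    return buffer.getvalue()

# Hypothetical rows for illustration only.
rows = [
    {"hour": 8, "temp": 14.5, "count": 120},
    {"hour": 9, "temp": 15.0, "count": 95},
]
csv_text = to_xgboost_csv(rows, target="count", features=["hour", "temp"])
print(csv_text)
```

The same feature order has to be used later when sending rows to the endpoint, except that the target column is left out at prediction time.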

Build

use_spot_instances = True
max_run = 3600 # in seconds
max_wait = 7200 if use_spot_instances else None # in seconds

job_name = 'xgboost-bikerental-v1'

checkpoint_s3_uri = None

if use_spot_instances:
    checkpoint_s3_uri = f's3://{bucket_name}/bikerental/checkpoints/{job_name}'

print(f'Checkpoint uri: {checkpoint_s3_uri}')

sess = sagemaker.Session()
role = get_execution_role()
container = sagemaker.image_uris.retrieve("xgboost",sess.boto_region_name,version="1.2-2")

estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=s3_model_output_location,
    sagemaker_session=sess,
    base_job_name = job_name,
    use_spot_instances=use_spot_instances,
    max_run=max_run,
    max_wait=max_wait,
    checkpoint_s3_uri=checkpoint_s3_uri)

estimator.set_hyperparameters(max_depth=5,
                              objective="reg:squarederror",
                              eta=0.1,
                              num_round=150)

estimator.hyperparameters()


training_input_config = sagemaker.session.TrainingInput(
    s3_data=s3_training_file_location,
    content_type='csv',
    s3_data_type='S3Prefix')

validation_input_config = sagemaker.session.TrainingInput(
    s3_data=s3_validation_file_location,
    content_type='csv',
    s3_data_type='S3Prefix'
)

data_channels = {'train': training_input_config, 'validation': validation_input_config}

Train and deploy

estimator.fit(data_channels)
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m5.xlarge',
                             endpoint_name=job_name)

Run Predictions

from sagemaker.serializers import CSVSerializer
predictor.serializer = CSVSerializer()
predictor.predict('input')  # 'input' is a placeholder: pass a CSV row of feature values
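
Once the endpoint responds, the raw response still has to be turned into numbers and checked. The sketch below assumes the response body is comma- or newline-separated text (bytes or str), which should be verified against the container version in use. The helper names and sample values are made up, and clipping negative predictions to zero is a design choice for count data rather than something the post prescribes.

```python
import math

def parse_predictions(payload):
    """Parse a CSV-style response body (bytes or str) into a list of floats."""
    text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    tokens = text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]

def clip_negative(values):
    """A rental count cannot be negative, so floor predictions at zero."""
    return [max(0.0, value) for value in values]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical response body and ground truth, for illustration only.
response = b"12.5,-3.0\n40.0"
predictions = clip_negative(parse_predictions(response))
print(predictions)  # [12.5, 0.0, 40.0]
print(rmse([10.5, 0.0, 44.0], predictions))
```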

Happy Machine learning coding