Wendy Wong for AWS Heroes

No code ML - Make accurate predictions with Amazon SageMaker Canvas

Empowering you to make decisions with no code ML

Organizations are becoming more data-driven. No matter where your company is in the data analytics maturity cycle, you may be thinking about your first use case. Whether you work for a startup, a consultancy, government or a bank, you may be helping your leaders decide whether to migrate a workload from on-premises to the cloud, doing some data engineering to build a pipeline, testing AI proofs of concept (POCs) or even making predictions from clean data in a data lake.

Today we will explore how, as a business user, you can get started easily in machine learning. No-code ML lets you test a hypothesis, make decisions confidently and get insights quickly, without needing a computer science background.

Lesson Outcomes

You will learn how to:

  • Set up the environment to use machine learning

  • Select a target variable to build your model

  • Get started using Amazon SageMaker Canvas to make predictions

  • Retrieve model evaluation metrics, draw model insights and view predictions

What is Machine Learning?

In a nutshell, machine learning helps you to find patterns in your dataset. Two main types of machine learning are:

Supervised Learning: Is when you have a labelled dataset and you would like to predict an outcome (dependent) variable. Regression, for example linear regression, predicts a numeric outcome such as an integer or continuous variable. Classification, by contrast, predicts an outcome variable that is categorical.

Unsupervised Learning: Is when you have an unlabelled dataset and you would like to find patterns in it. Examples include clustering and principal component analysis.

You are welcome to explore courses in machine learning at AWS Skill Builder.

What is Amazon SageMaker Canvas?

Amazon SageMaker Canvas empowers business analysts to make machine learning predictions with a user-friendly interface, without the need to program or write any code. You may simply bring in your own dataset and either use a ready-to-use (pre-trained) model or build a custom model in Amazon SageMaker Canvas.

Why Amazon SageMaker Canvas?

Amazon SageMaker Canvas bridges technical teams and the business sponsor, fostering greater collaboration between data scientists, engineers and business analysts. This enables greater alignment in agile delivery between business requirements and technical strategy to deliver end-user goals.

You may take advantage of the AWS Free Tier for 2 months, which includes 160 session hours per month with your AWS account.

Who should use it?

Anyone is welcome to use Amazon SageMaker Canvas, especially if you are a business analyst (BA), decision maker or executive leader and you don't have a machine learning background. With ready-to-use ML models you do not need to program in any language.

However, if you are a data scientist, developer, machine learning engineer or solution architect familiar with programming, you may wish to use custom models.

What are the common use cases?

As a business user your organization can explore use cases with Amazon SageMaker Canvas that include:

  • Detect customer sentiment e.g. call transcripts
  • Prediction e.g. customer churn and fraudulent applications
  • Extract information from documents e.g. invoices, forms
  • Classify images e.g. insurance claim assessments
  • Classify text e.g. support tickets
  • Demand forecasting

Problem Statement

Let's start with the data analytics workflow using CRISP-DM to define the business problem:

  • Is this insurance claim form fraudulent?

Dataset

This vehicle insurance claim fraud detection dataset is from Kaggle.com.

This is a classification problem. The outcome variable we are trying to predict, i.e. FraudFound_P, is a categorical variable with the binary values 0 = Not fraudulent and 1 = Fraudulent claim.
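To make the idea of a binary target concrete, here is a minimal Python sketch of a rule of thumb for guessing the problem type from the values in a target column. The function names, the `max_categories` threshold and the sample values are assumptions made for illustration; this is not how Amazon SageMaker Canvas infers the model type internally.

```python
def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def problem_type(values, max_categories=10):
    """Rule-of-thumb guess at the problem type from the target column's values."""
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("the target needs at least two distinct values")
    if len(distinct) == 2:
        return "binary classification"
    # Many distinct numeric values suggest a continuous outcome (assumed threshold).
    if len(distinct) > max_categories and all(is_number(v) for v in distinct):
        return "regression"
    return "multi-class classification"


# Made-up target values using the same 0/1 encoding as FraudFound_P.
sample_target = ["0", "0", "1", "0", "1", "0"]
print(problem_type(sample_target))
```

Because FraudFound_P only ever takes the values 0 and 1, a check like this would treat it as a binary classification target.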

Solution Architecture

This is the solution overview of how to bring in your own dataset to make predictions with Amazon SageMaker Canvas.

[Figure: solution architecture]

Tutorial 1: Setting up the environment

  • Step 1: Sign in to your AWS account as an IAM user. If you don't have an AWS IAM admin user account you may create one here.

  • Step 2: You may follow the getting started tutorial to create a stack using AWS CloudFormation to set up the environment.

Accept the default settings for the US East (N. Virginia) region and check the box to acknowledge the terms and conditions. Click Create stack.

  • Step 3: The stack is being created and will take about 1 hour to complete.
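If you prefer not to keep refreshing the console while the stack is being created, the status can also be checked from Python. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names are hypothetical, and the stack name is the one that appears in the clean-up section of this tutorial.

```python
def stack_status(cfn_client, stack_name):
    """Return the current status string of a CloudFormation stack."""
    response = cfn_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def describe_progress(status):
    """Translate a stack status into a short, human-friendly message."""
    if status == "CREATE_COMPLETE":
        return "ready"
    if status == "CREATE_IN_PROGRESS":
        return "still creating"
    # Failed or rolled-back stacks need a look in the console.
    return "needs attention"


def main():
    """Check the tutorial stack in US East (N. Virginia); needs AWS credentials."""
    import boto3  # imported here so the helpers above work without boto3

    cfn = boto3.client("cloudformation", region_name="us-east-1")
    status = stack_status(cfn, "CFN-SM-IM-Lambda-Catalog")
    print(status, "->", describe_progress(status))
```

Calling `main()` prints the current status; anything other than CREATE_COMPLETE or CREATE_IN_PROGRESS is worth inspecting in the CloudFormation console.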

Tutorial 2: Get started as a business user

  • Step 1: Type Amazon SageMaker into the search bar.

On the left-hand side, click Canvas and ensure that you are operating in the AWS region US East (N. Virginia).

  • Step 2: Click Open Canvas to launch it.

It will take a few minutes to launch Canvas.

  • Step 3: You may read the tutorials to help you get started.

  • Step 4: Upload your dataset into the Amazon S3 bucket.

  • Step 5: Select Datasets.

Click Import to upload your dataset.

Click Amazon S3 to import a dataset stored in an Amazon S3 bucket.

Check the box against the file and click Import data.
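The upload in Step 4 can also be done from Python rather than the console. This is a minimal sketch, assuming boto3 is installed and credentials are configured; the bucket name, the `canvas-datasets` prefix and the helper names are hypothetical and should be replaced with your own.

```python
from pathlib import Path


def object_key(local_path, prefix="canvas-datasets"):
    """Build the S3 object key from a prefix and the local file name."""
    return f"{prefix}/{Path(local_path).name}"


def upload_dataset(s3_client, local_path, bucket, prefix="canvas-datasets"):
    """Upload a local CSV file to S3 and return its s3:// URI."""
    key = object_key(local_path, prefix)
    # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
    s3_client.upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

With credentials in place you could call `upload_dataset(boto3.client("s3"), "claims.csv", "my-canvas-bucket")`, where both the file name and the bucket name are placeholders, and then pick the file from that bucket in the Canvas import screen.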

Tutorial 3: Start making predictions from your dataset

  • Step 1: Choose from Ready to use models

With ready-to-use models you may bring in your own dataset and take advantage of pre-trained models that use AWS services including Amazon Textract, Amazon Rekognition and Amazon Comprehend to make predictions.

  • Step 2: If you cannot locate a suitable ready-to-use model, type 'Prediction' into the search bar and select Create a Custom model.

  • Step 3: Select Predictive Analysis and Create.

  • Step 4: Click Select dataset to start building a model.

  • Step 5: Under the Build tab, choose a target variable from the drop-down menu as the outcome variable, i.e. the 'FraudFound_P' column.

The inferred model type is a two-category prediction.
Select Quick Build.

It will take between 2 and 15 minutes to build the prediction model.

  • Step 6: Under the Analyze tab, in the Overview section, Amazon SageMaker Canvas reports that the model predicts the correct value of FraudFound_P 96.402 % of the time.

In the next tab, Scoring, you can see a plot of the predicted and actual values. The model insights tell us that for insurance claims that are not fraudulent, the prediction is correct 98.235 % of the time.

If you click Advanced Settings, you may view the Confusion Matrix, along with metrics that evaluate how well the model predicts class = 0 (No fraud):

  • F1 score = 98.083 %
  • Accuracy = 96.402 %
  • Precision = 98.230 %
  • AUC (Area under the ROC curve) = 0.976
  • Recall = 97.931 %
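The metrics above are all derived from the four cells of the confusion matrix, and precision, recall and F1 depend on which class is treated as the positive one. The sketch below shows the standard formulas in Python. The counts are hypothetical and chosen only for illustration; they are not the counts from this model.

```python
def class_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the class treated as positive."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion matrix counts (not taken from the Canvas model).
no_fraud_predicted_no_fraud = 940
no_fraud_predicted_fraud = 20
fraud_predicted_no_fraud = 15
fraud_predicted_fraud = 25

# Class 0 (No fraud) as the positive class.
class_0 = class_metrics(
    tp=no_fraud_predicted_no_fraud,
    fp=fraud_predicted_no_fraud,
    fn=no_fraud_predicted_fraud,
    tn=fraud_predicted_fraud,
)

# Class 1 (Fraud) as the positive class: the same cells with swapped roles.
class_1 = class_metrics(
    tp=fraud_predicted_fraud,
    fp=no_fraud_predicted_fraud,
    fn=fraud_predicted_no_fraud,
    tn=no_fraud_predicted_no_fraud,
)

for name, metrics in (("class 0", class_0), ("class 1", class_1)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

AUC is not included because it is computed from the predicted probabilities rather than from the four counts alone.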

When we toggle to class = 1 (Fraud), the evaluation metrics of the model are:

  • F1 score = 70.712 %
  • Accuracy = 96.072 %
  • Precision = 69.072 %
  • AUC (Area under the ROC curve) = 0.976
  • Recall = 72.432 %
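As a quick sanity check, the F1 score is the harmonic mean of precision and recall, so the fraud-class F1 can be reproduced from the two figures above. A minimal Python sketch (the function name is chosen here for illustration):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (works in fractions or percent)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported by Canvas for class = 1 (Fraud), in percent.
reported_precision = 69.072
reported_recall = 72.432

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f} %")  # prints: F1 = 70.712 %
```

The result matches the F1 score shown for the fraud class, which is noticeably lower than for the no-fraud class because fraudulent claims are much harder for the model to identify.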

  • Step 7: On the Predict tab, click Predict to make predictions. Select Batch predictions and select your dataset.

View and download the predictions.

Amazon SageMaker Canvas provides a list of predictions for each class (Class 0 = no fraud and Class 1 = fraud) across the entire dataset, which you may download.
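Once the batch predictions are downloaded, you may want to summarise them outside Canvas. The sketch below counts the share of each predicted class in a CSV file using only the Python standard library. The column names (`claim_id`, `predicted_class`, `probability`) and the sample rows are assumptions made for illustration; check the header of your downloaded file and adjust the `column` argument.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for a downloaded predictions file (illustration only).
SAMPLE_PREDICTIONS = """claim_id,predicted_class,probability
a,0,0.98
b,0,0.91
c,1,0.77
d,0,0.99
"""


def predicted_class_shares(csv_file, column="predicted_class"):
    """Return the fraction of rows predicted for each class."""
    counts = Counter(row[column] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = predicted_class_shares(io.StringIO(SAMPLE_PREDICTIONS))
for label, share in sorted(shares.items()):
    print(f"class {label}: {share:.1%}")
```

To run it on a real download, pass a file opened with `open(path, newline="")` instead of the in-memory sample.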

  • Step 8: You may also generate predictions by selecting Single predictions, and Amazon SageMaker Canvas will show the feature importance of the variables in the dataset in descending order.

You may also download the actual prediction, which is:

  • 95.31 % of the time there is no fraudulent insurance claim
  • 4.69 % of the time there is a fraudulent insurance claim

Clean up resources

To avoid surprise charges on your AWS bill at the end of the month, I recommend that you delete the AWS resources that you no longer need.

  • Step 1: Navigate to your Amazon S3 bucket and click Empty to clear the resource.

  • Step 2: Under My Models click on the ellipsis (3 dots) and delete the model.

  • Step 3: Under User Details, for each application click Delete apps and enter the word 'delete' in the box.

Then delete the user.

  • Step 4: Navigate to CloudFormation by typing the word into the search bar.

Click Stacks on the left-hand side, select the radio button for CFN-SM-IM-Lambda-Catalog and then click Delete to remove all the AWS resources that were previously created in that stack.

Confirm to delete the CloudFormation stack.
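Steps 1 and 4 of the clean-up can also be scripted. The sketch below is one possible approach, assuming boto3 clients are passed in and that the bucket is not versioned (a versioned bucket also needs its object versions removed). The helper names are hypothetical; the stack name is the one used in this tutorial.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in a non-versioned bucket and return the count."""
    deleted = 0
    list_kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**list_kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each page holds at most 1,000 keys, the limit for delete_objects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        list_kwargs["ContinuationToken"] = page["NextContinuationToken"]


def delete_stack(cfn_client, stack_name="CFN-SM-IM-Lambda-Catalog"):
    """Request stack deletion and block until CloudFormation reports it is gone."""
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

With credentials configured, the clients could be created with `boto3.client("s3")` and `boto3.client("cloudformation", region_name="us-east-1")`. The Canvas model, apps and user from Steps 2 and 3 are still deleted in the console as described above.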

AWS re:Invent 2022 - Machine Learning

If you missed any of the keynotes or workshop sessions from AWS re:Invent 2022 you may catch up and watch them on YouTube at this link.

You may watch the breakout session 'AWS re:Invent 2022 - Better decisions with no-code ML using SageMaker Canvas' featuring a customer story from Samsung to hear how they implemented Amazon SageMaker Canvas.

You may also hear about democratizing machine learning from the recent AWS re:Invent 2022 keynote from Dr Swami Sivasubramanian, VP of analytics, database and ML at AWS.

Amazon SageMaker Canvas Announcement - 30 March 2023

Amazon SageMaker Canvas also supports NLP and computer vision with ready to use models. You may find out more here.

Until the next lesson, Happy Learning! 😁

What's new this week or coming soon to you 🌎

  • AWS Sydney Summit - 4 April 2023. Register and join us for the live stream here.

  • AWS re:Inforce - June 13-14 2023. Register and join us.

  • AWS London Summit - 7 June 2023. Register and join us.
