Organizations are becoming more data-driven. No matter where your company is in the data analytics maturity cycle, you may be thinking about your first use case. Whether you work for a startup, a consultancy, government or a bank, you may be helping your leaders decide whether to migrate a workload from on-premises to the cloud, doing data engineering to build a pipeline, testing AI proofs of concept (POCs) or making predictions from clean data in a data lake.
Today we will explore how you, as a business user, can get started with machine learning using no-code ML. You will see how to test a hypothesis, get insights quickly and make decisions confidently, all without a computer science background.
You will learn how to:
- Set up the environment to use machine learning
- Select a target variable to build your model
- Get started using Amazon SageMaker Canvas to make predictions
- Retrieve model evaluation metrics, draw model insights and view predictions
In a nutshell, machine learning helps you find patterns in your dataset. Two common types of machine learning are:
- Supervised learning: you have a labelled dataset and want to predict an outcome (dependent) variable. Regression, such as linear regression, is used when the outcome is numeric (an integer or continuous value). Classification, in contrast, is used when the outcome is a categorical variable.
- Unsupervised learning: you have an unlabelled dataset and want to find patterns in it, for example with clustering or principal component analysis (PCA).
You are welcome to explore machine learning courses on AWS Skill Builder.
Amazon SageMaker Canvas lets business analysts make machine learning predictions through a user-friendly visual interface, without programming or writing any code. You can bring in your own dataset and either use a ready-to-use (pre-trained) model or build a custom model in Amazon SageMaker Canvas.
Amazon SageMaker Canvas helps bridge technical teams and business sponsors, enabling closer collaboration between data scientists, engineers and business analysts. This helps agile teams align business requirements with technical strategy to deliver end-user goals.
Anyone can use Amazon SageMaker Canvas, especially business analysts (BAs), decision makers and executive leaders who don't have a machine learning background. With ready-to-use ML models, you do not need to program in any language.
However, if you are a data scientist, developer, machine learning engineer or solutions architect familiar with programming, you may wish to build custom models.
As a business user, you can explore use cases for your organization with Amazon SageMaker Canvas, including:
- Detect customer sentiment, e.g. in call transcripts
- Predict outcomes, e.g. customer churn and fraudulent applications
- Extract information from documents, e.g. invoices and forms
- Classify images, e.g. for insurance claim assessments
- Classify text, e.g. support tickets
- Forecast demand
Let's start the data analytics workflow with the first step of CRISP-DM, defining the business problem:
- Is this insurance claim form fraudulent?
This vehicle insurance claim fraud detection dataset is from Kaggle.com.
This is a classification problem: the outcome variable we are trying to predict, FraudFound_P, is a binary categorical variable with the values 0 = not fraudulent and 1 = fraudulent claim.
This is the solution overview of how to bring in your own dataset and make predictions with Amazon SageMaker Canvas.
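Before loading the file into Canvas, a quick local check can confirm that FraudFound_P really holds only 0 and 1 and show how imbalanced the two classes are. The sketch below uses only the Python standard library; the five sample rows and the ClaimId column are hypothetical stand-ins for the Kaggle CSV.

```python
import csv
import io
from collections import Counter

TARGET = "FraudFound_P"  # outcome column used throughout this walkthrough


def class_distribution(csv_file, target=TARGET):
    """Return the share of rows for each label, raising if the target is not binary 0/1."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[target].strip() for row in reader)
    unexpected = set(counts) - {"0", "1"}
    if unexpected:
        raise ValueError(f"{target} has non-binary values: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the file has no data rows")
    return {label: counts[label] / total for label in ("0", "1")}


# Hypothetical sample; for the real data, pass open("your_file.csv", newline="") instead.
sample = io.StringIO("ClaimId,FraudFound_P\n1,0\n2,0\n3,1\n4,0\n5,0\n")
print(class_distribution(sample))
```

A heavily skewed split is worth knowing about before you read the per-class evaluation metrics later on.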
- Step 1: Sign in to your AWS account as an IAM user. If you don't have an AWS IAM admin user account, you may create one here.
- Step 2: Follow the getting started tutorial to create a stack using AWS CloudFormation to set up the environment.
Accept the default settings for the US East (N. Virginia) region, check the box to acknowledge the terms and conditions and click Create stack.
- Step 3: The stack is now being created; this takes about 1 hour to complete.
- Step 1: Type Amazon SageMaker into the search bar.
On the left-hand side, click Canvas and make sure you are working in the US East (N. Virginia) region.
- Step 2: Click Open Canvas to launch it.
It will take a few minutes for Canvas to launch.
- Step 3: You may read the tutorials to help you get started.
Click Import to upload your dataset.
Click Amazon S3 to import a dataset stored in an Amazon S3 bucket.
Check the box next to the file and click Import data.
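If the CSV is still on your machine, you can also put it in S3 from code before importing it. This is a minimal sketch using boto3's `upload_file`; the file name, bucket and key in the usage comment are hypothetical, and the client is injectable so the function can be exercised without AWS credentials.

```python
def upload_dataset(local_path, bucket, key, s3_client=None):
    """Upload a local file to s3://bucket/key and return the resulting S3 URI."""
    if s3_client is None:
        # Imported lazily so the function can be tested with a stub client.
        import boto3
        # US East (N. Virginia), the region used in this walkthrough.
        s3_client = boto3.client("s3", region_name="us-east-1")
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example (hypothetical names):
# upload_dataset("fraud_claims.csv", "my-canvas-datasets", "fraud/fraud_claims.csv")
```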
- Step 1: Choose from Ready-to-use models.
With ready-to-use models, you can bring in your own dataset and take advantage of pre-trained models that use AWS services, including Amazon Textract, Amazon Rekognition and Amazon Comprehend, to make predictions.
- Step 2: If you cannot find a suitable ready-to-use model, type 'Prediction' into the search bar and select Create a custom model.
- Step 3: Select Predictive Analysis and click Create.
- Step 4: Click Select dataset to start building a model.
- Step 5: Under the Build tab, choose the 'FraudFound_P' column from the drop-down menu as the target (outcome) variable.
Canvas infers that the model type is a 2 category prediction.
Select Quick Build.
It will take 2 to 15 minutes to build the prediction model.
- Step 6: Under the Analyze tab, in the Overview section, Amazon SageMaker Canvas reports that the model predicts the correct value of FraudFound_P 96.402 % of the time (its accuracy).
In the next tab, Scoring, you can see a plot of the predicted versus actual values. The model insights show that when an insurance claim is not fraudulent, the model predicts that outcome 98.235 % of the time.
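Canvas picks the model type for you. As a rough illustration of the idea, and not Canvas's actual logic, the hypothetical heuristic below labels a target with exactly two distinct values as a 2 category prediction, an all-numeric target with many distinct values as a numeric prediction, and anything else as a 3+ category prediction. The `max_categories` cut-off of 10 is an arbitrary assumption.

```python
def _is_number(text):
    """True if the string parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def infer_model_type(values, max_categories=10):
    """Hypothetical heuristic for choosing a model type from a target column's values."""
    distinct = {str(v).strip() for v in values}
    if len(distinct) < 2:
        raise ValueError("a target needs at least two distinct values")
    if len(distinct) == 2:
        return "2 category prediction"
    if len(distinct) > max_categories and all(_is_number(v) for v in distinct):
        return "numeric prediction"
    return "3+ category prediction"


print(infer_model_type(["0", "1", "0", "0"]))  # FraudFound_P-style target → 2 category prediction
```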
If you click Advanced Settings, you can view the confusion matrix, which provides metrics to evaluate how well the model predicts class 0 (no fraud).
- F1 score = 98.083 %
- Accuracy = 96.402 %
- Precision = 98.230 %
- AUC (Area under the ROC curve) = 0.976
- Recall = 97.931 %
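The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the precision and recall reported above is a quick way to see how the three metrics relate; the small function below does that arithmetic.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class 0 (no fraud) precision and recall from the Canvas confusion matrix above.
print(f"{f1_score(0.98230, 0.97931):.2%}")
```

This gives about 98.08 %, close to the reported 98.083 % (the inputs are themselves rounded). The same formula applied to the class 1 figures reproduces the 70.712 % F1 score.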
When we toggle to class 1 (fraud), the model's evaluation metrics are:
- F1 score = 70.712 %
- Accuracy = 96.072 %
- Precision = 69.072 %
- AUC (Area under the ROC curve) = 0.976
- Recall = 72.432 %
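Precision, recall and F1 depend on which class is treated as the positive class, which is why the fraud (class 1) figures differ so much from the no-fraud (class 0) figures. The sketch below derives both sets from a single 2x2 confusion matrix; the counts are hypothetical, chosen only to mimic a rare positive class, and are not the counts behind the Canvas results.

```python
def metrics_for(positive, matrix):
    """Per-class metrics, treating `positive` (0 or 1) as the positive class.

    `matrix[(actual, predicted)]` holds the number of rows with that combination.
    """
    negative = 1 - positive
    tp = matrix[(positive, positive)]  # positive rows predicted as positive
    fp = matrix[(negative, positive)]  # negative rows predicted as positive
    fn = matrix[(positive, negative)]  # positive rows predicted as negative
    tn = matrix[(negative, negative)]  # negative rows predicted as negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 920 non-fraud claims and 80 fraud claims.
matrix = {(0, 0): 900, (0, 1): 20, (1, 0): 15, (1, 1): 65}
for cls in (0, 1):
    m = metrics_for(cls, matrix)
    print(cls, {name: round(value, 3) for name, value in m.items()})
```

With these made-up counts, class 1 gets a precision of about 0.76 and a recall of about 0.81, while class 0 scores above 0.97 on both. Accuracy counts every correct prediction, so it comes out at 0.965 whichever class is treated as positive.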
- Step 7: On the Predict tab, click Predict to make predictions. Select Batch predictions and choose your dataset.
View and download the predictions.
Amazon SageMaker Canvas provides the predicted class (class 0 = no fraud, class 1 = fraud) for every row in the dataset, and you can download the results.
- Step 8: You can also generate predictions by selecting Single predictions; Amazon SageMaker Canvas then shows the feature importance of the variables in the dataset in descending order.
You can also download the actual prediction, which is:
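Once downloaded, the batch predictions are an ordinary CSV, so a few lines of code can filter them, for example to list the claims predicted as fraud for manual review. The column names used here (ClaimId and prediction) are hypothetical; check the header row of your download and pass the actual names in.

```python
import csv
import io


def flagged_claims(csv_file, id_column, label_column, fraud_label="1"):
    """Return the IDs of rows whose predicted label equals `fraud_label`."""
    reader = csv.DictReader(csv_file)
    missing = {id_column, label_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in header: {sorted(missing)}")
    return [row[id_column] for row in reader if row[label_column].strip() == fraud_label]


# Hypothetical extract of a downloaded predictions file.
predictions = io.StringIO("ClaimId,prediction\nC-001,0\nC-002,1\nC-003,0\nC-004,1\n")
print(flagged_claims(predictions, "ClaimId", "prediction"))
```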
- 95.31 % of the time there is no fraudulent insurance claim
- 4.69% of the time there is a fraudulent insurance claim
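If those two percentages are read as the model's class probabilities for the claim (an assumption about how Canvas reports them), turning them into a single label is a threshold decision. The sketch below uses a hypothetical 0.5 default; lowering it flags more claims as fraud at the cost of more false alarms.

```python
def predicted_class(probabilities, threshold=0.5):
    """Return 1 (fraud) if P(class 1) >= threshold, else 0 (no fraud)."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-6:
        raise ValueError("class probabilities should sum to 1")
    return 1 if probabilities[1] >= threshold else 0


single = {0: 0.9531, 1: 0.0469}  # 95.31 % no fraud, 4.69 % fraud
print(predicted_class(single))                  # → 0
print(predicted_class(single, threshold=0.04))  # → 1
```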
To avoid unexpected charges on your AWS bill at the end of the month, I recommend deleting the AWS resources you no longer need.
- Step 1: Navigate to your Amazon S3 bucket and click Empty to delete the objects stored in it.
- Step 2: Under My Models, click the ellipsis (3 dots) and delete the model.
- Step 3: Under User Details, for each application click Delete apps and enter the word 'delete' in the box. Then delete the user.
- Step 4: Navigate to CloudFormation by typing the word into the search bar.
Click Stacks on the left-hand side, select the radio button next to CFN-SM-IM-Lambda-Catalog and click Delete to remove all the AWS resources that were created in that stack.
Confirm the deletion of the CloudFormation stack.
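The same clean-up can be scripted with boto3. This is a minimal sketch, assuming your credentials can manage these resources in us-east-1. `empty_bucket` removes the current objects from a bucket (noncurrent versions in a versioned bucket are left behind); `delete_user_apps` requests deletion of a user's SageMaker apps and reads only the first page of `list_apps` results; `delete_stack` deletes the stack and waits for CloudFormation to finish. The bucket name, domain ID and user profile name are placeholders you would supply; the default stack name comes from the steps above. Once a user's apps report a Deleted status, the user profile itself can be removed with `delete_user_profile`. Each function accepts an optional client so it can be tested without AWS.

```python
def _client(service, client):
    """Return the given client, or create a boto3 client for us-east-1."""
    if client is not None:
        return client
    import boto3  # imported lazily so stub clients can be used in tests
    return boto3.client(service, region_name="us-east-1")


def empty_bucket(bucket, s3_client=None):
    """Delete every current object in `bucket`; return how many were deleted."""
    s3 = _client("s3", s3_client)
    deleted = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not objects:
            continue
        # Each page holds at most 1,000 keys, the delete_objects limit.
        response = s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})
        if response.get("Errors"):
            raise RuntimeError(f"some objects were not deleted: {response['Errors']}")
        deleted += len(objects)
    return deleted


def delete_user_apps(domain_id, user_profile, sm_client=None):
    """Request deletion of a user's apps that are not already deleted; return their names."""
    sm = _client("sagemaker", sm_client)
    apps = sm.list_apps(DomainIdEquals=domain_id, UserProfileNameEquals=user_profile)
    requested = []
    for app in apps.get("Apps", []):
        if app.get("Status") in ("Deleted", "Deleting"):
            continue
        sm.delete_app(
            DomainId=domain_id,
            UserProfileName=user_profile,
            AppType=app["AppType"],
            AppName=app["AppName"],
        )
        requested.append(app["AppName"])
    return requested


def delete_stack(stack_name="CFN-SM-IM-Lambda-Catalog", cf_client=None):
    """Delete a CloudFormation stack and block until the deletion completes."""
    cf = _client("cloudformation", cf_client)
    cf.delete_stack(StackName=stack_name)
    cf.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```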
If you missed any of the keynotes or workshop sessions from AWS re:Invent 2022, you can catch up and watch them on YouTube at this link.
You may watch the breakout session 'AWS re:Invent 2022 - Better decisions with no-code ML using SageMaker Canvas' featuring a customer story from Samsung to hear how they implemented Amazon SageMaker Canvas.
You may also hear about democratizing machine learning from the recent AWS re:Invent 2022 keynote from Dr Swami Sivasubramanian, VP of analytics, database and ML at AWS.
Amazon SageMaker Canvas also supports NLP and computer vision with ready-to-use models. You can find out more here.
Until the next lesson, Happy Learning! 😁
- AWS Sydney Summit - 4 April 2023. Register and join us for the live stream here.
- AWS re:Inforce - June 13-14 2023. Register and join us.
- AWS London Summit - 7 June 2023. Register and join us.