DEV Community

Kareem Negm for AWS MENA Community


Amazon Machine Learning | ML Key Concepts

Amazon Machine Learning Key Concepts

Data sources

| Term | Definition |
| --- | --- |
| Attribute | A unique, named property within an observation. In tabular data such as spreadsheets or CSV files, the column headings represent the attributes and the rows contain the values for each attribute. |
| Datasource Name | An optional, human-readable name for a datasource, used to find and manage it in the Amazon ML console. |
| Input Data | Collective name for all the observations that are referred to by a datasource. |
| Location | The location of the input data. Amazon ML can use data that is stored in Amazon S3 buckets, Amazon Redshift databases, or MySQL databases in Amazon Relational Database Service (RDS). |
| Observation | A single data point that is part of the input data, for example one transaction in a set of transactions. |
| Schema | The information needed to interpret the input data, including attribute names and their assigned data types, and the names of special attributes. |
| Statistics | Summary statistics for each attribute in the input data. |
| Status | Indicates the current state of the datasource, such as In Progress, Completed, or Failed. |
| Target Attribute | The attribute whose value a trained ML model will predict. |
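To make the datasource terms concrete, here is a minimal sketch in plain Python. It does not call Amazon ML and does not use Amazon ML's own schema file format; the CSV contents, the `SCHEMA` dictionary layout, and the helper names are all hypothetical and chosen only to show how observations, attributes, a schema, a target attribute, and summary statistics relate to each other.

```python
import csv
import io
import statistics

# Hypothetical CSV input data: the header row names the attributes,
# and each following row is one observation.
RAW_CSV = """age,plan,monthly_spend,churned
34,basic,20.5,0
51,premium,79.0,1
27,basic,18.0,0
45,premium,82.5,1
"""

# Hypothetical schema layout (not Amazon ML's file format): attribute names,
# their data types, and which attribute is the target.
SCHEMA = {
    "attributes": {
        "age": "NUMERIC",
        "plan": "CATEGORICAL",
        "monthly_spend": "NUMERIC",
        "churned": "BINARY",
    },
    "target_attribute": "churned",
}


def load_observations(raw_csv):
    """Parse CSV text into a list of observations (one dict per row)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def summarize(observations, schema):
    """Compute simple per-attribute summary statistics based on schema types."""
    stats = {}
    for name, kind in schema["attributes"].items():
        values = [obs[name] for obs in observations]
        if kind == "NUMERIC":
            numbers = [float(v) for v in values]
            stats[name] = {
                "min": min(numbers),
                "max": max(numbers),
                "mean": statistics.mean(numbers),
            }
        else:
            # For non-numeric attributes, list the distinct values seen.
            stats[name] = {"distinct": sorted(set(values))}
    return stats


observations = load_observations(RAW_CSV)
stats = summarize(observations, SCHEMA)
print(len(observations), "observations")
print("age:", stats["age"])
print("target:", stats[SCHEMA["target_attribute"]])
```

The schema is what tells the code to treat `age` as a number and `plan` as a category; without it, every value in the CSV is just a string.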

ML Models

| Term | Definition |
| --- | --- |
| Regression | An ML model that predicts a numeric value. |
| Multiclass | An ML model that predicts values belonging to a limited, pre-defined set of permissible values. |
| Binary | An ML model that predicts values that can have only one of two states, such as true or false. |
| Model Size | ML models capture and store patterns. The more patterns an ML model stores, the bigger it will be. ML model size is described in megabytes (MB). |
| Number of Passes | The number of times that you let Amazon ML use the same data records during training. |
| Regularization | A machine learning technique that penalizes overly complex models to reduce overfitting, which you can use to obtain higher-quality models. |
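The effect of the number of passes and of regularization can be sketched with a tiny binary model trained by stochastic gradient descent. This is an illustration only, not Amazon ML's training code; the function names, the learning rate, and the toy records are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Map a raw score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_model(records, num_passes, l2_strength, learning_rate=0.1):
    """Train a one-feature logistic model with plain SGD.

    records: list of (x, label) pairs with label in {0, 1}.
    num_passes: how many times the full set of records is reused.
    l2_strength: L2 regularization weight; larger values shrink the weight.
    """
    weight, bias = 0.0, 0.0
    for _ in range(num_passes):
        for x, label in records:
            error = sigmoid(weight * x + bias) - label
            # Gradient of the log loss plus the L2 penalty on the weight.
            weight -= learning_rate * (error * x + l2_strength * weight)
            bias -= learning_rate * error
    return weight, bias


# Hypothetical toy data: negative x means label 0, positive x means label 1.
RECORDS = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w_one_pass, _ = train_binary_model(RECORDS, num_passes=1, l2_strength=0.0)
w_many_passes, _ = train_binary_model(RECORDS, num_passes=50, l2_strength=0.0)
w_regularized, _ = train_binary_model(RECORDS, num_passes=50, l2_strength=1.0)

print("1 pass, no regularization:  ", round(w_one_pass, 3))
print("50 passes, no regularization:", round(w_many_passes, 3))
print("50 passes, L2 strength 1.0:  ", round(w_regularized, 3))
```

More passes let the model fit the same records more closely, while the L2 penalty keeps the learned weight small, which is the trade-off the two settings control.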


Evaluations

| Term | Definition |
| --- | --- |
| Model Insights | Amazon ML provides you with a metric and a number of insights that you can use to evaluate the predictive performance of your model. |
| Precision | The fraction of positive predictions that actually belong to the positive class: TP / (TP + FP). |
| Recall | The fraction of all positive examples in the dataset that the model predicts as positive: TP / (TP + FN). |
| AUC | Area Under the ROC Curve (AUC) measures the ability of a binary ML model to predict a higher score for positive examples than for negative examples. |
| Accuracy | The percentage of predictions that are correct. |
| F1-score | The harmonic mean of precision and recall. The macro-averaged F1-score is used to evaluate the predictive performance of multiclass ML models. |
| RMSE | The Root Mean Square Error (RMSE) is a metric used to evaluate the predictive performance of regression ML models. |
| Cut-off | The score threshold used to convert the numeric prediction score of a binary ML model into a 0 or 1 label. |
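The evaluation metrics above can be written out in a few lines of plain Python. This is a minimal sketch using the standard textbook definitions; the function names and the sample labels and scores are made up for illustration and are not part of any Amazon ML API.

```python
import math


def apply_cutoff(scores, cutoff=0.5):
    """Convert numeric prediction scores into 0/1 labels."""
    return [1 if s >= cutoff else 0 for s in scores]


def binary_metrics(actual, predicted):
    """Precision, recall, F1 and accuracy from 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


def auc(actual, scores):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win.
    """
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_f1(actual, predicted):
    """Average the one-vs-rest F1-score over every class in `actual`."""
    scores = []
    for cls in sorted(set(actual)):
        a_bin = [1 if a == cls else 0 for a in actual]
        p_bin = [1 if p == cls else 0 for p in predicted]
        scores.append(binary_metrics(a_bin, p_bin)["f1"])
    return sum(scores) / len(scores)


def rmse(actual_values, predicted_values):
    """Root mean square error for numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels and scores for a binary model.
ACTUAL = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]

print(binary_metrics(ACTUAL, apply_cutoff(SCORES, cutoff=0.5)))
print(binary_metrics(ACTUAL, apply_cutoff(SCORES, cutoff=0.35)))
print("AUC:", auc(ACTUAL, SCORES))
```

Lowering the cut-off from 0.5 to 0.35 turns one more positive example into a positive prediction, so recall rises, while AUC is unchanged because it depends only on the ranking of the scores.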

Batch Predictions

| Term | Definition |
| --- | --- |
| Output Location | The results of a batch prediction are stored in an S3 bucket output location. |
| Manifest File | This file relates each input data file to its associated batch prediction results. It is stored in the S3 bucket output location. |
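As a rough sketch of how these two terms show up in code, the helper below starts a batch prediction through a client object that is expected to behave like `boto3.client("machinelearning")`, and a second helper looks up a results file in a manifest. The helper names, the prediction name, and the bucket paths are hypothetical. The manifest layout (a JSON object mapping input file locations to result file locations) is an assumption made for this example, so check a real manifest before relying on it.

```python
import json


def request_batch_prediction(client, prediction_id, model_id,
                             datasource_id, output_uri):
    """Ask Amazon ML to score every observation in a datasource.

    `client` is expected to behave like boto3.client("machinelearning").
    """
    response = client.create_batch_prediction(
        BatchPredictionId=prediction_id,
        BatchPredictionName="key-concepts-example",  # hypothetical name
        MLModelId=model_id,
        BatchPredictionDataSourceId=datasource_id,
        OutputUri=output_uri,  # S3 output location for results and manifest
    )
    return response["BatchPredictionId"]


def results_for_input(manifest_text, input_uri):
    """Look up the results file for one input file.

    Assumption: the manifest is a JSON object whose keys are input file
    locations and whose values are the matching result file locations.
    Returns None when the input file is not listed.
    """
    manifest = json.loads(manifest_text)
    return manifest.get(input_uri)
```

With real credentials the first helper would be called as `request_batch_prediction(boto3.client("machinelearning"), ...)`; passing the client in as a parameter also makes it easy to substitute a stub when experimenting.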

Real-time Predictions

Real-time predictions are for applications with a low latency requirement, such as interactive web, mobile, or desktop applications.

| Term | Definition |
| --- | --- |
| Real-time Prediction API | The Real-time Prediction API accepts a single input observation in the request payload and returns the prediction in the response. |
| Real-time Prediction Endpoint | To use an ML model with the Real-time Prediction API, you need to create a real-time prediction endpoint. Once created, the endpoint contains the URL that you can use to request real-time predictions. |
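The endpoint-then-predict flow described above might look like the following sketch. It assumes a client object that behaves like `boto3.client("machinelearning")`; the helper names and the example observation are hypothetical, and error handling and waiting for a newly created endpoint to become ready are left out.

```python
def create_endpoint(client, model_id):
    """Create a real-time endpoint for a model and return its URL."""
    response = client.create_realtime_endpoint(MLModelId=model_id)
    return response["RealtimeEndpointInfo"]["EndpointUrl"]


def predict_one(client, model_id, endpoint_url, observation):
    """Send a single observation and return the prediction it yields.

    The record maps attribute names to values; values are sent as strings.
    """
    record = {name: str(value) for name, value in observation.items()}
    response = client.predict(
        MLModelId=model_id,
        Record=record,
        PredictEndpoint=endpoint_url,
    )
    return response["Prediction"]
```

A call such as `predict_one(client, model_id, url, {"age": 34, "plan": "basic"})` sends exactly one observation, which is what distinguishes this path from a batch prediction over a whole datasource.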

AWS WhitePaper Summary
