DEV Community

Lester Sim


AWS Machine Learning Certification: Exam Notes

Disclaimer: The opinions expressed here are my own and I'm not writing on behalf of AWS or Amazon.

The AWS Machine Learning - Specialty certification covers a wide range of topics, from data engineering and exploratory data analysis to model training and deployment. Here are some quick notes I gathered while preparing for it:

AWS AI Services

Useful for developers who want to add AI capabilities to their applications through API calls instead of building and training their own ML models from scratch.

Amazon Textract

Extract text from scanned documents using Optical Character Recognition (OCR).

Documents

Returns text, forms, tables and query responses.

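Here is a minimal boto3 sketch of the Documents API. It assumes you pass in a client created with `boto3.client("textract")` and that the document is already in S3 (the bucket and key are placeholders). `analyze_document` returns every element as a flat list of `Blocks`. The hypothetical helper `extract_lines` keeps only the `LINE` blocks.

```python
def extract_lines(textract, bucket, key):
    """Return the text of each LINE block Textract detects in an S3 document.

    `textract` is assumed to be a boto3 client: boto3.client("textract").
    """
    response = textract.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],  # also analyze tables and key-value form fields
    )
    # Pages, lines, words, tables, cells, etc. all come back as "blocks".
    return [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]
```

The synchronous call is meant for single-page documents. For multi-page PDFs, Textract provides the asynchronous `start_document_analysis` / `get_document_analysis` pair.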

Expenses

Extracts data from invoices and receipts, e.g. vendor name, invoice/receipt date, invoice/receipt number, item name, item price, item quantity, and total amount.

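This is a minimal sketch of reading the expense summary fields with boto3. It assumes a `boto3.client("textract")` client and an S3-hosted receipt (placeholder names). `expense_summary` is a hypothetical helper. It maps each normalized field type, such as `VENDOR_NAME` or `TOTAL`, to the first value Textract detected for it.

```python
def expense_summary(textract, bucket, key):
    """Map summary field types (e.g. VENDOR_NAME, TOTAL) to their detected values.

    `textract` is assumed to be a boto3 client: boto3.client("textract").
    """
    response = textract.analyze_expense(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    summary = {}
    for document in response["ExpenseDocuments"]:
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if field_type and value is not None:
                # Keep the first value seen for each normalized field type.
                summary.setdefault(field_type, value)
    return summary
```

Line items (item name, price, quantity) are nested under `LineItemGroups` → `LineItems` → `LineItemExpenseFields`. Those entries use the same `Type` / `ValueDetection` shape.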

Amazon Comprehend

Extract entities, key phrases, language, personally identifiable information (PII), and sentiment from text.

Entities

Extract entities from text documents, e.g. people, locations, organizations, and dates.

Using AWS Console: [screenshot]
Using AWS API: [screenshot]
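Here is a minimal boto3 sketch of entity detection. It assumes a client from `boto3.client("comprehend")` is passed in. `entities_by_type` is a hypothetical helper that groups the returned entities by their `Type`, such as `PERSON`, `LOCATION`, or `ORGANIZATION`.

```python
from collections import defaultdict


def entities_by_type(comprehend, text, language_code="en"):
    """Group the entities Comprehend finds in `text` by entity type.

    `comprehend` is assumed to be a boto3 client: boto3.client("comprehend").
    """
    response = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        # Each entity also carries Score, BeginOffset and EndOffset.
        grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)
```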

Key Phrases

Extract the key phrases (one or more words) from text documents.

Using AWS Console: [screenshot]
Using AWS API: [screenshot]
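This is a minimal sketch of key phrase extraction, assuming a `boto3.client("comprehend")` client. The `min_score` filter and its 0.9 default are my own illustrative choice, not a Comprehend setting. Every returned phrase carries a confidence `Score` you can threshold on.

```python
def key_phrases(comprehend, text, language_code="en", min_score=0.9):
    """Return the key phrases Comprehend detects with a score of at least `min_score`.

    `comprehend` is assumed to be a boto3 client: boto3.client("comprehend").
    `min_score` is an illustrative threshold, not an API parameter.
    """
    response = comprehend.detect_key_phrases(Text=text, LanguageCode=language_code)
    return [phrase["Text"] for phrase in response["KeyPhrases"] if phrase["Score"] >= min_score]
```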

Sentiment

Predict the overall sentiment of the text: positive, negative, neutral, or mixed.

Using AWS Console: [screenshot]
Using AWS API: [screenshot]
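Here is a minimal boto3 sketch, assuming a `boto3.client("comprehend")` client. `detect_sentiment` returns an upper-case label plus a `SentimentScore` map with one confidence per class. The hypothetical helper below returns the label together with its own score.

```python
def overall_sentiment(comprehend, text, language_code="en"):
    """Return Comprehend's sentiment label and the confidence score for that label.

    `comprehend` is assumed to be a boto3 client: boto3.client("comprehend").
    """
    response = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
    # SentimentScore uses capitalized keys: Positive, Negative, Neutral, Mixed.
    return label, response["SentimentScore"][label.capitalize()]
```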

Language

Predict the dominant language of the entire text. Amazon Comprehend can recognize 100 languages.

Using AWS Console: [screenshot]
Using AWS API: [screenshot]
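This is a minimal sketch, assuming a `boto3.client("comprehend")` client. Unlike the other Comprehend calls, `detect_dominant_language` takes no `LanguageCode`. It returns candidate languages, with codes such as `en` or `fr`, each with a score. The hypothetical helper picks the highest-scoring one.

```python
def dominant_language(comprehend, text):
    """Return the (language code, score) pair Comprehend ranks highest for `text`.

    `comprehend` is assumed to be a boto3 client: boto3.client("comprehend").
    """
    response = comprehend.detect_dominant_language(Text=text)  # no LanguageCode argument
    best = max(response["Languages"], key=lambda language: language["Score"])
    return best["LanguageCode"], best["Score"]
```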

Personally Identifiable Information (PII)

Identify entities in your input text that contain personal information, e.g. addresses, bank account numbers, or phone numbers.

Using AWS Console: [screenshot]
Using AWS API: [screenshot]
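This is a minimal redaction sketch, assuming a `boto3.client("comprehend")` client. The PII entities from `detect_pii_entities` carry a type and character offsets but not the matched text. The hypothetical helper `redact_pii` therefore masks the input by offset. The `*` mask character is an arbitrary choice.

```python
def redact_pii(comprehend, text, language_code="en", mask="*"):
    """Replace every character of each detected PII entity with `mask`.

    `comprehend` is assumed to be a boto3 client: boto3.client("comprehend").
    """
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)
    chars = list(text)
    for entity in response["Entities"]:
        # PII entities carry offsets into the input rather than the matched text.
        for i in range(entity["BeginOffset"], entity["EndOffset"]):
            chars[i] = mask
    return "".join(chars)
```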

Vision

Amazon Rekognition

Analyze images and videos to identify objects, people, text, scenes, and activities.

Label Detection

Extract labels of objects, concepts, scenes, and actions in your images.

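Here is a minimal boto3 sketch of label detection. It assumes a `boto3.client("rekognition")` client and an image in S3 (placeholder bucket and key). The `max_labels=10` and `min_confidence=80.0` defaults are illustrative choices of mine.

```python
def image_labels(rekognition, bucket, key, max_labels=10, min_confidence=80.0):
    """Return (label name, confidence) pairs for an image stored in S3.

    `rekognition` is assumed to be a boto3 client: boto3.client("rekognition").
    """
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,  # confidence is a percentage (0-100)
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```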

Facial Analysis

Detect faces and retrieve facial attributes in an image, e.g. facial expressions, accessories, facial features, etc.

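This is a minimal sketch of facial analysis, assuming a `boto3.client("rekognition")` client and an S3 image (placeholder names). `Attributes=["ALL"]` requests the full attribute set, including emotions. The hypothetical helper `describe_faces` reports each face's most confident emotion and whether it is smiling.

```python
def describe_faces(rekognition, bucket, key):
    """Return (top emotion, is smiling) for each face Rekognition detects in an S3 image.

    `rekognition` is assumed to be a boto3 client: boto3.client("rekognition").
    """
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],  # the default returns only a subset of facial attributes
    )
    results = []
    for face in response["FaceDetails"]:
        top_emotion = max(face["Emotions"], key=lambda emotion: emotion["Confidence"])
        results.append((top_emotion["Type"], face["Smile"]["Value"]))
    return results
```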

Face Comparison

Compare faces between images that may each contain multiple faces. Rekognition compares the largest face in the source image (the reference face) with up to 100 faces detected in the target image (the comparison faces) and generates a similarity score for each comparison.

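Here is a minimal boto3 sketch of face comparison, assuming a `boto3.client("rekognition")` client and two S3 images given as `{"Bucket": ..., "Name": ...}` dicts. Faces at or above `SimilarityThreshold` come back in `FaceMatches`, and the rest come back in `UnmatchedFaces`. The 80.0 default here is my own illustrative choice.

```python
def face_matches(rekognition, source, target, similarity_threshold=80.0):
    """Compare the largest face in `source` against the faces in `target`.

    `source` and `target` are S3 locations as {"Bucket": ..., "Name": ...} dicts;
    `rekognition` is assumed to be a boto3 client: boto3.client("rekognition").
    Returns the similarity scores of matches (highest first) and the number of
    target faces that did not reach the threshold.
    """
    response = rekognition.compare_faces(
        SourceImage={"S3Object": source},
        TargetImage={"S3Object": target},
        SimilarityThreshold=similarity_threshold,
    )
    scores = sorted((match["Similarity"] for match in response["FaceMatches"]), reverse=True)
    return scores, len(response["UnmatchedFaces"])
```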

Other AWS AI Services

  • Amazon Lex: Build conversational interfaces using voice/text as input
  • Amazon Polly: Text to speech
  • Amazon Transcribe: Speech to text
  • Amazon Translate: Translate text between languages

Domain 1: Data Engineering

AWS Glue

https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html

  • Serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources.
  • Data Sources: S3, RDS, JDBC, DynamoDB, Kinesis Data Streams, Apache Kafka
  • Data Targets: S3, RDS, JDBC
  • Crawlers: Automatically infer database and table schema from your source data, storing the associated metadata in the AWS Glue Data Catalog.
  • ETL Programming Languages: PySpark (Python), Scala
  • FindMatches Transform: Use this machine learning transformation step to identify duplicate or matching records, e.g. matching customer or product records, or improving fraud detection.
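Here is a minimal boto3 sketch of the crawler workflow. It assumes a `boto3.client("glue")` client, an existing IAM role the crawler can assume, and placeholder crawler, database, and S3 names. The crawler runs asynchronously and writes the inferred table definitions into the given Data Catalog database.

```python
def crawl_s3_prefix(glue, crawler_name, role_arn, database, s3_path):
    """Create a Glue crawler over an S3 prefix and start it.

    `glue` is assumed to be a boto3 client: boto3.client("glue"). Table
    definitions the crawler infers land in `database` in the Glue Data Catalog.
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,  # IAM role the crawler assumes to read the data
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)  # returns immediately; the crawl runs asynchronously
```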

Amazon Athena

https://docs.aws.amazon.com/athena/latest/ug/what-is.html

  • Serverless, interactive query service for analyzing data in Amazon S3 using standard SQL.
  • Integration with AWS Glue: AWS Glue crawlers automatically infer database and table schema from data in S3 and store the associated metadata in AWS Glue Data Catalog. This catalog lets the Athena query engine know how to find, read, and process the data you want to query.
  • When to use Amazon Athena vs Redshift vs EMR: https://docs.aws.amazon.com/athena/latest/ug/when-should-i-use-ate.html
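This is a minimal sketch of running an Athena query from Python. It assumes a `boto3.client("athena")` client, a Glue Data Catalog database (for example, one populated by a crawler), and an S3 location where Athena can write results. The database, table, and bucket names are placeholders. It polls until the query finishes and reads only the first page of results.

```python
import time


def run_athena_query(athena, sql, database, output_s3, poll_seconds=1.0):
    """Run `sql` in Athena and return the first page of results as lists of strings.

    `athena` is assumed to be a boto3 client: boto3.client("athena").
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},  # Glue Data Catalog database
        ResultConfiguration={"OutputLocation": output_s3},  # where Athena writes results
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)  # QUEUED or RUNNING: wait and check again
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {status['State']}")
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For SELECT queries the first row holds the column names; NULLs have no VarCharValue.
    return [[column.get("VarCharValue") for column in row["Data"]] for row in rows]
```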

Amazon Kinesis

https://docs.aws.amazon.com/kinesis/index.html

Kinesis Video Streams

Stream live video data, optionally store it, and make the data available for consumption both in real time and on a batch or ad hoc basis.

Kinesis Data Streams

Collect and process large streams of data records in real time.

  • Reading from Data Streams (Consumers): Using Kinesis Data Analytics, Kinesis Data Firehose, Lambda, EC2
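On the producer side, here is a minimal boto3 sketch, assuming a `boto3.client("kinesis")` client and an existing stream (placeholder name). JSON encoding is my own choice here, since Kinesis only sees opaque bytes. The partition key determines which shard a record lands on.

```python
import json


def put_event(kinesis, stream_name, event, partition_key):
    """Write one JSON event to a Kinesis data stream and return (shard id, sequence number).

    `kinesis` is assumed to be a boto3 client: boto3.client("kinesis").
    """
    response = kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # record payload is raw bytes
        PartitionKey=partition_key,  # records sharing a key map to the same shard
    )
    return response["ShardId"], response["SequenceNumber"]
```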

Kinesis Data Firehose

ETL service that captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services. Buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations.

  • Use Lambda to transform each buffered batch of records. Firehose can also convert the record format, e.g. from JSON to Apache Parquet, which is more efficient to query than JSON.
  • Delivery Stream Destinations: S3, Redshift, Amazon OpenSearch Service (formerly Elasticsearch), Splunk, HTTP endpoint, etc.
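Here is a minimal sketch of a Firehose data-transformation Lambda. Firehose invokes the function with a batch of `records`, each holding a `recordId` and base64-encoded `data`. Every record must be returned with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64 `data`. The `user_id` field and the `source` tag are hypothetical, chosen just to show filtering and enrichment.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: tag valid JSON records, drop ones without `user_id`."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Not decodable JSON: report it as a failed transformation.
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
            continue
        if not isinstance(payload, dict) or "user_id" not in payload:  # hypothetical required field
            output.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
            continue
        payload["source"] = "firehose"  # hypothetical enrichment
        # Newline-delimited JSON keeps records separable once Firehose concatenates them.
        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    return {"records": output}
```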

Kinesis Data Analytics

Continuously read and analyze data from a connected streaming source in real time.

  • Source: Kinesis Data Streams, Kinesis Data Firehose
  • Destination: 1/ Kinesis Data Streams, 2/ Kinesis Data Firehose, 3/ Lambda
  • Runtime: SQL, Apache Flink
  • Aggregate/Analytical Functions: Hotspots, Random Cut Forest, etc.

Domain 2: Exploratory Data Analysis

  • Data Labeling: Amazon SageMaker Ground Truth (data labeling service using human annotators from Amazon Mechanical Turk or your own private workforce)
  • Feature Engineering: one-hot encoding, binning, outlier handling, normalization, dimensionality reduction with PCA. For text: TF-IDF, bag-of-words, n-grams.
  • Know the different types of data visualization: histogram, scatter plot, box plot, correlation heatmap, hierarchical plot, etc.
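The feature-engineering techniques above can be sketched in plain Python, with no libraries, to make the arithmetic concrete. These are illustrative helpers of my own, not SageMaker APIs. The TF-IDF here is the plain variant, raw term frequency × log(N / document frequency). Libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def one_hot(values):
    """One-hot encode categorical values; returns (sorted categories, encoded rows)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def min_max_normalize(xs):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def equal_width_bins(xs, n_bins):
    """Assign each value to one of `n_bins` equal-width bins (0-based; max goes in the last bin)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero for constant input
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]


def tf_idf(docs):
    """Per-document TF-IDF scores using whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({t: (c / len(tokens)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return scores
```

For example, a word that appears in every document (like `a` in `["a b", "a c"]`) gets an IDF of log(1) = 0. That is why TF-IDF down-weights very common terms.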

Domain 3: Modelling

https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html

  • Supervised Learning Algos: XGBoost, k-NN, Linear Learner, DeepAR Forecasting, Object2Vec
  • Unsupervised Learning Algos: K-Means, PCA, Random Cut Forest
  • Text Analysis Algos: BlazingText, Sequence-to-Sequence, LDA, Neural Topic Model (NTM)
  • Image Processing Algos: Image Classification (MXNet, TensorFlow), Object Detection, Semantic Segmentation (pixel level)
  • Evaluation of ML Models: Confusion Matrix, AUC-ROC, Accuracy, Precision, Recall, F1 Score, RMSE
  • Overfitting Solutions: 1/ Use fewer features, 2/ Decrease the n-gram size, 3/ Increase the amount of regularization, 4/ Increase the number of training examples
  • Underfitting Solutions: 1/ Add new domain-specific features, 2/ Add more feature Cartesian products (feature crosses), 3/ Increase the n-gram size, 4/ Decrease the amount of regularization, 5/ Increase the number of training examples
  • Hyperparameter Tuning: Random Search, Bayesian Search
  • How SageMaker Studio works: https://aws.amazon.com/blogs/machine-learning/dive-deep-into-amazon-sagemaker-studio-notebook-architecture/
  • SageMaker Studio Notebooks vs SageMaker Notebook Instances: https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-comparison.html
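The evaluation metrics listed above reduce to a few lines of arithmetic. This plain-Python sketch computes the binary confusion matrix counts and the metrics derived from them, plus RMSE for regression. The helper names are my own. AUC-ROC is left out because it needs ranked scores rather than hard 0/1 predictions.

```python
import math


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0  # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```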

Domain 4: Machine Learning Implementation & Operations

  • Real-time Inference: Create an HTTPS endpoint if your applications need a persistent endpoint to call for inferences.
  • Batch Transform: Preprocess datasets or run inference on large datasets; does not require a persistent endpoint.
  • SageMaker Neo: Automatically optimizes machine learning models for inference on cloud instances and edge devices to run faster with no loss in accuracy.
  • SageMaker Elastic Inference (EI): Increase throughput and decrease latency of real-time inferences from deep learning models deployed as SageMaker hosted models, at a fraction of the cost of using a GPU instance for your endpoint.
  • Track and monitor SageMaker metrics using: 1/ AWS Console, 2/ CloudWatch, 3/ SageMaker Python SDK APIs
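For the real-time inference case, here is a minimal sketch of calling a deployed endpoint. It assumes a `boto3.client("sagemaker-runtime")` client, an already deployed endpoint (placeholder name), and a model that accepts a CSV row, as built-in algorithms like XGBoost do. The response format depends on the model, so the hypothetical helper `predict_csv` just returns the raw text.

```python
def predict_csv(runtime, endpoint_name, features):
    """Send one CSV row to a real-time SageMaker endpoint and return the raw response text.

    `runtime` is assumed to be a boto3 client: boto3.client("sagemaker-runtime").
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumes the model accepts CSV input
        Body=",".join(str(x) for x in features),
    )
    return response["Body"].read().decode("utf-8")  # Body is a streaming object
```

Batch Transform skips this per-request call. You submit a transform job over a dataset in S3, and no endpoint stays up afterwards.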

This is only a brief summary of the core topics I found to be important and definitely not exhaustive. Please refer to https://aws.amazon.com/certification/certified-machine-learning-specialty/ for the full set of topics to prepare.
