Build a custom model for object and logo detection

#aws #deeplearning #machinelearning #serverless


Let’s take a quick look at the AWS ML stack, which covers almost every aspect of ML needed to solve our business problems.

What is Object detection?

Object detection is the task of detecting instances of certain classes in images or video frames. Our aim is to find out which object is where in an image or video frame. It is at the heart of other object-dependent tasks such as segmentation, image captioning, and object tracking.

It is important to understand the difference between image recognition and object detection. The former only cares about identifying which objects are present, while the latter also aims to find the location of each desired object.

How does Object detection work?

Deep learning-based object detection models have two parts. The encoder takes an image as input and runs it through a series of blocks (generally combinations of convolution, batch normalization (BN), and max pooling layers) that learn to extract the statistical features used to locate and label objects. The encoder's outputs are then passed to a decoder, which predicts bounding boxes and labels for each object.

Object detectors output the location and label for each object, but how do we know how well the model is doing? For an object’s location, the most commonly used metric is intersection-over-union (IoU). Given two bounding boxes, we compute the area of their intersection and divide it by the area of their union. This value ranges from 0 (no intersection) to 1 (perfectly overlapping). For labels, a simple “percent correct” can be used. Here is an example that shows this.
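To make the IoU definition concrete, here is a minimal sketch in plain Python. It assumes boxes are given as (x_min, y_min, x_max, y_max) corner coordinates in the same units, which is only one common convention (Rekognition itself reports Left, Top, Width, and Height as ratios of the image size). The `percent_correct` helper is a hypothetical illustration of the simple label metric described above.

```python
from typing import Sequence, Tuple

# Assumed convention: a box is (x_min, y_min, x_max, y_max) in the same units.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in [0, 1]."""
    # Corners of the overlapping rectangle (may be empty).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def percent_correct(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Share of predicted labels that match the expected labels, as a percentage."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * hits / len(expected)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 4 + 4 - 1 = 7
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0, perfectly overlapping
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0, no intersection
print(percent_correct(["car", "dog", "cat"], ["car", "dog", "dog"]))  # 2 of 3 correct
```

The first pair of boxes overlaps in a 1×1 square, so the IoU is 1/7 (about 0.143).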

Popular Models for Object Detection

Models use either a two-stage or a one-stage detection approach. Two-stage detectors (which first propose candidate regions and then classify them) achieve higher localization and object recognition accuracy, whereas one-stage detectors are much faster at inference.

Here I have shown how, from the same input image, we can use deep learning models for classification, detection (one-stage or two-stage), or semantic segmentation.

Here is a list of the most popular ones.

Why Rekognition

Object detection requires in-depth knowledge of deep learning and network design. This is where Amazon Rekognition does the heavy lifting: it allows developers to build serverless applications while the AWS service manages the deep learning aspects.

Amazon Rekognition Image offers APIs to detect objects and scenes, detect and analyze faces, recognize celebrities, detect inappropriate content, and search for similar faces in a collection of faces, along with APIs to manage resources.
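As a sketch of what the object detection API returns, the snippet below calls `detect_labels` through boto3 and converts the bounding boxes of detected instances from image-relative ratios into pixel corners. The helper names, the `MaxLabels`/`MinConfidence` values, and the sample response are assumptions for illustration; the sample is hand-written, not real Rekognition output, and calling `detect_labels_in_s3` requires AWS credentials.

```python
from typing import Any, Dict, List


def boxes_from_detect_labels(response: Dict[str, Any], image_width: int,
                             image_height: int) -> List[Dict[str, Any]]:
    """Convert DetectLabels instance boxes (ratios of the image size) to pixels."""
    boxes = []
    for label in response.get("Labels", []):
        # Only labels for localizable objects carry Instances with a BoundingBox.
        for instance in label.get("Instances", []):
            bb = instance["BoundingBox"]
            left = round(bb["Left"] * image_width)
            top = round(bb["Top"] * image_height)
            boxes.append({
                "label": label["Name"],
                "confidence": instance["Confidence"],
                # (x_min, y_min, x_max, y_max) in pixels
                "box": (left, top,
                        left + round(bb["Width"] * image_width),
                        top + round(bb["Height"] * image_height)),
            })
    return boxes


def detect_labels_in_s3(bucket: str, key: str, min_confidence: float = 80.0):
    """Call Rekognition DetectLabels on an image stored in S3 (needs AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )


# Hypothetical, hand-written response used only to illustrate the shape.
sample = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.5,
         "Instances": [{"Confidence": 98.5,
                        "BoundingBox": {"Left": 0.1, "Top": 0.2,
                                        "Width": 0.5, "Height": 0.25}}]},
        {"Name": "Road", "Confidence": 95.0, "Instances": []},
    ]
}
print(boxes_from_detect_labels(sample, 640, 480))
```

Scene-level labels such as "Road" in this sample have no instances, so only localized objects produce boxes.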

Training for custom model

We can train a model by using the Amazon Rekognition Custom Labels console or the Amazon Rekognition Custom Labels API. You are charged for the amount of time that it takes to successfully train a model. Training typically takes from 30 minutes to 24 hours to complete.
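Training through the API can be sketched with boto3's `create_project_version` and `describe_project_versions`. This assumes the project already exists and already has training and test datasets attached (for example, created in the console); the function names and the polling interval are hypothetical. The client is passed in so the logic can be exercised with a stub; in practice you would pass `boto3.client("rekognition")`.

```python
import time
from typing import Any


def train_custom_model(client: Any, project_arn: str, version_name: str,
                       output_bucket: str, output_prefix: str) -> str:
    """Start training a Custom Labels model version and return its ARN."""
    # Assumes the project already has training and test datasets attached.
    response = client.create_project_version(
        ProjectArn=project_arn,
        VersionName=version_name,
        # Rekognition writes training output (such as evaluation files) here.
        OutputConfig={"S3Bucket": output_bucket, "S3KeyPrefix": output_prefix},
    )
    return response["ProjectVersionArn"]


def wait_for_training(client: Any, project_arn: str, version_name: str,
                      poll_seconds: float = 60.0, sleep=time.sleep) -> str:
    """Poll until training leaves TRAINING_IN_PROGRESS; return the final status."""
    while True:
        desc = client.describe_project_versions(
            ProjectArn=project_arn, VersionNames=[version_name]
        )["ProjectVersionDescriptions"][0]
        if desc["Status"] != "TRAINING_IN_PROGRESS":
            return desc["Status"]  # e.g. TRAINING_COMPLETED or TRAINING_FAILED
        sleep(poll_seconds)  # training can take from 30 minutes to 24 hours
```

boto3 also provides a `project_version_training_completed` waiter that can replace the manual polling loop.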

Custom Labelling and why Rekognition

A common request from customers is: “Amazon Rekognition Labels is not specific enough for what I need for my business.” Let’s now look at how AWS does the heavy lifting in such scenarios.

To handle such a custom labeling request, we have two approaches:

DIY

Customization requires expertise and resources to manage:

Deep learning model training and fine-tuning
Thousands of images to collect and label manually
- Error-prone and subjective, which can lead to inconsistencies
Weeks to complete labeling and model training

Amazon Rekognition Custom Labels

Rekognition Custom Labels builds on Rekognition’s existing capabilities, which are already trained on tens of millions of images across many categories. Instead of thousands of images, we simply need to upload a small set of training images (typically a few hundred or fewer) that are specific to our use case through the easy-to-use console.

Rekognition Custom Labels automatically loads and inspects the training data, selects the right machine learning algorithms, trains a model, and provides model performance metrics.
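Once training finishes, the metrics Rekognition computes on the test dataset can be read back through `describe_project_versions`, whose `EvaluationResult` holds an overall `F1Score` and the S3 location of a summary file. A minimal sketch, assuming a boto3 Rekognition client is passed in; `evaluation_summary` and `f1` are hypothetical helper names, and `f1` only shows the standard F1 formula for reference, not how Rekognition aggregates it across labels.

```python
from typing import Any, Dict, Optional


def evaluation_summary(client: Any, project_arn: str,
                       version_name: str) -> Optional[Dict[str, Any]]:
    """Return the F1 score and the S3 location of the evaluation summary file."""
    desc = client.describe_project_versions(
        ProjectArn=project_arn, VersionNames=[version_name]
    )["ProjectVersionDescriptions"][0]
    result = desc.get("EvaluationResult")
    if result is None:  # not trained yet, or training failed
        return None
    summary_obj = result.get("Summary", {}).get("S3Object", {})
    return {
        "status": desc["Status"],
        "f1_score": result.get("F1Score"),
        # JSON file in S3 with more detailed evaluation metrics
        "summary_s3": (summary_obj.get("Bucket"), summary_obj.get("Name")),
    }


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```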

It offers customized image analysis to easily detect the objects and scenes that you define as most relevant to your domain. Key advantages are:

Guided Experience to create labeled images
Train and evaluate with no coding and no ML experience
Easy to use fully managed API

Amazon Rekognition Custom Labels provides a simple end-to-end experience where you start by labeling a dataset, and Amazon Rekognition Custom Labels builds a custom ML model for you by inspecting the data and selecting the right ML algorithm. After your model is trained, you can start using it immediately for image analysis. If you want to process images in batches (such as once a day or week, or at scheduled times during the day), you can provision your custom model at scheduled times.
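The start, analyze, stop cycle described above can be sketched with boto3's `start_project_version`, the `project_version_running` waiter, `detect_custom_labels`, and `stop_project_version`. The `run_batch` function, `MinInferenceUnits=1`, and the confidence threshold are assumptions for illustration. The `try`/`finally` is a design choice so the model is stopped even if one image fails, since a running model keeps incurring inference charges.

```python
from typing import Any, Dict, List


def run_batch(client: Any, project_arn: str, version_arn: str, version_name: str,
              bucket: str, keys: List[str], min_confidence: float = 70.0
              ) -> Dict[str, List[Dict[str, Any]]]:
    """Start a Custom Labels model, label a batch of S3 images, then stop the model."""
    client.start_project_version(ProjectVersionArn=version_arn, MinInferenceUnits=1)
    # Block until the model version reaches the RUNNING state.
    client.get_waiter("project_version_running").wait(
        ProjectArn=project_arn, VersionNames=[version_name])
    results: Dict[str, List[Dict[str, Any]]] = {}
    try:
        for key in keys:
            response = client.detect_custom_labels(
                ProjectVersionArn=version_arn,
                Image={"S3Object": {"Bucket": bucket, "Name": key}},
                MinConfidence=min_confidence,
            )
            results[key] = [
                {"name": lbl["Name"], "confidence": lbl["Confidence"],
                 # Geometry is only present for object-location (bounding box) models
                 "box": lbl.get("Geometry", {}).get("BoundingBox")}
                for lbl in response["CustomLabels"]
            ]
    finally:
        # Always stop the model so it does not keep incurring cost.
        client.stop_project_version(ProjectVersionArn=version_arn)
    return results
```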

Sample Serverless solution using Amazon Rekognition

The following architecture is taken from an AWS GitHub sample that explains how to build a cost-optimal batch solution with Amazon Rekognition Custom Labels: it provisions the custom model at scheduled times, processes all our images, and then deprovisions the resources to avoid incurring extra cost.

This application creates a serverless Amazon Rekognition Custom Label Detection workflow which runs on a pre-defined schedule (note that the schedule is enabled by default at deployment). It demonstrates the power of Step Functions to orchestrate Lambda functions and other AWS resources to form complex and robust workflows, coupled with event-driven development using Amazon EventBridge.

When an image is stored in the Amazon S3 bucket, it triggers a message that is stored in an Amazon SQS queue.
Amazon EventBridge is configured to trigger an AWS Step Functions workflow at a certain frequency (every hour by default).
As the workflow runs, it checks the number of items in the Amazon SQS queue. If there are no items to process, the workflow ends. If there are items to process, the workflow starts the Amazon Rekognition Custom Labels model and enables the Amazon SQS integration with a Lambda function to process those images.
With the integration between the Amazon SQS queue and Lambda enabled, Lambda starts processing images using Amazon Rekognition Custom Labels.
Once all the images are processed, the workflow stops the Amazon Rekognition Custom Labels model and disables the integration between the Amazon SQS queue and the Lambda function.
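Parts of this workflow can be sketched in Python. These functions are illustrative and are not taken from the AWS sample repository: one parses the SQS-triggered Lambda event (each message body is an S3 event notification) into bucket/key pairs, one reads the queue depth with `get_queue_attributes`, and one toggles the SQS-to-Lambda integration with `update_event_source_mapping`. The function names and the mapping UUID are hypothetical; the clients would come from `boto3.client("sqs")` and `boto3.client("lambda")`.

```python
import json
import urllib.parse
from typing import Any, Dict, List, Tuple


def s3_objects_from_sqs_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an SQS-triggered Lambda event whose
    message bodies are S3 event notifications."""
    objects = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        # S3 sends an s3:TestEvent (with no "Records") when notifications are set up.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (for example, spaces become '+').
            key = urllib.parse.unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def pending_messages(sqs_client: Any, queue_url: str) -> int:
    """Approximate queue depth, as the workflow's first step might check it."""
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )["Attributes"]
    return int(attrs["ApproximateNumberOfMessages"])


def set_queue_trigger(lambda_client: Any, mapping_uuid: str, enabled: bool) -> None:
    """Enable or disable the SQS-to-Lambda event source mapping."""
    lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=enabled)
```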

Key Points of Rekognition API

API operations don’t save any of the generated labels. We can save these labels by placing them in a database, along with identifiers for the respective images.
Rekognition API for Video includes these features:
- Real-time analysis of streaming video;
- Person identification and pathing;
- Face recognition;
- Facial analysis;
- Objects, scenes and activities detection;
- Inappropriate video detection; and,
- Celebrity recognition.
Rekognition API for Images includes these features:
- Object and scene detection;
- Facial recognition;
- Facial analysis;
- Face comparison;
- Unsafe image detection;
- Celebrity recognition; and,
- Text in Image.
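Because the API does not store results, here is a minimal sketch of persisting labels with an image identifier, using Python's built-in `sqlite3` module as a stand-in for whatever database you actually use. The table name, schema, and S3-style image identifiers are hypothetical; the label dictionaries follow the `Name`/`Confidence` shape that both `detect_labels` and `detect_custom_labels` return.

```python
import sqlite3
from typing import Any, Dict, List


def save_labels(conn: sqlite3.Connection, image_id: str,
                labels: List[Dict[str, Any]]) -> None:
    """Persist the labels returned for one image, keyed by an image identifier."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_labels (
               image_id   TEXT NOT NULL,
               label      TEXT NOT NULL,
               confidence REAL NOT NULL
           )"""
    )
    conn.executemany(
        "INSERT INTO image_labels (image_id, label, confidence) VALUES (?, ?, ?)",
        [(image_id, lbl["Name"], lbl["Confidence"]) for lbl in labels],
    )
    conn.commit()


def images_with_label(conn: sqlite3.Connection, label: str,
                      min_confidence: float = 0.0) -> List[str]:
    """Return identifiers of images tagged with `label` at or above a confidence."""
    rows = conn.execute(
        "SELECT DISTINCT image_id FROM image_labels "
        "WHERE label = ? AND confidence >= ? ORDER BY image_id",
        (label, min_confidence),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
save_labels(conn, "s3://my-bucket/cat.jpg", [{"Name": "Cat", "Confidence": 97.1},
                                              {"Name": "Pet", "Confidence": 97.1}])
save_labels(conn, "s3://my-bucket/car.jpg", [{"Name": "Car", "Confidence": 88.0}])
print(images_with_label(conn, "Cat"))  # ['s3://my-bucket/cat.jpg']
```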

Code sample for Rekognition

This example displays the estimated age range and other attributes for detected faces, and lists the JSON for all detected facial attributes. Change the value of photo to the image file name. Change the value of bucket to the Amazon S3 bucket where the image is stored.

#Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

import boto3
import json


def detect_faces(photo, bucket):
    # Rekognition client using the default AWS credentials and region
    client = boto3.client('rekognition')

    # Attributes=['ALL'] returns every facial attribute, not just the default subset
    response = client.detect_faces(Image={'S3Object': {'Bucket': bucket, 'Name': photo}},
                                   Attributes=['ALL'])

    print('Detected faces for ' + photo)
    for faceDetail in response['FaceDetails']:
        print('The detected face is between ' + str(faceDetail['AgeRange']['Low'])
              + ' and ' + str(faceDetail['AgeRange']['High']) + ' years old')

        print('Here are the other attributes:')
        print(json.dumps(faceDetail, indent=4, sort_keys=True))

        # Access predictions for individual face details and print them
        print("Gender: " + str(faceDetail['Gender']))
        print("Smile: " + str(faceDetail['Smile']))
        print("Eyeglasses: " + str(faceDetail['Eyeglasses']))
        # Emotions is a list of {Type, Confidence} entries; print the first one
        print("Emotions: " + str(faceDetail['Emotions'][0]))

    return len(response['FaceDetails'])


def main():
    # Placeholders: replace with your image file name and S3 bucket name
    photo = 'photo'
    bucket = 'bucket'
    face_count = detect_faces(photo, bucket)
    print("Faces detected: " + str(face_count))


if __name__ == "__main__":
    main()