Matthew Vielkind

Learning AWS Rekognition with Celebrity Selfies

In this post we will explore AWS Rekognition, Amazon's cloud service for analyzing images and video. Rekognition offers a number of features, including facial detection, text extraction, and object labeling. Here we will focus on Rekognition's ability to identify celebrities in images, using iconic celebrity selfies as our inputs. In this post you will:

  • Learn how to send an image to Rekognition with boto3
  • Identify celebrities in an image
  • Annotate images with celebrity names
  • Draw bounding boxes using the Rekognition output
  • Use Rekognition facial landmarks to enhance images

Let's get started!

Processing Images with Rekognition

AWS Rekognition accepts input either as an image stored in S3 or as raw image bytes. In this example we'll cover the case where your image is stored in an S3 bucket. I'll show how you can have Rekognition read the image from S3 and identify the celebrities in it. The first selfie we'll look at is the iconic Beyonce/Jay-Z selfie.

Beyonce and Jay-Z Selfie

Now let's see who Rekognition recognizes in this image!

If the image you want to process is stored in S3, you can use boto3 to pass it to Rekognition by specifying the bucket name and the object key for the image.

import boto3

rek = boto3.client('rekognition')

response = rek.recognize_celebrities(
    Image={
        "S3Object": {
            "Bucket": "<YourBucketName>",
            "Name": "<ImageName>"
        }
    }
)

The recognize_celebrities function takes the input image and runs celebrity detection against it. Neat! Let's look at some of the data Rekognition returns for each image.

The response from recognize_celebrities includes a ton of data about the image. The boto3 documentation is phenomenal, so if you're interested in knowing about all the specifics I encourage you to read the boto3 Rekognition documentation for all the details about the recognize_celebrities response. We'll walk through a lot of this, but this is a good reference to have on hand!
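
To give a sense of the shape of that response, here is a trimmed sketch with illustrative values (fields abbreviated; the boto3 docs list the complete structure):

{
    "CelebrityFaces": [
        {
            "Name": "...",
            "Id": "...",
            "Urls": ["..."],
            "MatchConfidence": 98.0,
            "Face": {
                "BoundingBox": {"Width": 0.25, "Height": 0.33, "Left": 0.40, "Top": 0.20},
                "Confidence": 99.9,
                "Landmarks": [{"Type": "eyeLeft", "X": 0.45, "Y": 0.30}, ...]
            }
        }
    ],
    "UnrecognizedFaces": [
        {
            "BoundingBox": {"Width": 0.20, "Height": 0.30, "Left": 0.10, "Top": 0.25},
            "Confidence": 99.5,
            "Landmarks": [...]
        }
    ]
}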

Two key fields in the response are CelebrityFaces and UnrecognizedFaces: the first lists every face identified as a celebrity, and the second lists the faces that could not be matched to a celebrity. For this first image, let's summarize these fields to see how many identified celebrities and unrecognized faces were found.

face_counts = {k: len(v) for k, v in response.items() if k in ["CelebrityFaces", "UnrecognizedFaces"]}
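
Printing that dictionary for this selfie gives a summary along these lines:

print(face_counts)
# {'CelebrityFaces': 1, 'UnrecognizedFaces': 1}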

Rekognition identified one face as a celebrity and detected one other face that it couldn't match to a celebrity. So, who was identified?

Each item in the CelebrityFaces list will contain the name of the celebrity that was identified. Let's parse the output to see who was identified by Rekognition.

for celeb in response["CelebrityFaces"]:
    print(celeb["Name"])

The result is Beyonce! Rekognition detected another face in the image, but wasn't able to identify it as Jay-Z.

Being able to name celebrities in an image is neat, but let's do a little more with Rekognition to annotate our celebrity selfies to more easily identify who's who.

Annotating Selfies

In the previous image we only had two faces to recognize, but what about the case when you have multiple recognized and unrecognized faces in an image? Identifying who is who can be difficult, but Rekognition can help with this process as well. It's impossible to have a post about celebrity selfies without a little Kardashian influence, so our next selfie is Kylie Jenner's selfie from the Met Gala.

Kylie Jenner Met Gala selfie

What's fun about this image is the volume of recognizable faces that are included. To help identify who's who in the image we're going to utilize more of the Rekognition response. For each face, celebrity and unrecognized alike, Rekognition provides the coordinates framing that face in the image. Using those bounding box coordinates you can annotate the image to label names and indicate if a face is unknown. For Kylie's selfie we will annotate recognized celebrities with a green box labeled with their name, while unrecognized faces will be framed by a grey bounding box.

In the previous example the image was loaded directly from S3. You can also process local images with Rekognition by converting the image into a byte array, which is shown below.

import io
import boto3
from PIL import Image

# Load the local image and convert it to a byte array.
img = Image.open(img_path)  # img_path is the path to your local image file
img_byte_arr = io.BytesIO()
img.save(img_byte_arr, format='PNG')

# Pass the byte array to Rekognition
rek = boto3.client("rekognition")
response = rek.recognize_celebrities(
    Image={
        "Bytes": img_byte_arr.getvalue()
    }
)

To draw the bounding boxes we need to access the BoundingBox coordinates associated with each face. Rekognition returns these coordinates as ratios of the overall image dimensions, so they need to be scaled to pixel values. To do the scaling I created a function, calculate_bounding_coordinates, that scales the Rekognition output to the image. We will use the PIL library for annotating the image, which requires both the Rekognition output and the original img object.

For each face in CelebrityFaces and UnrecognizedFaces the appropriate bounding box will be drawn, labeled with either the celebrity's name or a generic "Unknown" label for the UnrecognizedFaces.

from PIL import Image, ImageDraw, ImageFont


# Prepare image for drawing.
img_width, img_height = img.size
draw = ImageDraw.Draw(img)
font = ImageFont.truetype('/Library/Fonts/Arial.ttf', 30)  # macOS font path; adjust for your system

def calculate_bounding_coordinates(img_width, img_height, box):
    """
    Calculates the location of the bounding box in the image.
    :param img_width: Width of the image.
    :param img_height: Height of the base image.
    :param box: Bounding box coordinates from Rekognition.
    :return: Tuple of points to frame the bounding box.
    """
    # Calculate box location.
    left = img_width * box['Left']
    top = img_height * box['Top']
    width = img_width * box['Width']
    height = img_height * box['Height']

    points = (
        (left, top),
        (left + width, top),
        (left + width, top + height),
        (left, top + height),
        (left, top)
    )

    return points

# Identified celebrities are drawn with a green bounding box labeled with their name.
for celeb in response["CelebrityFaces"]:
    box_coords = calculate_bounding_coordinates(img_width, img_height, celeb["Face"]["BoundingBox"])
    draw.line(box_coords, fill="#00d400", width=2)
    draw.text(box_coords[0], text=celeb["Name"], fill="#00d400", font=font)

# Unrecognized faces are framed with grey boxes.
for i, face in enumerate(response["UnrecognizedFaces"]):
    box_coords = calculate_bounding_coordinates(img_width, img_height, face["BoundingBox"])
    draw.line(box_coords, fill="#989695", width=2)
    draw.text(box_coords[0], text=f"Unknown{str(i)}", fill="#989695", font=font)

After applying the annotation the new img now looks like this:

Annotated output of Kylie's selfie

It's so much easier to determine who's who!
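
If you want to keep the annotated version rather than just viewing it in memory, PIL can write it back out (the filename here is just an example):

# Save the annotated image, or open it in your default image viewer.
img.save("annotated_selfie.png")
img.show()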

Now let's go one level deeper with Rekognition's output!

Facial Features in AWS Rekognition

Rekognition doesn't stop at framing faces. For each face, Rekognition also provides details about individual facial features such as the eyes, nose, and mouth. To demonstrate these landmarks we'll add black censor bars across the eyes of each face in the selfie.
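
Each face object in the response includes a Landmarks list, where every entry has a Type along with X and Y values expressed as ratios of the image width and height. A trimmed example looks roughly like this (values are illustrative):

"Landmarks": [
    {"Type": "eyeLeft", "X": 0.31, "Y": 0.42},
    {"Type": "eyeRight", "X": 0.39, "Y": 0.41},
    {"Type": "nose", "X": 0.35, "Y": 0.47},
    ...
]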

To do this a couple more functions are needed. The first, get_facial_landmark_location, retrieves the XY coordinates of a specific facial landmark (the left and right eyes in this case). The second, calculate_eye_mask, calculates the coordinates of the censor bar from those landmark locations.

Let's bring the censor bar together with the bounding box annotation from the previous selfie to fully annotate the image. And to demonstrate this let's use Ellen's famous Oscar selfie! The full code is as follows:

import io
import boto3
from PIL import Image, ImageDraw, ImageFont

# Load the local image and convert it to a byte array.
img = Image.open(img_path)  # img_path is the path to your local image file
img_byte_arr = io.BytesIO()
img.save(img_byte_arr, format='PNG')

# Pass the byte array to Rekognition
rek = boto3.client("rekognition")
response = rek.recognize_celebrities(
    Image={
        "Bytes": img_byte_arr.getvalue()
    }
)

# Prepare image for drawing.
img_width, img_height = img.size
draw = ImageDraw.Draw(img)
font = ImageFont.truetype('/Library/Fonts/Arial.ttf', 30)  # macOS font path; adjust for your system

def calculate_bounding_coordinates(img_width, img_height, box):
    """
    Calculates the location of the bounding box in the image.
    :param img_width: Width of the image.
    :param img_height: Height of the base image.
    :param box: Bounding box coordinates from Rekognition.
    :return: Tuple of points to frame the bounding box.
    """
    # Calculate box location.
    left = img_width * box['Left']
    top = img_height * box['Top']
    width = img_width * box['Width']
    height = img_height * box['Height']

    points = (
        (left, top),
        (left + width, top),
        (left + width, top + height),
        (left, top + height),
        (left, top)
    )

    return points


def get_facial_landmark_location(rek_obj, landmark):
    """
    Retrieves the XY coordinates for a specified facial landmark in the Rekognition response.

    :param rek_obj: A Rekognition response object.
    :param landmark: Name of the facial landmark to retrieve.
    :return: XY coordinates for the landmark.
    """
    facial_landmark = next((item for item in rek_obj["Landmarks"] if item["Type"] == landmark), None)

    return facial_landmark["X"], facial_landmark["Y"]


def calculate_eye_mask(image_width, image_height, rek_obj):
    """
    Calculates the coordinates for the eye mask given the location of the eyes.

    :param image_width: Width of the image.
    :param image_height: Height of the image.
    :param rek_obj: Face object from the Rekognition response containing a Landmarks list.
    :return: Tuple of coordinates defining the location of the eye mask.
    """
    # Get eye landmarks
    eye_left = get_facial_landmark_location(rek_obj, "eyeLeft")
    eye_right = get_facial_landmark_location(rek_obj, "eyeRight")

    # Get the pixel location of the eyes in the image.
    eye_left_x = eye_left[0] * image_width
    eye_left_y = eye_left[1] * image_height
    eye_right_x = eye_right[0] * image_width
    eye_right_y = eye_right[1] * image_height

    # Build the coordinates for the rectangle to be drawn.
    poly_coords = (
        (eye_right_x + 50, eye_right_y - 20),
        (eye_left_x - 50, eye_left_y - 20),
        (eye_left_x - 50, eye_left_y + 20),
        (eye_right_x + 50, eye_right_y + 20),
    )

    return poly_coords


# Identified celebrities are drawn with a green bounding box labeled with their name.
for celeb in response["CelebrityFaces"]:
    box_coords = calculate_bounding_coordinates(img_width, img_height, celeb["Face"]["BoundingBox"])
    draw.line(box_coords, fill="#00d400", width=2)
    draw.text(box_coords[0], text=celeb["Name"], fill="#00d400", font=font)
    eye_coords = calculate_eye_mask(img_width, img_height, celeb["Face"])
    draw.polygon(eye_coords, fill="#000000")

# Unrecognized faces are framed with grey boxes.
for i, face in enumerate(response["UnrecognizedFaces"]):
    box_coords = calculate_bounding_coordinates(img_width, img_height, face["BoundingBox"])
    draw.line(box_coords, fill="#989695", width=2)
    draw.text(box_coords[0], text=f"Unknown{str(i)}", fill="#989695", font=font)
    eye_coords = calculate_eye_mask(img_width, img_height, face)
    draw.polygon(eye_coords, fill="#000000")

And the resulting image looks like this:

Output of annotated Oscar selfie

Wrapping Up

There you have it! In this tutorial you have:

  • Learned the two ways images can be passed to AWS Rekognition
  • Identified celebrities in an image
  • Drawn bounding boxes using the Rekognition output
  • Used Rekognition facial landmarks to enhance images

There is so much Rekognition can do! While the focus of this tutorial was on the recognize_celebrities function, many of the same approaches can be used with the other Rekognition functions as well.
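
As a quick illustration, object labeling follows the same request pattern. Here is a minimal sketch using detect_labels, where the bucket and key are placeholders and MaxLabels/MinConfidence are just example values:

import boto3

rek = boto3.client("rekognition")
label_response = rek.detect_labels(
    Image={
        "S3Object": {
            "Bucket": "<YourBucketName>",
            "Name": "<ImageName>"
        }
    },
    MaxLabels=10,
    MinConfidence=80
)

# Each label comes back with a name and a confidence score.
for label in label_response["Labels"]:
    print(label["Name"], label["Confidence"])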

Thanks for reading! Have yourself a grand day!
