Marisa You

Posted on Jan 22, 2021

Make a facial recognition program using Azure's Face client library

#ai

In the past few years, we've seen applications of facial recognition technology being integrated into our daily lives. The first practical application of facial recognition that I can think of is identity verification. For example, for iPhone owners (of iPhone X and more recent versions), gone are the days of entering passcodes, mis-typing them, and re-typing them. Now, iPhone users can simply pick up their phones and look at them to unlock their screens.

Other devices are starting to use facial recognition for identification as well. Microsoft Surface users can also simply sit in front of their cameras to access their contents. Aside from ID verification, facial recognition is also used for a few other cool (but arguably less useful) applications, such as Facebook identifying people in photos and suggesting who to tag.

Facial recognition is being used for more and more useful applications, and research is continuously being done to improve it, so I thought it would be fun to play around with it a bit. In this post, I use the Face client library from Azure Cognitive Services to build a simple facial recognition app that determines which celebrity I look the most similar to.

Make an account with Azure

You will need to sign up for an Azure subscription to access the Face client library. Note that the first 12 months are free, but after that, you will have to pay for any services that you use. Of course, you can cancel your subscription at any time. Also note that, if you use the free version, Azure will limit the frequency of requests you send to it, so you'll need to add a lot of sleep time into your scripts.

Once you have an account, log in and create a Face resource. After you've created your resource, click on "Go to resource," where you will see a key and an endpoint, both of which you will need to use later.

The Face documentation does a much better job at detailing this process than I do, so feel free to use that as well.
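Since the key is a secret, one option is to keep it out of your scripts and read it from the environment instead. Here's a minimal sketch of that idea. The variable names FACE_SUBSCRIPTION_KEY and FACE_ENDPOINT are my own choice (they just match the placeholders used later in this post), and load_face_credentials is a hypothetical helper, not part of the Face library.

```python
import os


def load_face_credentials(env=os.environ):
    """Return (key, endpoint) read from environment variables.

    FACE_SUBSCRIPTION_KEY and FACE_ENDPOINT are assumed variable names,
    chosen to match the placeholders used in the rest of the post.
    """
    key = env.get("FACE_SUBSCRIPTION_KEY", "").strip()
    # Drop a trailing slash so paths can be appended consistently later.
    endpoint = env.get("FACE_ENDPOINT", "").strip().rstrip("/")
    if not key or not endpoint:
        raise RuntimeError("Set FACE_SUBSCRIPTION_KEY and FACE_ENDPOINT first")
    return key, endpoint
```

With both variables exported in your shell, `KEY, ENDPOINT = load_face_credentials()` gives you the two values that the later scripts need.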

Installation

For this project, I'm using Python. To install the Face client library:

pip install --upgrade azure-cognitiveservices-vision-face

Alternatively, you can use Go or C# as well.

Data Collection

The first step to building anything involving AI is data collection, and for this project, the data is in the form of images. Since I'm an Asian female, I thought that it would make the most sense to use images of Asian female celebrities. I used Playwright to scrape for photos of 41 female Kpop idols. For each image, I grabbed the source and wrote it to a text file along with a label to identify which celebrity the image belonged to in the format of:

[idol_name],[image_src]

For example, here's a line from the text file containing the data:

blackpink lisa,https://th.bing.com/th/id/OIP.ijTvepbl0DsSHIHwRTszqgHaLH?w=200&h=300&c=7&o=5&pid=1.7

To get started with using Playwright, you can reference my most recent blog post! Of course, you can use any other method that you prefer to collect your images as well.
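Every later script reads this text file, so it can help to parse it in one place. The sketch below splits each line on the first comma only, which keeps the URL intact even if it happens to contain commas. The function names parse_record and load_records are made up for this example.

```python
def parse_record(line):
    """Split one '[idol_name],[image_src]' line into (name, url)."""
    # maxsplit=1 means only the first comma separates the two fields.
    name, url = line.rstrip("\n").split(",", 1)
    return name.strip(), url.strip()


def load_records(path):
    """Read every non-blank line of the data file into (name, url) pairs."""
    with open(path, encoding="utf-8") as handle:
        return [parse_record(line) for line in handle if line.strip()]
```

For example, `load_records('result.txt')` would return a list of (name, url) tuples in the same order as the file.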

Data Processing (generating faceId)

The Face service generates a unique faceId for every face it detects. You can request one by calling the detect endpoint with a shell command in this format:

curl -H "Ocp-Apim-Subscription-Key: FACE_SUBSCRIPTION_KEY" "FACE_ENDPOINT/face/v1.0/detect?detectionModel=detection_02&returnFaceId=true&returnFaceLandmarks=false" -H "Content-Type: application/json" --data-ascii "{\"url\":\"IMG_SOURCE\"}"

Replace the FACE_SUBSCRIPTION_KEY and FACE_ENDPOINT placeholders with the key and endpoint from your resource in the shell command above. Also, replace IMG_SOURCE with an image source.
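If you'd rather stay in Python than use curl, the same request can be put together with the standard library. This is only a sketch that mirrors the curl command above (same path, query parameters, headers, and JSON body). The helper names build_detect_request and detect_faces are my own, and I haven't tuned anything like timeouts or error handling.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_detect_request(endpoint, key, image_url):
    """Build (without sending) the same detect call as the curl command."""
    query = urlencode({
        "detectionModel": "detection_02",
        "returnFaceId": "true",
        "returnFaceLandmarks": "false",
    })
    return Request(
        endpoint.rstrip("/") + "/face/v1.0/detect?" + query,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def detect_faces(endpoint, key, image_url):
    """Send the request and return the parsed JSON response."""
    with urlopen(build_detect_request(endpoint, key, image_url)) as response:
        return json.load(response)
```

Keeping the request building separate from the sending makes it easy to inspect the URL and body before spending any of your request quota.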

A single curl command, however, will only generate the faceId(s) for one image. To cover every image, you can write a script that generates one curl command per image and saves them all to a shell script. This is the script that I used:

# the part of the curl command that is the same for every image
# (replace FACE_SUBSCRIPTION_KEY and FACE_ENDPOINT with your own values)
same = "curl -H \"Ocp-Apim-Subscription-Key: FACE_SUBSCRIPTION_KEY\" \"FACE_ENDPOINT/face/v1.0/detect?detectionModel=detection_02&returnFaceId=true&returnFaceLandmarks=false\" -H \"Content-Type: application/json\" --data-ascii "

with open('curl_script.sh', 'w') as output, open('result.txt') as data:
    for line in data:
        # split on the first comma only, in case the URL contains commas
        person, url = line.rstrip('\n').split(',', 1)

        # print the label, run curl, end the line, then wait for the rate limit
        output.write('printf "' + person + ',"\n')
        output.write(same)
        output.write('"{\\"url\\":\\"' + url + '\\"}" \n')
        output.write('printf "\\n"\n')
        output.write('sleep 5\n')

And the resulting shell script looks something like this (repeated for every image):

printf "blackpink lisa,"
curl -H "Ocp-Apim-Subscription-Key: FACE_SUBSCRIPTION_KEY" "FACE_ENDPOINT/face/v1.0/detect?detectionModel=detection_02&returnFaceId=true&returnFaceLandmarks=false" -H "Content-Type: application/json" --data-ascii "{\"url\":\"https://th.bing.com/th/id/OIP.ijTvepbl0DsSHIHwRTszqgHaLH?w=200&h=300&c=7&o=5&pid=1.7\"}" 
printf "\n"
sleep 5
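Escaping quotes by hand gets fiddly quickly, so here's an alternative sketch that lets Python's shlex.quote do the shell quoting and json.dumps build the request body. The function name detect_command is made up, and the placeholders are the same ones used above. The generated line uses single quotes instead of backslash escapes, but the shell sees the same arguments.

```python
import json
import shlex


def detect_command(image_url):
    """Return one curl line with every argument safely quoted for the shell."""
    args = [
        "curl",
        "-H", "Ocp-Apim-Subscription-Key: FACE_SUBSCRIPTION_KEY",
        "FACE_ENDPOINT/face/v1.0/detect?detectionModel=detection_02"
        "&returnFaceId=true&returnFaceLandmarks=false",
        "-H", "Content-Type: application/json",
        "--data-ascii", json.dumps({"url": image_url}),
    ]
    # shlex.quote wraps any argument containing spaces, quotes, or & in
    # single quotes, so the shell passes it to curl unchanged.
    return " ".join(shlex.quote(arg) for arg in args)
```

Each call returns one line that could be written to the shell script in place of the hand-escaped version.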

Then run the shell script and save the output to a text file (the training script later expects the processed data in faces-all.txt). Each line in the text file will look something like:

blackpink lisa,[{"faceId":"FACE_ID","faceRectangle":{"top":49,"left":87,"width":48,"height":62}}]

where FACE_ID is replaced by the actual faceId.
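To read those lines back in Python, you can split off the label at the first comma and parse the rest as JSON. Here's a small sketch. The name parse_detect_line is hypothetical, and treating any non-list response as "no faces" is my own assumption about how failed requests might show up in the file.

```python
import json


def parse_detect_line(line):
    """Split 'name,<json>' at the first comma and return (name, [faceId, ...])."""
    name, payload = line.rstrip("\n").split(",", 1)
    faces = json.loads(payload)
    # Assumption: a successful response is a JSON list of faces; anything
    # else (such as an error object) is treated as no faces found.
    if not isinstance(faces, list):
        return name, []
    return name, [face["faceId"] for face in faces]
```

An image where no face was detected comes back as an empty list, so it simply yields no faceIds.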

Data Processing (saving images locally)

Aside from generating faceIds, you'll also need to save the images locally. To do this, I wrote another Python script that writes one wget command per image to a shell file.

with open('wget.sh', 'w') as output, open('result.txt') as data:
    tally = {}
    for line in data:
        # split on the first comma only, in case the URL contains commas
        person, url = line.rstrip('\n').split(',', 1)

        # number each person's images: blackpink_lisa1.jpg, blackpink_lisa2.jpg, ...
        tally[person] = tally.get(person, 0) + 1
        idx = tally[person]

        # quote the URL so that the shell does not split the command at each &
        output.write('wget -O ./images/' + person.replace(' ', '_') + str(idx) + '.jpg "' + url + '"\n')

The resulting shell file looks like (again, repeated for every image, with the number going up for each additional image of the same person):

wget -O ./images/blackpink_lisa1.jpg "https://th.bing.com/th/id/OIP.ijTvepbl0DsSHIHwRTszqgHaLH?w=200&h=300&c=7&o=5&pid=1.7"

And then, of course, run the shell script. Make sure the images directory exists first, since wget -O will not create it for you.
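The per-person numbering can also be worked out directly in Python, without going through a shell file. This sketch uses collections.Counter for the tally. The name plan_downloads and its (name, url) input format are assumptions for the example.

```python
from collections import Counter


def plan_downloads(records, directory="./images"):
    """Turn (name, url) pairs into (path, url) pairs numbered per person."""
    seen = Counter()
    plan = []
    for name, url in records:
        # The first image of a person gets 1, the second gets 2, and so on.
        seen[name] += 1
        filename = "{}{}.jpg".format(name.replace(" ", "_"), seen[name])
        plan.append(("{}/{}".format(directory, filename), url))
    return plan
```

Each (path, url) pair could then be handed to something like urllib.request.urlretrieve(url, path), or written out as a wget line as above.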

Now, all the data is in the correct formats, and we can proceed to training the facial recognition algorithm.

Training

Training the data is fairly straightforward if you follow the steps in the Face client library documentation. However, you might have to wait a while for it to complete due to all the sleep time. I had about 1,000 images in my training data set and plenty of sleep time, so I ran my training script overnight.
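To get a feel for why this takes so long, you can add up the sleep time alone. The sketch below assumes the numbers from this post: 41 people, about 1,000 images, a 5 second pause after creating each person, and a 10 second pause after each image. It ignores the time the requests themselves take, so treat the result as a lower bound.

```python
def minimum_sleep_seconds(num_people, num_images, per_person=5, per_image=10):
    """Total time spent sleeping, ignoring request latency and status polling."""
    return num_people * per_person + num_images * per_image


seconds = minimum_sleep_seconds(num_people=41, num_images=1000)
print("{:.1f} hours".format(seconds / 3600))  # 10205 seconds, about 2.8 hours
```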

To get started with training, make a new Python script and import some libraries. These are all the libraries that the Face documentation recommends importing:

import asyncio
import io
import glob
import os
import sys
import time
import uuid
import requests
from urllib.parse import urlparse
from io import BytesIO
# To install this module, run:
# python -m pip install Pillow
from PIL import Image, ImageDraw
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.face.models import TrainingStatusType, Person

I didn't actually need all of them, but you might if you're trying to build a different type of program. After importing, the rest of the script is as follows. Remember that a lot of wait time must be interspersed in it for the free service to work.

# grab your key and endpoint and create a FaceClient
# (replace the placeholder strings with the values from your Face resource)
KEY = "FACE_SUBSCRIPTION_KEY"
ENDPOINT = "FACE_ENDPOINT"

face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(KEY))

# create a PersonGroup
# the person group id must be unique, so add in some error handling
PERSON_GROUP_ID = str(uuid.uuid4())
try:
    face_client.person_group.create(person_group_id=PERSON_GROUP_ID, name=PERSON_GROUP_ID)
except:
    face_client.person_group.delete(person_group_id=PERSON_GROUP_ID)
    face_client.person_group.create(person_group_id=PERSON_GROUP_ID, name=PERSON_GROUP_ID)


# get list of all people from the text file containing the processed data
people = []
last = ""
for line in open('./faces-all.txt').readlines(): 
    person_str = line.split(",")[0].replace(" ", "_")
    if person_str != last:
        people.append(person_str)
    last = person_str

# write output to a text file 
output_face_id = open('face-ids.txt', 'w+')

# make face group for each person
everyone = {}
images = {}
for person in people:
    print('Make face group for ' + person)
    everyone[person] = face_client.person_group_person.create(PERSON_GROUP_ID, person)
    output_face_id.write(person + ',' + str(everyone[person].person_id)+'\n')
    time.sleep(5)
    images[person] = [filename for filename in glob.glob('./images/*.jpg') if person in filename]

# add images for each person to corresponding face group
# add in error handling for photos where a face cannot be detected
for person in images:
    for person_image in images[person]:
        print('Add an image to face group for ' + person)
        # open the image read-only; the with block closes it after the upload
        with open(person_image, 'rb') as image:
            try:
                face_client.person_group_person.add_face_from_stream(PERSON_GROUP_ID, everyone[person].person_id, image)
            except Exception as e:
                print(e)
                continue
        time.sleep(10)

# train the person group
print()
print('Training the person group...')
face_client.person_group.train(PERSON_GROUP_ID)

while (True):
    training_status = face_client.person_group.get_training_status(PERSON_GROUP_ID)
    print("Training status: {}.".format(training_status.status))
    print()
    if (training_status.status is TrainingStatusType.succeeded):
        break
    elif (training_status.status is TrainingStatusType.failed):
        sys.exit('Training the person group has failed.')
    time.sleep(5)

print('PERSON_GROUP_ID: ' + PERSON_GROUP_ID)
print('DONE TRAINING!!')

This script will output the person_id for each person in a text file, which will be used later in the test script. Additionally, the PERSON_GROUP_ID will be printed. Copy this and save it somewhere.
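The text file has one "label,person_id" pair per line, with underscores in place of the spaces in each name. If you want it as a dictionary later on, something like this sketch would do it. The name load_person_ids is made up for the example.

```python
def load_person_ids(path="face-ids.txt"):
    """Map each person label to the person_id written by the training script."""
    person_ids = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            label, person_id = line.strip().split(",", 1)
            # The training script replaced spaces with underscores; undo that.
            person_ids[label.replace("_", " ")] = person_id
    return person_ids
```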

Testing

Add a few images of some of the people in your training set to another directory in your working directory (the test script below reads from ./test_images). These should be images that are not already in your training set. Ensure that each image you select has only one face in it. Your completed program will try to determine if the faces in these images match anyone in the training set.

Start the testing script by importing the same libraries used for the training script or just the ones you will need. Then proceed with the rest of it.

# again, start by getting your key and endpoint and creating a FaceClient
# (replace the placeholder strings with your own values)
KEY = "FACE_SUBSCRIPTION_KEY"
ENDPOINT = "FACE_ENDPOINT"

face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(KEY))

# paste the person group id printed by your training script here
PERSON_GROUP_ID = "PERSON_GROUP_ID"

# get the test images 
test_image_array = glob.glob('./test_images/*.jpg')

This next part shows how I tested one image, but if you'd like to test more than one, simply stick this whole section into a for loop over test_image_array.

image = open(test_image_array[0], 'r+b')

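# detect face in test image
faces = face_client.face.detect_with_stream(image, detection_model='detection_03')
face_ids = [face.face_id for face in faces]

# stop early if no face was found, since face_ids[0] is needed below
if not face_ids:
    raise SystemExit('No face detected in ' + image.name)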

# get results
print('Identifying faces in {}'.format(os.path.basename(image.name)))

highest_confidence = 0.0
best_match = ""

for line in open('./face-ids.txt').readlines():
    person = str(line.split(',')[0].replace('_', ' '))
    person_id = str(line.split(',')[1]).replace('\n','')

    try:
        result = face_client.face.verify_face_to_person(face_ids[0], person_id, PERSON_GROUP_ID)
        print(person + ': ' + str(result.confidence))
        if result.confidence > highest_confidence:
            highest_confidence = result.confidence
            best_match = person
    except Exception as e:
        # show the actual error (for example, a rate limit) instead of hiding it
        print('Could not verify against ' + person + ': ' + str(e))
    time.sleep(2)

print('The person in the photo looks the most similar to ' + best_match + ' with a confidence of ' + str(highest_confidence))

To break down the code above, it first detects a face in the test image and generates a face_id for it. Then, that face is compared to the person_id, generated by the training script, of every person in the training group. The line

result = face_client.face.verify_face_to_person(face_ids[0], person_id, PERSON_GROUP_ID)

outputs something that looks like

{'additional_properties': {}, 'is_identical': False, 'confidence': 0.37697}

for each person, which you can print if you want to see it. This gives you a prediction of whether the person in your test image is identical to each person from your training set as well as a confidence score.
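If you collect the confidence scores in a dictionary instead of only tracking the highest one, you can look at the runners-up as well. Here's a small sketch. The name rank_matches is hypothetical, and the scores in the example are made-up numbers, not real results.

```python
def rank_matches(confidences, top_n=3):
    """Return the top_n (person, confidence) pairs, highest confidence first."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up scores, purely for illustration.
example = {"person a": 0.21, "person b": 0.38, "person c": 0.12, "person d": 0.30}
for person, confidence in rank_matches(example):
    print(person + ': ' + str(confidence))
```

In the test script, this would mean storing result.confidence under each person's name inside the loop and calling rank_matches once the loop is done.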

After running the test script, if the results were as you expected, then you're good to go.
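To check several test images in one run, the single-image section can be wrapped in a function and called once per file. The sketch below only shows the looping part. Both identify_all and identify_one are hypothetical names, and identify_one stands in for whatever function you build around the detect and verify calls from the test script.

```python
def identify_all(image_paths, identify_one):
    """Run identify_one(open_file) on every image and collect the results.

    identify_one is a hypothetical function wrapping the single-image
    section of the test script; it receives the opened image file.
    """
    results = {}
    for path in image_paths:
        # Opening in 'rb' mode is enough for reading, and the with block
        # closes each file before the next one is opened.
        with open(path, "rb") as image:
            results[path] = identify_one(image)
    return results
```

Called as `identify_all(test_image_array, identify_one)`, it returns a dictionary mapping each image path to whatever identify_one returned for it.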

Final Results

Now, it's finally time to determine who you look the most similar to from your training set! For this, simply replace the test image with an image of yourself and run the test script again. When I did this myself, this is what I got:

You look the most similar to red velvet irene with a confidence of 0.41322

Pretty cool, right?

Just to make it clear, in no way am I saying that I actually look like her! The confidence score is not that high, which makes that clear. The Face client library has simply determined that, out of the people in my training group, I look the most similar to her.

Final Note

I ran into a lot of errors while I was writing the scripts. Most of the time, it was because I didn't have enough sleep time. If you get any errors and you can't figure out what's wrong, try putting in more sleep time and see if it helps!
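Instead of guessing at fixed sleep times, another option is to retry a failed call and wait a little longer each time. This is just a sketch of that idea and not something from the Face documentation. The name call_with_retry is made up, and it retries on any exception, which is a simplification: in a real script you would probably want to retry only on rate limit errors.

```python
import time


def call_with_retry(func, *args, retries=5, first_wait=5.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), sleeping longer after each failure."""
    wait = first_wait
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == retries - 1:
                raise
            sleep(wait)
            wait *= 2  # double the pause before the next attempt
```

For example, the verify call from the test script could become `call_with_retry(face_client.face.verify_face_to_person, face_ids[0], person_id, PERSON_GROUP_ID)`.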

This was just a project that I did to play around a bit with facial recognition. It's obviously not that useful, but I had fun making it, and I think that it can be easily tweaked for other purposes. If you have some time on your hands and you're interested in facial recognition, give it a try!
