DEV Community



Face Recognition on a Large Collection of Faces with Python


Hey there!

Ever wondered how computers can recognize faces? Well, nowadays, it’s not as complicated as it used to be, thanks to the amazing advancements in computer vision. There are libraries like “face_recognition” and “deepface” that make face recognition tasks quite straightforward. You can easily recognize one or two faces, or even a hundred, with just a few lines of code. However, as you might expect, things get a bit tricky when you’re dealing with a large collection of faces. Every new face has to be compared against the known ones, so the more faces you have, the more time and memory each lookup takes.

But fear not! In this article, we’re going to dive into how you can tackle this challenge and perform face recognition on a big bunch of faces.

Understanding Embeddings:

First things first, let’s talk about something called “embeddings.” Think of an embedding as a unique signature for a face: an array of numbers, produced by a neural network, that describes the face so that photos of the same person end up with similar numbers. To get an embedding with Python’s “face_recognition” library:

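import face_recognition

# Load the known image (e.g., Joe Biden's face)
known_image = face_recognition.load_image_file("biden.jpg")

# face_encodings() returns one encoding per face it detects;
# [0] takes the first one (it raises IndexError if no face was found)
biden_embeddings = face_recognition.face_encodings(known_image)[0]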

When you print out this embedding, you’ll see a NumPy array of 128 numbers, which is the size produced by the model behind “face_recognition.” Different deep-learning models might produce embeddings of different lengths.
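Computing an embedding means running a neural network over the image, so with a large collection it usually pays to compute each face’s embedding once and store it. Here is a minimal sketch of caching embeddings in a JSON file using only the standard library. The helper names and the cache location are hypothetical, and the NumPy arrays returned by `face_encodings` would need `.tolist()` before they can be serialized.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def save_embeddings(embeddings: dict[str, list[float]], path: str) -> None:
    """Write a {name: embedding} mapping to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(embeddings))


def load_embeddings(path: str) -> dict[str, list[float]]:
    """Read the mapping back; return an empty dict if the file doesn't exist yet."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text())


# Hypothetical cache location; in real code the values would be
# face_recognition encodings converted with .tolist().
cache_file = os.path.join(tempfile.gettempdir(), "embeddings.json")
save_embeddings({"Joe Biden": [0.1, -0.2, 0.3]}, cache_file)  # toy 3-d vector
print(load_embeddings(cache_file)["Joe Biden"])  # [0.1, -0.2, 0.3]
```

On later runs you would load the cache first and only compute embeddings for images that aren’t in it yet.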

Calculating Similarity:

Now, what’s the use of these embeddings? Well, they help us compare faces. Let’s say we have another face, and we want to see how similar it is to Joe Biden’s face. We can use mathematical measures like “cosine similarity” or “Euclidean distance” for this.
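Euclidean distance works the other way round from cosine similarity: smaller means more alike. The “face_recognition” library itself compares faces this way; its `face_distance` function returns Euclidean distances, and `compare_faces` treats a distance at or below `tolerance=0.6` as a match by default. A pure-Python sketch of the same rule (the helper names here are illustrative, not part of the library):

```python
from __future__ import annotations

import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.dist(a, b)


def is_same_face(a: list[float], b: list[float], tolerance: float = 0.6) -> bool:
    """Same rule as face_recognition.compare_faces: distance <= tolerance is a match."""
    return euclidean_distance(a, b) <= tolerance


# Toy 3-d vectors stand in for real 128-d embeddings.
print(round(euclidean_distance([0.0, 0.0, 0.0], [0.3, 0.4, 0.0]), 6))  # 0.5
print(is_same_face([0.0, 0.0, 0.0], [0.3, 0.4, 0.0]))  # True
print(is_same_face([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))  # False
```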

Here’s how you calculate cosine similarity:

from numpy import dot
from numpy.linalg import norm

def cosine_similarity(list_1, list_2):
    # Dot product divided by the product of the two vector lengths
    cos_sim = dot(list_1, list_2) / (norm(list_1) * norm(list_2))
    return cos_sim

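In simple terms, the closer the similarity score is to 1, the more alike the faces are. Cosine similarity ranges from -1 to 1, so a score of 0.86 doesn’t literally mean the faces are 86% similar; it means their embeddings point in very similar directions. Whether that counts as a match depends on a threshold you choose for your model and data.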
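A plain-Python version of the same calculation makes the range concrete and shows the usual threshold-based decision. The `is_match` helper and the 0.9 threshold in the example are hypothetical; in practice you would pick the threshold by testing on labelled pairs of your own faces.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def is_match(a: list[float], b: list[float], threshold: float) -> bool:
    """Hypothetical decision rule: call it a match when similarity >= threshold."""
    return cosine_similarity(a, b) >= threshold


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
print(is_match([1.0, 0.0], [1.0, 1.0], threshold=0.9))  # False (similarity is about 0.707)
```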

Using Vector Databases:

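But wait, when you have a ton of faces, comparing every new face against every stored embedding one by one (a brute-force search) gets slow, and keeping all those embeddings in memory gets expensive. This is where “vector databases” come to the rescue. A vector database stores embeddings and indexes them, so it can find the ones nearest to a query without scanning the whole collection.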
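To see why the naive approach gets slow, here is what a brute-force search looks like: every query is compared against every stored embedding, so the cost grows linearly with the number of faces. A minimal sketch using Euclidean distance, with made-up toy data in `known`:

```python
from __future__ import annotations

import math


def nearest_face(query: list[float], known: dict[str, list[float]]) -> tuple[str, float]:
    """Brute-force search: return (name, distance) of the closest stored embedding."""
    if not known:
        raise ValueError("no stored faces to compare against")
    best_name, best_dist = "", math.inf
    for name, embedding in known.items():  # one comparison per stored face: O(N)
        dist = math.dist(query, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


known = {
    "alice": [0.0, 0.0, 1.0],  # toy 3-d vectors instead of 128-d embeddings
    "bob": [1.0, 0.0, 0.0],
}
print(nearest_face([0.9, 0.1, 0.0], known)[0])  # bob
```

A vector database such as ChromaDB avoids this full scan by building an approximate nearest-neighbour index (HNSW, which is what ChromaDB’s `hnsw:space` setting refers to), trading a little accuracy for much faster lookups.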

Let’s take “ChromaDB” as an example. Here’s how you can use it for your face recognition task:

First, create a collection to store your face embeddings:

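import chromadb

# Choose where to store the database (any writable directory works)
client = chromadb.PersistentClient(path="./face_db")

# Create the collection (or open it if it already exists) and make its
# HNSW index use cosine distance instead of the default L2
db = client.get_or_create_collection(
    name="faces",  # any collection name
    metadata={"hnsw:space": "cosine"},
)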

Now, you can add your embeddings to the database:

db.add(
    ids=["joe_biden"],  # every entry needs a unique string ID
    embeddings=[biden_embeddings.tolist()],  # replace with your embeddings
    metadatas=[{"name": "Joe Biden"}])
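With a large collection you will usually add faces in batches rather than with one call per face, since `add` takes parallel lists of `ids`, `embeddings`, and `metadatas`. A minimal sketch of splitting records into such batches; the `iter_batches` helper, the record layout, and the default batch size of 1000 are hypothetical choices, and ChromaDB enforces its own maximum batch size, so keep batches below it.

```python
from __future__ import annotations

from typing import Iterator


def iter_batches(
    records: list[tuple[str, str, list[float]]], batch_size: int = 1000
) -> Iterator[dict]:
    """Yield dicts shaped like the keyword arguments of collection.add(...).

    Each record is (unique_id, person_name, embedding).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        chunk = records[start : start + batch_size]
        yield {
            "ids": [rec_id for rec_id, _, _ in chunk],
            "embeddings": [emb for _, _, emb in chunk],
            "metadatas": [{"name": name} for _, name, _ in chunk],
        }


records = [
    ("face-1", "Joe Biden", [0.1, 0.2]),  # toy 2-d embeddings
    ("face-2", "Jane Doe", [0.3, 0.4]),
    ("face-3", "John Roe", [0.5, 0.6]),
]
batches = list(iter_batches(records, batch_size=2))
print(len(batches))  # 2
print(batches[1]["ids"])  # ['face-3']
```

Each yielded dict can then be passed to the collection as `db.add(**batch)`.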

To search for similar faces in the database:

results = db.query(
    query_embeddings=[unknown_embeddings],  # Replace with your unknown embeddings
    n_results=1,  # how many nearest faces to return
)

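The results list the stored faces closest to your query, along with how close they are. Because the collection uses cosine space, the reported distances are cosine distances (1 minus the cosine similarity), so smaller values mean more similar faces.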
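`db.query` returns one inner list per query embedding, so the best match for the first query sits at index `[0][0]`. Below is a sketch that turns such a result into a name or `None`, assuming the collection uses cosine distance. The `best_match` helper and its `max_distance` cutoff of 0.1 (a cosine similarity of at least 0.9) are hypothetical, and the result dict is hand-written to mimic ChromaDB’s shape.

```python
from __future__ import annotations


def best_match(results: dict, max_distance: float = 0.1) -> str | None:
    """Return the name of the closest face for the first query, or None if too far.

    Assumes a collection created with {"hnsw:space": "cosine"}, where
    distance = 1 - cosine similarity. max_distance is a hypothetical threshold.
    """
    ids = results.get("ids") or [[]]
    if not ids[0]:
        return None  # the collection returned no neighbours
    distance = results["distances"][0][0]
    if distance > max_distance:
        return None  # nearest face is not close enough to count as a match
    return results["metadatas"][0][0]["name"]


# Hand-written dict mimicking the shape of a ChromaDB query result.
fake_results = {
    "ids": [["joe_biden"]],
    "distances": [[0.04]],
    "metadatas": [[{"name": "Joe Biden"}]],
}
print(best_match(fake_results))  # Joe Biden
```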

Understanding Distance Metrics:

I’ve done some experiments with different distance metrics for ChromaDB. In the result plots, blue indicates pairs of faces that match and red indicates pairs that don’t.

  • Cosine: Cosine similarity measures angles between vectors.


  • L2 (Euclidean): Euclidean distance measures straight-line distances between points.
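The two metrics are closely related. If both embeddings are scaled to unit length (which you can do yourself before storing them), the squared Euclidean distance equals 2 - 2 × cosine similarity, so ranking faces by one gives the same order as ranking by the other. According to ChromaDB’s documentation, its `l2` space reports the squared L2 distance. A small check in plain Python, with toy vectors rather than real embeddings:

```python
from __future__ import annotations

import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(math.isclose(squared_l2(a, b), 2 - 2 * cosine_similarity(a, b)))  # True
```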


Using the “facedb” Package:

To make your life easier, I’ve bundled all this functionality into a handy package called “facedb.” You can install it with a simple pip command:

pip install facedb


Here’s how you can use it:

# Import the FaceDB library
from facedb import FaceDB

# Create a FaceDB instance and specify where to store the database
db = FaceDB(
    path="facedata",
)

# Add a new face to the database
face_id = db.add("Joe Biden", img="joe_biden.jpg")

# Recognize a face from a new image
result = db.recognize(img="new_face.jpg")

# Check if the recognized face matches the one in the database
if result and result["id"] == face_id:
    print("Recognized as Joe Biden")
else:
    print("Unknown face")


For More Use Cases:

If you’re interested in exploring more use cases and diving deeper into the code, you can check out the GitHub repository. There, you’ll find additional examples and resources to help you with your face recognition projects.

So, go ahead and give it a try! Goodbye, and I hope you find this information helpful!

Top comments (4)

cataua
Rogério Caetano

Hi, Sifat! Great article! I started to study face and gesture recognition, and your article is very interesting and helped me a lot. By the way, welcome to the community! We hope you enjoy it and have good experiences here.

fadygrab
Fady GA 😎

You abstracted away all the messy ML stuff and provided a simple solution! Great Stuff 👌

devsecbbs_dev
DevSecBBS

How much processing power might this take? Can I run it on Django shared hosting, maybe with 1 GB of RAM on Linux, with an acceptable load time (1 to 3 seconds for a single recognition)? Does anybody have experience with this?

shhossain

I think comparing faces is the easy part; the main cost will be extracting the embeddings. If the current models are too heavy, you can use a lighter model for that, or quantize the existing models for your use.