A vector database is a database that stores data as high-dimensional mathematical objects called vectors. We are going to see how this is useful in the context of images, using one of the projects I worked on.
In 2021 I worked with a team to build a face recognition system as an MVP. We used FaceNet from Google. I'm not going to bore you with the dataset used or the model training techniques of FaceNet. Instead, we are going to look at the overall architecture to explain why a vector database is needed.
This is the stack we used:
- Front End - Flutter mobile app to get the information and images.
- Back End - FastAPI
- Database - MongoDB (this was the only option given to us)
There are two workflows: one for registering users in the database and one for inference (recognition).
For registration, we get the details and images from the mobile app. The regular processing of the name, date of birth, … was handled by another process. We feed the image into the model, which outputs a 128-dimensional vector (embedding). This vector acts as a signature of the face, much like a fingerprint. We store this vector in the database.
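To make the registration step concrete, here is a minimal sketch of what a stored record could look like. The field names, the `make_registration_record` helper, and the in-memory list standing in for the database collection are all assumptions for illustration, not the code we actually ran. The embedding is a placeholder rather than real model output.

```python
from datetime import datetime, timezone

EMBEDDING_DIM = 128  # size of the embedding described in the article


def make_registration_record(person_id, embedding):
    """Build a plain-dict record holding one person's face embedding."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    return {
        "person_id": person_id,                      # hypothetical field name
        "embedding": [float(x) for x in embedding],  # stored as a plain list of floats
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


# Stand-in for a database collection; a real system would insert the record instead.
registered = []
fake_embedding = [0.01 * i for i in range(EMBEDDING_DIM)]  # placeholder, not model output
registered.append(make_registration_record("person_a", fake_embedding))
```

A document database can store a record like this as is, since the embedding is just an array of numbers. The catch, as we will see, is searching over those arrays.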
For inference, we get the image from the mobile app and feed it into the network to generate the 128-dimensional vector. We then compare this vector with all the vectors in the database using a similarity metric (Euclidean distance or cosine distance).
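The two metrics mentioned above can be sketched in a few lines of plain Python. The function names are my own, the vectors are tiny 3-dimensional toys rather than 128-dimensional embeddings, and the code assumes non-zero vectors of equal length.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """One minus the cosine similarity; smaller means more similar."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 3.0, 0.0]
```

Note that `query` and `same_direction` have a cosine similarity of 1 even though their Euclidean distance is not 0: cosine ignores vector length, Euclidean distance does not.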
Yes, you need to compare against every single vector in the database! If the database holds millions of vectors, we may not be able to fit them all in memory, so we have to process them in batches. This is very slow and memory-hungry; in short, it is not scalable.
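Here is a rough sketch of that brute-force scan, done batch by batch while keeping a running best match. `iter_batches` stands in for paging through a database cursor, and the record layout and names are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def iter_batches(records, batch_size):
    """Yield successive slices; stands in for paging through a database cursor."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def best_match(query, records, batch_size=2):
    """Scan every record, one batch at a time, and keep the highest score seen."""
    best_id, best_score = None, -math.inf
    for batch in iter_batches(records, batch_size):
        for person_id, vector in batch:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_id, best_score = person_id, score
    return best_id, best_score


# Toy 3-dimensional vectors instead of real 128-dimensional embeddings.
records = [
    ("person_a", [0.9, 0.1, 0.0]),
    ("person_b", [0.0, 1.0, 0.0]),
    ("person_c", [0.0, 0.2, 0.9]),
]
match_id, match_score = best_match([1.0, 0.0, 0.0], records)
```

Every query touches every stored vector, so the work grows linearly with the number of registered people.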
After this, we pick the vector in the database with the highest similarity score (say Person A's vector) and apply a threshold. For example, suppose we use cosine similarity (which ranges from -1 to 1, with 1 meaning most similar) and a threshold of 0.4. If the score is 0.2 (< 0.4), which is low, we declare that the person is not in the system. If the score is 0.8 (> 0.4), we declare that the person in the image is Person A.
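The thresholding step is small enough to show in full. This is a sketch with a hypothetical `identify` helper; the 0.4 default is just the example value from the text, and treating a score exactly equal to the threshold as a match is my assumption.

```python
def identify(best_id, best_score, threshold=0.4):
    """Return the matched id if the score clears the threshold, otherwise None."""
    # Assumption: a score exactly equal to the threshold counts as a match.
    return best_id if best_score >= threshold else None


recognized = identify("Person A", 0.8)  # score above the threshold
unknown = identify("Person A", 0.2)     # score below the threshold
```

In practice the threshold is something you tune on your own data, since it trades false accepts against false rejects.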
This is known as zero-shot learning: we use the model for prediction without training it specifically on the people we register.
This setup has some drawbacks:
- Neural network inference is slow and resource-hungry, which means you need to set up a system with expensive hardware, parallelize it, implement a queuing system, and more.
- During inference, we compare the query against thousands or maybe even millions of vectors. This means we must parallelize the operations and index the database to get results fast.
Enter vector databases! The figure below shows the rough idea of a vector database.
Basically, all of these processes are taken care of by the vector database. We only need to receive the request from the front end, perform the corresponding operation on the database, get the results, and forward them to the front end.
It is parallelized, indexed for faster lookups, and can be scaled relatively smoothly, which addresses the drawbacks we saw earlier. We can also use different models to generate the embeddings and choose the similarity metric used to compare them.
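To give a feel for how thin the application code becomes, here is a toy in-memory stand-in for that interface. `ToyVectorStore`, `upsert`, and `search` are hypothetical names, not the API of any real vector database, and there is no actual index or parallelism here; a real product does that work behind similar-looking calls.

```python
import math


def _cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _negative_euclidean(a, b):
    """Negated distance, so that a higher score always means more similar."""
    return -math.dist(a, b)  # math.dist requires Python 3.8+


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical interface, no real index)."""

    METRICS = {"cosine": _cosine_similarity, "euclidean": _negative_euclidean}

    def __init__(self, metric="cosine"):
        self._score = self.METRICS[metric]  # the similarity metric is configurable
        self._vectors = {}

    def upsert(self, item_id, vector):
        """Register or overwrite the vector stored under item_id."""
        self._vectors[item_id] = list(vector)

    def search(self, query, top_k=1):
        """Return the top_k (item_id, score) pairs, best first."""
        scored = [(self._score(query, v), item_id) for item_id, v in self._vectors.items()]
        scored.sort(reverse=True)
        return [(item_id, score) for score, item_id in scored[:top_k]]


store = ToyVectorStore(metric="cosine")
store.upsert("person_a", [0.9, 0.1, 0.0])
store.upsert("person_b", [0.0, 1.0, 0.0])
results = store.search([1.0, 0.0, 0.0], top_k=1)
```

The back end shrinks to two calls: `upsert` at registration and `search` at inference, followed by the threshold check on the returned score.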
This is an overview of using vector databases in the context of images. Vector databases are used in many other ML applications, such as semantic search (NLP). I hope this article gave you an overall view and a use case of vector databases.