We've all seen things in movies we wish were real. The first thing that comes to my mind is The Matrix and the ability to download any skill straight into a person's brain. While that is still far from possible, I've always wanted to create the kind of futuristic software I've seen in movies.
Kindly let me know in the comments: What's the first thing that comes to mind when you think of technology you've seen in movies or series?
Sadly, we are still far from creating an artificial intelligence as brilliant as the ones in Transcendence, Ex Machina or I, Robot. Even though I would love to start working on something that huge one day, I've also had this desire to build a live face recognition application for a while now.
I don't remember exactly when the idea came to mind; however, I know that face recognition is a pretty common thing in movies or series. So, I really wanted to start building one I could keep in my pocket.
So I decided to blog the process of building one. There will probably be multiple stages. A few sneak peeks of possible future stages: collecting data on the go, setting up a smarter model, and updating and loading models online through dependency injection.
For now, we will be creating the code required for the application to detect the faces and communicate with the face recognition model.
Here's a quick demo video
I think I should mention that this works with multiple faces in the field of view of the camera at the same time. Sadly, at the time of recording the demo, I was alone.
Separating the Tasks
Before being able to start with the code, we need to envision how the application will be working.
This is the list of things we need to do, in the following order:
- Previewing camera frames
- Capturing camera frames
- Face detection
- Highlighting detected faces
- Face recognition
- Labeling highlighted faces
By building each part separately, we can, later on, improve certain aspects of the application without having to rewrite a lot of code.
Previewing Camera Frames
To display the camera frames to the user, I used the AndroidX CameraView. It provides a simple preview of what the camera sees and can be bound to the lifecycle of the activity/fragment. It was a better option than implementing my own preview because it resulted in much better FPS.
The camera view can also be given different configurations and settings. A factory helped me specify which configuration I wanted to use, and it would get applied to the camera view automatically.
So, initializing the camera view looked something like this.
cameraView.apply {
    bindToLifecycle(this@FaceRecognitionFragment)
    CameraViewFactory.applyConfig(cameraView, cameraViewConfig)
}
Click here to check the Camera View Factory and Configs code
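Roughly speaking, the idea behind the config/factory pair is something like the following sketch (the names and structure here are illustrative, not the exact classes from the repo): each config knows how to set itself up on a CameraView, and the factory just applies whichever config it is handed.

import androidx.camera.view.CameraView

// Illustrative sketch only; see the linked repo for the real classes.
interface CameraViewConfig {
    // Each config applies its own settings to the given camera view.
    fun applyTo(cameraView: CameraView)
}

object CameraViewFactory {
    fun applyConfig(cameraView: CameraView, config: CameraViewConfig) {
        config.applyTo(cameraView)
    }
}

This keeps the fragment free of camera settings: it only decides which configuration to hand over.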
Capturing Camera Frames
The view was being shown thanks to the AndroidX CameraView. Yet, I still didn't have access to the frames, which I needed in order to run the face detection model.
Luckily, AndroidX CameraX can also analyze images. Following the CameraX Analyze images documentation, I created my own CameraAnalysisFactory, which analyzes images according to the configuration passed to the factory.
I had to decrease the resolution of the images because the models and devices we currently have are far from being able to handle high-quality pictures fast.
Thus, in the camera analysis configuration, I specified a low resolution. While the model sees low-resolution frames, the user still sees full-resolution frames in the camera view.
object LowResFaceDetectionCameraAnalysisConfig : CameraAnalysisConfig {
    override val resolution: Size = Size(240, 320)
    override val readerMode: ImageAnalysis.ImageReaderMode = ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE
}
I also created a lambda function that receives the camera's image and rotation degrees from the camera frame analyzer and, later on, deals with that frame. Then, I initialized a camera analysis object by giving the factory a camera analysis configuration and said lambda function.
private val cameraFrameAnalyzer: CameraFrameAnalyzerLambdaType = { imageProxy, rotation -> }

val cameraFrameAnalysisConfig = LowResFaceDetectionCameraAnalysisConfig
val cameraFrameAnalysis: ImageAnalysis = CameraAnalysisFactory.createCameraAnalysis(
    cameraFrameAnalysisConfig,
    analyzer = cameraFrameAnalyzer
)
Now that I had the camera frame analysis object initialized, I had to bind it to the lifecycle, alongside the camera view.
CameraX.bindToLifecycle(this, viewModel.cameraFrameAnalysis)
Click here to check the Camera Analysis Factory and Configs code
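To give an idea of what the factory does under the hood, here is a rough sketch against the pre-1.0 CameraX API this post was written with (ImageAnalysisConfig and ImageReaderMode). The factory and the CameraFrameAnalyzerLambdaType alias live in the repo linked above, so treat this as an approximation rather than the exact code.

import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageAnalysisConfig
import androidx.camera.core.ImageProxy

// Assumed shape of the analyzer lambda used throughout this post; the real alias is defined in the repo.
typealias CameraFrameAnalyzerLambdaType = (ImageProxy?, Int) -> Unit

object CameraAnalysisFactory {
    fun createCameraAnalysis(
        config: CameraAnalysisConfig,
        analyzer: CameraFrameAnalyzerLambdaType
    ): ImageAnalysis {
        // Build the analysis use case from the given configuration...
        val analysisConfig = ImageAnalysisConfig.Builder()
            .setTargetResolution(config.resolution)
            .setImageReaderMode(config.readerMode)
            .build()
        // ...and forward every analyzed frame (plus its rotation) to the provided lambda.
        return ImageAnalysis(analysisConfig).apply {
            setAnalyzer(ImageAnalysis.Analyzer { image, rotationDegrees ->
                analyzer(image, rotationDegrees)
            })
        }
    }
}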
Face Detection
Now that I had a way to pass the frames from the camera to anything I wanted, I passed those frames on to the face detection model.
I chose Firebase MLKit for face detection since it is able to detect faces and track them as they move in the camera's field of view.
Accordingly, I created an abstract face detector class with the shared functionality, while the face detector objects inheriting from it provide their own options for the Firebase model.
Then, I can simply initialize the face detector model I want to use.
private val faceDetector: FaceDetector = FaceDetector()
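To sketch the idea (the actual classes are in the repo linked below, and the names here are only illustrative), the abstract detector holds the shared processImage logic, while each subclass only supplies its Firebase options:

import android.media.Image
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.face.FirebaseVisionFace
import com.google.firebase.ml.vision.face.FirebaseVisionFaceDetectorOptions

abstract class AbstractFaceDetector {
    // Each concrete detector supplies its own options (tracking, performance mode, ...).
    protected abstract val options: FirebaseVisionFaceDetectorOptions

    private val detector by lazy { FirebaseVision.getInstance().getVisionFaceDetector(options) }

    fun processImage(
        image: Image,
        rotation: Int,
        onDetect: (List<FirebaseVisionFace>, FirebaseVisionImage) -> Unit
    ) {
        val frame = FirebaseVisionImage.fromMediaImage(image, rotation)
        detector.detectInImage(frame)
            .addOnSuccessListener { faces -> onDetect(faces, frame) }
    }
}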
All that was left was to pass the frames I was getting from the analyzer onto the face detector. Following the Firebase MLKit Face Detection documentation, I specified the image's rotation and let the model process it.
private val cameraFrameAnalyzer: CameraFrameAnalyzerLambdaType = { imageProxy, rotation ->
    val mediaImage = imageProxy?.image
    // Map the analyzer's rotation degrees onto the Firebase rotation constants.
    val imageRotation = when (rotation) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> throw IllegalArgumentException("Rotation must be 0, 90, 180, or 270.")
    }
    mediaImage?.let {
        faceDetector.processImage(it, imageRotation) { faces, frame -> }
    }
}
Click here to check the Face Detector code
Highlighting Detected Faces
Since the face detector model returns the detected faces together with the frame they were found in, I sent those faces and their coordinates in the frame to a face highlighter object. The face highlighter object is responsible for drawing the highlights around the faces.
Therefore, I created a lambda function, similar to the one passed to faceDetector.processImage, where a face highlight object is assigned to every detected face.
val highlightedFacesLiveData = MutableLiveData<List<FaceHighlight>>()

private val highlightDetectedFaces: (List<FirebaseVisionFace>) -> Unit = { faces ->
    val highlightedFacesList = ArrayList<FaceHighlight>()
    faces.forEach {
        highlightedFacesList.add(RectangularFaceHighlight(it))
    }
    highlightedFacesLiveData.postValue(highlightedFacesList)
}
Then, I would specify that lambda function as the callback for faceDetector.processImage:
faceDetector.processImage(mediaImage, imageRotation, highlightDetectedFaces)
But I wasn't done yet. Before drawing those highlights on top of the camera view, I remembered that the camera view and the frames passed to the face detector don't have the same resolution. Therefore, I had to create a transforming object that converts the coordinates and sizes of the detected faces to match the resolution of the camera view.
So, it was important to attach the face highlighter to both the camera view and the camera analyzer. The face highlighter would read the size of the camera view and the resolution of the camera analyzer and transform the highlights accordingly.
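The core of that transformation is just scaling a bounding box from the analysis resolution up to the view size, roughly like this (a simplified sketch; the real FaceHighlighter in the repo also has to handle rotation and mirroring):

import android.graphics.Rect
import android.graphics.RectF
import android.util.Size

// Scale a bounding box from the low-resolution analysis frame up to the camera view's size.
fun scaleToView(box: Rect, analysisResolution: Size, viewWidth: Int, viewHeight: Int): RectF {
    val scaleX = viewWidth / analysisResolution.width.toFloat()
    val scaleY = viewHeight / analysisResolution.height.toFloat()
    return RectF(
        box.left * scaleX,
        box.top * scaleY,
        box.right * scaleX,
        box.bottom * scaleY
    )
}

Attaching the highlighter looked like this: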
private val highlighterFaces: FaceHighlighter = findViewById(R.id.highlighter_faces)
highlighterFaces.attachCameraView(cameraView, cameraViewConfig)
highlighterFaces.attachCameraAnalysisConfig(viewModel.cameraFrameAnalysisConfig)
Following that, I took the face highlights posted through LiveData and drew them on the canvas.
viewModel.highlightedFacesLiveData.observe(this, Observer { highlightedFaces ->
    highlighterFaces.clear()
    highlightedFaces.forEach {
        highlighterFaces.add(it)
    }
    highlighterFaces.postInvalidate()
})
Click here to check the Face Highlighter and the Face Highlight Objects
Face Recognition
The face recognition model had already been built as a university course project, Oracle, using the sklearn.fetch_lfw_dataset dataset; you can check it out on GitHub. This model will later be rebuilt with VGGFace2 and improved even further.
Following the Firebase MLKit Custom Models documentation, I converted the model I had into a TensorFlow Lite model and created a face classifier object which can initialize that model and communicate with it.
I needed the option to switch models easily later on. So, I described each model with a configuration class, so the face classifier object knows the input shape, output shape, labels, and model path (whether local or remote). For now, the face classifier only supports local models.
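As a rough illustration of what such a configuration carries (the field names here are placeholders; the actual config classes are in the repo), it boils down to something like:

interface FaceClassifierConfig {
    val modelPath: String       // path to the TensorFlow Lite model (a local asset for now, remote later)
    val inputShape: IntArray    // e.g. [1, width, height, channels] expected by the model
    val outputShape: IntArray   // e.g. [1, number of labels]
    val labels: List<String>    // the label for each output index
}

The OracleFaceClassifierConfig used below would then just be one such configuration, filled in with the Oracle model's shapes and labels.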
Initializing the face classifier was simple: I just get its instance based on the configuration I want to use.
var faceClassifier: FaceClassifier = FaceClassifier.getInstance(context, OracleFaceClassifierConfig)
After the face classifier was initialized, I tried passing every frame's detected faces to the face classifier on a separate thread. This resulted in a lot of lag, which made the face classifier unusable.
To solve that issue, I approached it differently: I ended up running the face classifier only when a new face enters the camera's field of view. The highlight is still drawn as soon as the face is detected, while the classification runs on a separate thread in the background.
Accordingly, I had to update the highlightDetectedFaces lambda function by keeping track of the trackingId given by the Firebase face detection model and running the face classification only on newly detected faces, i.e. new trackingIds.
private val highlightedFacesHashMap = HashMap<Int, Face>()

private val highlightDetectedFaces: onDetectLambdaType = { faces, frame ->
    faces.forEach {
        if (highlightedFacesHashMap.containsKey(it.trackingId)) {
            // Already-tracked face: just update its position with the latest frame.
            highlightedFacesHashMap[it.trackingId]?.updateFace(it, frame)
        } else {
            // Newly detected face: store it and classify it once, off the main thread.
            val face = Face(it, frame)
            highlightedFacesHashMap.put(it.trackingId, face)
            viewModelScope.launch {
                highlightedFacesHashMap[it.trackingId]?.apply { face.classifyFace(faceClassifier) }
            }
        }
    }

    val highlightedFacesList = ArrayList<FaceHighlight>()
    highlightedFacesHashMap.values.forEach {
        highlightedFacesList.add(LabeledRectangularFaceHighlight(it))
    }
    highlightedFacesLiveData.postValue(highlightedFacesList)
}
Click here to check the Face Classifier code
Labeling Highlighted Faces
I had just run the face classification the way I wished. The result came up slightly delayed at first, but then was smooth as the user continued to move inside the camera's field of view. I just had to draw the label onto the screen for the user to see.
I already had a highlighter set up, which could transform the highlights from the small-resolution frames to the resolution the view was using. I already had a rectangular highlight drawn around the person's face. And I was already getting the response from the model!
So I just created a highlight that inherits from my rectangular face highlight object. This new highlight checks whether the face it received has a label yet; if it does, the label is drawn alongside the rectangle. Since the highlights are redrawn every frame, the label shows up as soon as it is available.
That is why I created a new face object instead of using the one provided by Firebase. That face object has a coroutine responsible for classifying the face, and once the classification result is out, it is stored in the object. As long as the face stays in the camera's field of view, the same face object is kept around, so when the classification finishes, the highlights drawn every frame see that the face object now has a label and draw it under the face.
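Sketching that face object (the names are illustrative and classify() stands in for however the real classifier is invoked; the actual Face class is linked below), it looks roughly like this:

import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.face.FirebaseVisionFace
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

class Face(firebaseFace: FirebaseVisionFace, private var frame: FirebaseVisionImage) {

    // Stays null until the classifier has answered; the labeled highlight checks this every frame.
    var label: String? = null
        private set

    var boundingBox = firebaseFace.boundingBox
        private set

    // Called on every frame while the same trackingId stays in view.
    fun updateFace(firebaseFace: FirebaseVisionFace, newFrame: FirebaseVisionImage) {
        boundingBox = firebaseFace.boundingBox
        frame = newFrame
    }

    // Launched once, from the ViewModel's coroutine scope, when the face first appears.
    // classify() is a placeholder for the real call that classifies the face crop.
    suspend fun classifyFace(classifier: FaceClassifier) = withContext(Dispatchers.Default) {
        label = classifier.classify(frame, boundingBox)
    }
}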
The call that starts the face object's classification was already shown in the face detection callback above, but here it is again...
highlightedFacesHashMap[it.trackingId]?.apply { face.classifyFace(faceClassifier) }
Click here to check the Face object
It is important to note that the code shown in this post is taken from a fragment (the view) and a ViewModel.
If you would like to fork or contribute, feel free to check out the repository.
Top comments
Hello! I'm having difficulty using the sklearn.fetch_lfw_dataset dataset. Would you tell me how to add my own image?
Hello! You'll have to add your image to the dictionary manually after fetching the dataset. You'll need to follow the same format as other images. So try to grab one image record and log it, and duplicate that with your own image.
I'm planning to post a machine learning blog post about this in more detail in the future. :D
Thank you very much!
I apologize for the perhaps stupid question, but where exactly do you add your data (image and text)? In the main.ipynb or somewhere else?
No worries! :D You'll want to add the image and the text in the main.ipynb before the data is used to train the model. There should be multiple guides online about adding custom images to the sklearn dataset.
Hello and thanks!
I tried your code from GitHub, but it doesn't work because the order folder is empty and I have no idea what to do!
Hello! Can you please share the error you're running into, or create an issue on the GitHub repo? It would be easier to understand where exactly the problem is. :)