Abstract
The paper addresses the problem of age drifting in face recognition. In this study, face recognition is performed with machine learning models, whose hyperparameters and binary-format model files are saved in a database. Face recognition is carried out as face classification over embeddings from FaceNet and from a bandwidth-limited neural network. The bandwidth-limited neural network accepts faces grouped by face clustering, which improves the accuracy. A Canny edge detector applied to the face classification models improves the accuracy further. The bandwidth-limited neural network addresses age drifting by grouping the faces by age.
1. Introduction
Face recognition raises privacy issues, because of which regulatory bodies object to such systems when they are deployed in sectors such as retail, payments or shops. The paper addresses face recognition by face classification from embeddings. To improve the accuracy, one has to address the age drifting of faces. Age acts as a distribution that can be extracted from a face image and is expected to vary across age groups. The lower the age, the more likely it is that the machine learning algorithm used for face classification has difficulty learning a new face. This is because faces in lower age groups undergo more changes in facial characteristics, and faces within lower age groups are more similar to one another. The algorithm learns the age distribution, thereby increasing the overall accuracy of recognition.
Usually, face recognition is conducted directly on existing images of people so that a high accuracy is obtained. The popular deep neural network models for face recognition, FaceNet [8] and ArcFace [2], achieve state-of-the-art results on face images. The embeddings obtained for a given face image are compared against the embeddings of stored images of the same person and of other people to find the identity behind the image. Face classification on the embeddings generated by face recognition involves comparing the results against stored hyperparameters and machine learning models saved in binary format.
Machine Learning and Deep Learning make automation feasible, enabling the recognition of faces that have already been registered in the system. The datasets used in the project include age labels, making them suitable for statistical analysis that explains the drift. Concept drift is detected by a change in the decision boundary, in the data, or in both. The project addresses this drawback by designing a Deep Neural Network (DNN) that, combined through a Voting Classifier, produces an improved accuracy compared to a FaceNet model on the same evaluation dataset. The study develops a Voting Classifier pipeline that processes face embeddings from two separate deep learning pipelines: a pre-trained FaceNet model and a custom model based on Convolutional Variational Autoencoders (CVAEs).
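The comparison of a probe embedding against stored embeddings can be illustrated with a minimal sketch. It assumes that identification means taking the nearest stored embedding under L2 distance; the function name `identify`, the threshold of 1.0 and the toy 4-dimensional vectors are hypothetical and are not taken from the study.

```python
import numpy as np

def identify(probe, gallery, labels, threshold=1.0):
    """Return (label, distance) for the closest stored embedding.

    The label is None when even the closest embedding is farther
    away than the (assumed) threshold.
    """
    # L2 distance from the probe to every stored embedding
    distances = np.linalg.norm(gallery - probe, axis=1)
    best = int(np.argmin(distances))
    if distances[best] > threshold:
        return None, float(distances[best])
    return labels[best], float(distances[best])

# Toy 4-dimensional embeddings; real FaceNet embeddings have 128 dimensions.
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
labels = ["person_a", "person_b", "person_c"]

name, dist = identify(np.array([0.9, 0.1, 0.0, 0.0]), gallery, labels)
unknown, far = identify(np.array([0.0, 0.0, 0.0, 1.0]), gallery, labels)
```

Here the first probe resolves to `person_a`, while the second is rejected because every stored embedding is farther away than the threshold.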
2. Related Work
The L2 distances between face embeddings obtained from deep neural networks are used in [1] to derive matching and non-matching scores of faces. This method is used to verify an identity. The matching Euclidean distances are normalised by the non-matching mean and non-matching standard deviation to obtain the matching score, and vice versa. This separates the non-matching and matching scores into two histograms. The drawback of this method is that privacy is lost because the images are accessed directly. ArcFace, the network introduced in [2] with an additive angular margin loss, performs face classification using face similarity analysis on a refined probe set. The paper reports a classification accuracy of 97.91% on the refined probe set. ArcFace is evaluated on the AgeDB-30 dataset, YouTube Faces (YTF), the MegaFace challenge and the FaceScrub probe set. The network developed by Google, FaceNet [8], has about 22K parameters with an embedding vector size of 128. FaceNet performs face classification and verification using similar face similarity analysis techniques. A pose, illumination and expression invariant measure is explored in [3], which uses the method of similarity scores described in [4]. A statistical test is used to identify similar and non-similar images. It is based on a ranking of similarity scores, where the sum of the ranks of the first 100 images indicates similar images. When two sets of images are compared, the order is determined by the ranks of the vectors in both sets. Another paper, [5], explains classification via clustering. It states that accuracy improves when unlabelled faces are clustered alongside classification with Deep Neural Networks. The paper describes three settings: (1) controlled disjoint, (2) controlled overlap and (3) semi-controlled. These form clustering techniques in which the drawbacks of clustering are addressed by each method.
The paper [6] produces a learned representation of face images using a VAE (Variational Autoencoder) that is useful for face attribute prediction. By linearly interpolating the latent vector, the VAE draws generated face images from a distribution and interpolates from a source image to a target image. The paper [7] describes a Canny edge detector applied to face recognition algorithms that use PCA (Principal Component Analysis). This method improves the accuracy by running inference on the images. The present paper applies the same concept to face embeddings, because privacy is enhanced when the complexities of Big Semantic Data Storage and Indexing are separated from the image data. This is proposed as a solution in which several Machine Learning models are used for data engineering, instead of images being processed directly for face recognition only.
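The score normalisation attributed to [1] above can be sketched as follows. This is only an illustration under the assumption that normalisation means subtracting the mean and dividing by the standard deviation of the opposite group; the helper name `normalise` and the toy distances are hypothetical.

```python
import numpy as np

def normalise(distances, reference):
    """Normalise distances by the mean and standard deviation of a reference set."""
    distances = np.asarray(distances, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return (distances - reference.mean()) / reference.std()

# Toy L2 distances: matching pairs are close, non-matching pairs are far apart.
match_d = np.array([0.4, 0.5, 0.6])
nonmatch_d = np.array([1.0, 1.2, 1.4])

# Matching scores use the non-matching statistics, and vice versa.
match_scores = normalise(match_d, nonmatch_d)
nonmatch_scores = normalise(nonmatch_d, match_d)
```

With these toy values every matching score is negative and every non-matching score is positive, which is the separation into two histograms that the text describes.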
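The linear interpolation of latent vectors mentioned for [6] can be written in a few lines. The sketch below only produces the interpolated latent vectors; decoding them into face images would require a trained VAE decoder, which is not shown. The function name `interpolate_latents` and the 3-dimensional toy vectors are assumptions made for illustration.

```python
import numpy as np

def interpolate_latents(z_source, z_target, steps=5):
    """Return `steps` latent vectors moving linearly from source to target."""
    # Interpolation weights from 0 (pure source) to 1 (pure target)
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * z_source + a * z_target for a in alphas])

z_source = np.zeros(3)
z_target = np.array([1.0, 2.0, 4.0])
path = interpolate_latents(z_source, z_target, steps=5)
```

The first row of `path` equals the source vector, the last row equals the target vector, and the middle row lies halfway between them.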
3. Methodology
This section covers the bandwidth-limited neural network and the Canny edge detector applied to images. To train on the embeddings from a neural network, a Voting Classifier is used; its schematic diagram is shown in Figure 1.
Figure 1. Schematic Diagram of Voting Classifier, with soft voting incorporated
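Soft voting, as named in Figure 1, averages the class probabilities of the individual models and picks the class with the highest average. The sketch below assumes two classifiers that output per-class probabilities; the arrays `facenet_probs` and `cvae_probs` are made-up values, not results from the study.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average class probabilities across models and pick the best class."""
    probs = np.stack(prob_list)  # shape: (n_models, n_samples, n_classes)
    averaged = np.average(probs, axis=0, weights=weights)
    return averaged.argmax(axis=1), averaged

# Hypothetical probabilities for two faces over three identities
facenet_probs = np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.5, 0.3]])
cvae_probs = np.array([[0.2, 0.7, 0.1],
                       [0.1, 0.3, 0.6]])

predictions, averaged = soft_vote([facenet_probs, cvae_probs])
```

For the first face the averaged probabilities are (0.4, 0.5, 0.1), so identity 1 wins even though the first model alone preferred identity 0.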
3.1. Bandwidth-Limited Neural Network
A bandwidth-limited neural network is an autoencoder that performs feature extraction by exposing the embeddings in their latent distribution. Since the target labels are attached to the variational autoencoder in the latent distribution, they require some degree of clustering before classification to improve the accuracy score. The bandwidth of a neural network means that the network accepts only a small number of identities per inference. Suppose a group of images is compared using Euclidean (L2) distances; then one M x N Euclidean distance matrix compares only the small number of identities within a neighborhood of faces. Such comparisons can occur within and across neighborhoods through parallel processing. This ensures that the resulting embeddings do not lose precision. The bandwidth is given by the formula:
$$\text{Bandwidth} = \frac{\text{Number of faces } (N)}{\text{Time } (T)}$$
Equation 1. Bandwidth as number of faces processed per unit time
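The neighborhood-limited distance matrix and Equation 1 can be illustrated together. This is a minimal sketch: the neighborhood size of 8, the 3 probes and the figures of 120 faces in 4 seconds are hypothetical numbers chosen for the example, and the random vectors merely stand in for real embeddings.

```python
import numpy as np

def pairwise_l2(a, b):
    """Return the (M, N) matrix of Euclidean distances between rows of a and b."""
    # Broadcasting: (M, 1, D) - (1, N, D) -> (M, N, D), then norm over D
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def bandwidth(n_faces, seconds):
    """Equation 1: number of faces processed per unit time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_faces / seconds

rng = np.random.default_rng(0)
neighbourhood = rng.normal(size=(8, 128))  # stored embeddings of one neighborhood
probes = rng.normal(size=(3, 128))         # faces presented at inference time

dist = pairwise_l2(probes, neighbourhood)  # one small M x N comparison
faces_per_second = bandwidth(n_faces=120, seconds=4.0)
```

Keeping each matrix small is what makes it practical to run many such comparisons in parallel, one per neighborhood.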
Figure 2. Training by batch size where the identity and age are sequentially ordered
Face groupings are obtained through a variety of techniques such as DBSCAN, KMeans, Fast-HAC, GCN and GCN-iter2, as mentioned in [5]. Such face groupings are fed to the bandwidth-limited neural network to improve the accuracy from that obtained with randomized target vectors to that obtained with ordered target vectors.
3.2. Canny edge detector
When a convolutional neural network is applied to an image, the information in its layers shows edge-like structures. This is due to the variety of filters applied while training on the images. A Canny edge detector is one technique for improving the sharpness of the edges within every layer of the neural network. If an autoencoder is trained on faces, the edges of the faces are enhanced, and identification of the faces therefore improves. The factor in Equation 2 is set to 1000, a value chosen because it gave the best results.
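As an illustration of one of the grouping techniques listed above, the following is a minimal KMeans sketch on toy 2-dimensional points. It is not the clustering setup of the study: the initial centroids are passed in explicitly (an assumption that keeps the example deterministic), and the function name `kmeans` and the data are hypothetical.

```python
import numpy as np

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = np.array(initial_centroids, dtype=float)  # work on a copy
    for _ in range(iterations):
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        for c in range(len(centroids)):
            members = points[assignment == c]
            if len(members) > 0:  # leave an empty cluster where it is
                centroids[c] = members.mean(axis=0)
    return assignment, centroids

# Two well-separated toy groups standing in for face embeddings
embeddings = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                       [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)

groups, centres = kmeans(embeddings, initial_centroids=embeddings[[0, -1]])
```

The resulting group labels could then be used to order the faces before they are passed to the network, so that each batch stays within one neighborhood.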
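$$\text{EdgeGradient}(G) = \sqrt{G_x^2 + G_y^2}$$
$$\text{Angle}(\theta) = \tan^{-1}\left(\frac{G_y}{G_x}\right)$$
$$\text{Initial Accuracy}\,(\text{Image} + \text{factor} \times \text{Edge}) = \text{Improved Accuracy}$$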
Equation 2. Set of equations for applying Canny Edge Detector
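The gradient step of the Canny detector can be sketched with Sobel filters in plain NumPy. This covers only the gradient magnitude and angle; the non-maximum suppression and hysteresis thresholding of a full Canny detector are omitted. The angle uses the conventional form arctan(G_y / G_x), and reading "Image + factor * Edge" as a pixel-wise sum is an assumption about how the enhancement is applied.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T
FACTOR = 1000  # the factor quoted in the text

def filter3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation built from shifted slices."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * image[i:i + h - 2, j:j + w - 2]
    return out

def edge_gradient(image):
    """Return gradient magnitude and angle (radians) of a grayscale image."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2), np.arctan2(gy, gx)

# Toy image with a vertical step edge between columns 2 and 3
image = np.zeros((5, 6))
image[:, 3:] = 1.0

magnitude, angle = edge_gradient(image)
# Assumed reading of "Image + factor * Edge"; the border is cropped to match
enhanced = image[1:-1, 1:-1] + FACTOR * magnitude
```

On this toy image the magnitude is non-zero only in the two columns next to the step, so the enhanced image emphasises exactly the edge pixels.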
Figure 3. Canny edge detector images improving inference output of images
4. Results
4.1. Bandwidth-Limited Neural Network
An accuracy improvement from 88.49% to 94% is obtained with such a network, comparing training on images ordered by identity and age against training with randomized age values. The results are shown in the table below:
Another testing scenario was introduced to measure how effectively the new model handles age drifting. This revealed interesting results for the two age groups (younger and elder) when training on younger faces and when training on elder faces.
The above results have been obtained through an ordered set of random vectors. Another scenario is to obtain the results without applying a face grouping, such as a clustering algorithm, before sending the target vectors. This method passes np.random.randint(0,435), where 435 represents the number of identities, as the second parameter during neural network inference. When the network was trained and tested with randomized target vectors, the following model metrics were obtained.
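The difference between ordered and randomized target vectors can be sketched as follows. The identity and age arrays are synthetic stand-ins for the dataset metadata, and the sample size of 1000 is hypothetical; only the call np.random.randint(0, 435) comes from the text.

```python
import numpy as np

NUM_IDENTITIES = 435  # number of identities stated in the text
rng = np.random.default_rng(seed=0)

# Synthetic metadata: one identity label and one age per face
identity = rng.integers(0, NUM_IDENTITIES, size=1000)
age = rng.integers(18, 80, size=1000)

# Ordered scenario: sort by identity first, then by age within each identity.
# np.lexsort treats the LAST key as the primary sort key.
order = np.lexsort((age, identity))
ordered_targets = identity[order]
ordered_ages = age[order]

# Randomized scenario: targets drawn uniformly from [0, 435), upper bound excluded
np.random.seed(0)
random_targets = np.random.randint(0, NUM_IDENTITIES, size=1000)
```

In the ordered case consecutive samples share an identity and progress through its ages, whereas the randomized targets carry no such structure.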
4.2. Canny edge detector
The results given below show the difference between a FaceNet network and a bandwidth-limited neural network, which is taken to be a Convolutional Variational Autoencoder (CVAE). The Canny detector is applied to all four files: train_younger (facenet), train_younger (cvae), train_elder (facenet) and train_elder (cvae). Out of 122 false negatives registered for randomized target vectors, 25 were registered as true positives after applying the Canny prediction. The results given below show the improvement in accuracy obtained through the Canny edge detector.
5. Discussion
A bandwidth-limited neural network is useful for recognizing faces within a neighbourhood. Using multiple models of this type enables faces to be identified from metadata using parallel computing. The large accuracy improvement suggests that such models allow identification to be performed on the embeddings instead of on the images directly. By retraining the model with ordered age, the problem of age drifting is solved to some extent. The remaining images, which contain inherent noise, still show drift in their target labels. The bandwidth-limited neural network can be fine-tuned using the clustering algorithms listed in Section 3, Methodology. The accuracy is found to vary from 93.57% to 97.06% in the training elder scenario, as opposed to 90.68% to 95.38% in the training younger scenario. The Canny edge detector is found to give a small improvement in accuracy to the existing network, acting as an image filter that filters out drifted images. It is observed that FaceNet responds to the Canny edge detector better than the CVAE.
6. Conclusion
Addressing age drifting has improved the accuracy of the face recognition system. A bandwidth-limited neural network can be incorporated into organizations or neighborhoods that seek a service-oriented face recognition system limited by the number of faces it can recognize. This would make regulation of face recognition stricter, enabling individuals to control their privacy. A face recognition system built with this approach contributes towards mobility enhancement that affects hours of operation, reducing wait times and better predicting customer demand for transportation.
7. References
[1] P. Wang, Q. Ji and J. L. Wayman, "Modeling and Predicting Face Recognition System Performance Based on Analysis of Similarity Scores," IEEE Journals & Magazine, 2007.
[2] J. Deng, J. Guo, N. Xue and S. Zafeiriou, "ArcFace: Additive Angular Margin Loss for Deep Face Recognition," CVPR Open Access, p. 10, 2019.
[3] F. Schroff, T. Treibitz, D. Kriegman and S. Belongie, "Pose, illumination and expression invariant pairwise face-similarity measure via Doppelganger list comparison," 2011 International Conference on Computer Vision, pp. 2494–2501, 2011.
[4] L. Wolf, T. Hassner and Y. Taigman, "Similarity Scores Based on Background Samples," Computer Vision – ACCV 2009, vol. 5995, H. Zha, R. Taniguchi and S. Maybank, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 88–97.
[5] A. RoyChowdhury, X. Yu, K. Sohn, E. Learned-Miller and M. Chandraker, "Improving Face Recognition by Clustering Unlabeled Faces in the Wild," arXiv, 2020.
[6] X. Hou, L. Shen, K. Sun and G. Qiu, "Deep Feature Consistent Variational Autoencoder," IEEE Winter Conference on Applications of Computer Vision, 2017.
[7] R. Ullah, H. Hayat, A. A. Siddiqui, U. A. Siddiqui, J. Khan, F. Ullah, S. Hassan, L. Hasan, W. Albattah, M. Islam and G. M. Karami, "A Real-Time Framework for Human Face Detection and Recognition in CCTV Images," Hybrid Approaches for Image and Video Processing, 2022.
[8] F. Schroff, D. Kalenichenko and J. Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[9] S. Dorodnicov, A. Grunnet-Jepsen, A. Puzhevich and D. Piro, "Open-Source Ethernet Networking for Intel® RealSense™ Depth Cameras," 2021. [Online]. Available: https://dev.intelrealsense.com/docs/open-source-ethernet-networking-for-intel-realsense-depth-cameras.
[10] S. A. Rizvi, P. J. Phillips and H. Moon, "The FERET Verification Testing Protocol for Face Recognition Algorithms," NISTIR 6281, 1998.