Human activity and emotion recognition from RGB videos using deep learning
Demo of the project
GitHub repo
In this project, I propose a new methodology for recognizing human
activities and emotions from RGB videos that takes advantage of
recent breakthroughs in deep learning. The method treats both tasks as
image classification problems.
We can divide the problem into two sub-problems: activity recognition and
emotion recognition. We used transfer learning for both.
Human activity and emotion recognition have gained popularity in
recent years because of the wide use of digital cameras in daily life and their
potential for human-computer interaction and robotics applications.
The proposed solution requires only RGB video, rather than RGB-D video,
to recognize human activity and emotion. It takes a different approach
from depth-based methods: RGB video data is converted into 2D images,
which are then classified. From a stream of RGB video, a
two-dimensional skeleton of 17 joints is extracted for each detected body
with a DNN-based human pose estimator called PoseNet. The skeleton
data are then encoded into the red, green, and blue channels of an image.
Different ways of encoding the data were studied and compared.
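As an illustration of the encoding step, here is a minimal sketch of one possible layout. It is an assumption, not necessarily one of the encodings compared in this project: rows are joints, columns are frames, and the red, green, and blue channels hold the normalized x coordinate, the normalized y coordinate, and the keypoint confidence score. The function name `encode_skeleton_sequence` and its parameters are hypothetical.

```python
import numpy as np

NUM_JOINTS = 17  # PoseNet reports 17 keypoints per detected person


def encode_skeleton_sequence(keypoints, frame_width, frame_height):
    """Encode a (T, 17, 3) array of (x, y, score) keypoints as an RGB image.

    Assumed layout (hypothetical): rows are joints, columns are frames,
    R = x / frame_width, G = y / frame_height, B = confidence score.
    """
    kp = np.asarray(keypoints, dtype=np.float32)
    if kp.ndim != 3 or kp.shape[1:] != (NUM_JOINTS, 3):
        raise ValueError("expected keypoints with shape (T, 17, 3)")

    # Scale every channel to [0, 1] so it fits an 8-bit image.
    x = np.clip(kp[..., 0] / frame_width, 0.0, 1.0)
    y = np.clip(kp[..., 1] / frame_height, 0.0, 1.0)
    score = np.clip(kp[..., 2], 0.0, 1.0)

    image = np.stack([x, y, score], axis=-1)   # (T, 17, 3)
    image = np.transpose(image, (1, 0, 2))     # (17, T, 3): joints x frames
    return np.round(image * 255).astype(np.uint8)
```

The resulting array has one column per video frame, so a whole clip becomes a single small image that can be resized and passed to an image classifier.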
We compared several state-of-the-art deep neural network architectures for classifying human activities. Based on related work, we
chose the image classification models SqueezeNet, AlexNet, DenseNet,
ResNet, VGG, and Inception, and retrained them to perform action recognition.
All activity recognition experiments used the NTU RGB+D database.
The highest accuracy, 88.19%, was obtained with ResNet, which
outperformed all the previous works.
The second part of the problem is detecting facial expressions in
RGB videos. As in the first part, we used image classification
techniques based on deep learning. Before classification, we
applied OpenCV face detection to locate faces in the wild.
The cropped image of each face is used as input to our retrained VGG16
emotion detection model. The highest accuracy,
85.06%, was obtained with VGG16, which is comparable to other
state-of-the-art approaches.