Have you ever tried to take a snapshot of a video, only to get an image where everyone is looking glum?
Wouldn't it be great if somehow you could find the "smiliest" frame from a video and extract that for the snapshot?
Well, that is what I try to build in this video!
A 1080p version of the video is available on Cinnamon
This is a video taken from my weekly show "ML for Everyone" that broadcasts live on Twitch every Tuesday at 2pm UK time.
To detect the faces and the smiles we are using the open source computer vision library OpenCV.
Objective
The purpose of this code is to be used with Choirless, a project I'm working on for Call for Code. We need to extract thumbnail images of the rendered choir to display in the UI. At the moment we take the first frame of the video, but that is generally not the best frame to use; later frames, with people smiling as they sing, are better.
Process
Through the course of the coding session I developed the solution incrementally: first starting with something that just detected faces in the frame, then developing that further to detect smiles within those faces, and finally optimizing it to only process key frames so that it runs quicker. The main parts of the code are:
The Face detector
This code uses the `detectMultiScale` method of the face classifier to first find faces in the frame. Once those are found, within each "region of interest" we look for smiles using the smile detector.
```python
def detect(gray, frame):
    # Detect faces within the greyscale version of the frame
    faces = face_cascade.detectMultiScale(gray, 1.1, 3)
    num_smiles = 0
    # For each face we find...
    for (x, y, w, h) in faces:
        if args.verbose:  # draw a rectangle if in verbose mode
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        # Calculate the "region of interest", i.e. the area of the
        # frame containing the face
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = frame[y:y+h, x:x+w]
        # Within the greyscale ROI, look for smiles
        smiles = smile_cascade.detectMultiScale(roi_gray, 1.05, 6)
        # If we find smiles then increment our counter
        if len(smiles):
            num_smiles += 1
            # If verbose, draw a rectangle on the image indicating
            # where each smile was found
            if args.verbose:
                for (sx, sy, sw, sh) in smiles:
                    cv2.rectangle(roi_color, (sx, sy), (sx + sw, sy + sh), (0, 0, 255), 2)
    return num_smiles, frame
```
In the main body of the code we open the video file, loop through each frame, and pass it to the detect method above to count the number of smiles in that frame. We keep track of the "best" frame, i.e. the one with the most smiles found so far:
```python
# Keep track of the best frame and the high-water mark of
# smiles found in each frame
best_image = prev_frame
max_smiles = -1

while 1:
    # Read each frame
    ret, frame = cap.read()
    # End of file, so break out of the loop
    if not ret:
        break
    # Calculate the difference between this frame and the previous one
    diff = cv2.absdiff(frame, prev_frame)
    non_zero_count = np.count_nonzero(diff)
    # If not "different enough" then skip this frame
    if non_zero_count < thresh:
        prev_frame = frame
        continue
    # Convert the frame to greyscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Call the detector function
    num_smiles, image = detect(gray, frame.copy())
    # Check if we have more smiles in this frame
    # than our "best" frame so far
    if num_smiles > max_smiles:
        max_smiles = num_smiles
        best_image = image
        # If verbose then show the new best image
        if args.verbose:
            print(max_smiles)
            cv2.imshow('Video', best_image)
            cv2.waitKey(1)
    prev_frame = frame
```
There is also an optimization step in which we do a preliminary pass over the video to see how different each frame is from its predecessor. From those differences we calculate a threshold so that we only fully process the 5% of frames that differ most from the frame before them.
The full code is available on GitHub at: https://github.com/IBMDeveloperUK/ML-For-Everyone/tree/master/20200708-Smile-Detector-Using-OpenCV
Results
It works pretty well. If you run the code with the `--verbose` flag then it will display each new "best" frame on the screen, and the final output image will have rectangles drawn on it so you can see what was detected:
```
% python smiler.py --verbose test.mp4 thumbnail-rectangles.jpg
Pre analysis stage
Threshold: 483384.6
Smile detection stage
5
6
7
8
9
Number of smiles found: 9
```
As you can see, though, it wasn't perfect: it detected what it thought were smiles in some of the images of instruments.
Next Week I'll be trying a slightly different approach and training a Convolutional Neural Network (CNN) to detect happy faces. Find out more on the event page:
https://developer.ibm.com/events/training-a-convolutional-neural-network-cnn-to-detect-happy-faces/
I hope you enjoyed the video. If you want to catch these sessions live, I stream each week at 2pm UK time on the IBM Developer Twitch channel.