Have you ever tried to take a snapshot of a video, only to get an image where everyone is looking glum?
Wouldn't it be great if somehow you could find the "smiliest" frame from a video and extract that for the snapshot?
Well, that is what I try to build in this video!
To detect the faces and the smiles we are using the open source computer vision library OpenCV.
This code is intended for use with Choirless, a project I'm working on for Call for Code. We need to extract thumbnail images of the rendered choir to display in the UI. At the moment we take the first frame of the video, but that is generally not the best frame to use; later frames, with people smiling as they sing, are better.
Over the course of the coding session I developed a solution incrementally: first starting with something that just detected faces in each frame, then developing that further to detect smiles, then optimizing it to only process key frames so it runs quicker. The main parts of the code are:
This code uses the detectMultiScale method of the face classifier to first find faces in the frame. Once those are found, within each "region of interest" we look for smiles using the smile detector.
```python
def detect(gray, frame):
    # Detect faces within the greyscale version of the frame
    faces = face_cascade.detectMultiScale(gray, 1.1, 3)
    num_smiles = 0

    # For each face we find...
    for (x, y, w, h) in faces:
        if args.verbose:
            # Draw a rectangle if in verbose mode
            cv2.rectangle(frame, (x, y), ((x + w), (y + h)), (255, 0, 0), 2)

        # Calculate the "region of interest", i.e. the area of the frame
        # containing the face
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = frame[y:y+h, x:x+w]

        # Within the grayscale ROI, look for smiles
        smiles = smile_cascade.detectMultiScale(roi_gray, 1.05, 6)

        # If we find smiles then increment our counter
        if len(smiles):
            num_smiles += 1

            # If verbose, draw a rectangle on the image indicating
            # where the smile was found
            if args.verbose:
                for (sx, sy, sw, sh) in smiles:
                    cv2.rectangle(roi_color, (sx, sy), ((sx + sw), (sy + sh)), (0, 0, 255), 2)

    return num_smiles, frame
```
In the main body of the code we open the video file, loop through each frame of the video, and pass it to the detect method above to count the number of smiles in that frame. We keep track of the "best" frame, the one with the most smiles found so far:
```python
# Keep track of the best frame and the high water mark of
# smiles found in each frame
best_image = prev_frame
max_smiles = -1

while 1:
    # Read each frame
    ret, frame = cap.read()

    # End of file, so break out of the loop
    if not ret:
        break

    # Calculate the difference of this frame to the previous one
    diff = cv2.absdiff(frame, prev_frame)
    non_zero_count = np.count_nonzero(diff)

    # If not "different enough" then short circuit this loop
    if non_zero_count < thresh:
        prev_frame = frame
        continue

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Call the detector function
    num_smiles, image = detect(gray, frame.copy())

    # Check if we have more smiles in this frame
    # than our "best" frame
    if num_smiles > max_smiles:
        max_smiles = num_smiles
        best_image = image

        # If verbose then show the image on screen
        if args.verbose:
            print(max_smiles)
            cv2.imshow('Video', best_image)
            cv2.waitKey(1)

    prev_frame = frame
```
There is also an optimization step in which we do a preliminary loop to see how different each frame is from its predecessor. We can then calculate a threshold that results in us only processing the 5% of frames that differ most from the previous frame.
The full code is available on GitHub at: https://github.com/IBMDeveloperUK/ML-For-Everyone/tree/master/20200708-Smile-Detector-Using-OpenCV
It works pretty well. If you run the code with the --verbose flag then it will display each new "best" frame on screen, and the final output image will have rectangles drawn on it so you can see what it detected:
```
% python smiler.py --verbose test.mp4 thumbnail-rectangles.jpg
Pre analysis stage
Threshold: 483384.6
Smile detection stage
5
6
7
8
9
Number of smiles found: 9
```
As you can see, though, it wasn't perfect: it detected what it thought were smiles in some of the images of instruments.
Next week I'll be trying a slightly different approach and training a Convolutional Neural Network (CNN) to detect happy faces. Find out more on the event page:
I hope you enjoyed the video. If you want to catch them live, I stream each week at 2pm UK time on the IBM Developer Twitch channel: