What is a Convolutional Neural Network (CNN)? How can they be used to detect features in images? This is the video of a live coding session in which I show how to build a CNN in Python using Keras and extend the "smile detector" I built last week to use it.
A 1080p version of this video can be found on Cinnamon.
A Convolutional Neural Network is a particular type of neural network that is very well suited to analysing images. It works by passing a 'kernel' across the input image (convolution) to produce an output. These convolutional layers are stacked to produce a deep network that is able to learn quite complex features in images.
In this session I coded a simple 3-layer CNN and trained it with manually classified images of faces.
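For illustration, a minimal 3-layer CNN for binary smile classification might look like the sketch below in Keras. This is an assumption about the architecture (256x256 RGB input faces, sigmoid output), not necessarily the exact network from the session:

    from tensorflow.keras import layers, models

    # A sketch of a simple 3-layer CNN: each Conv2D layer passes
    # kernels across its input, and pooling shrinks the feature maps
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu',
                      input_shape=(256, 256, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Single sigmoid unit: probability the face is smiling (0..1)
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])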
Much of the code was based on the previous iteration of this project. Subsequent to the live coding session, I refactored the code to use Python generators to simplify the processing pipeline.
Frame Generator
This method opens the video file and iterates through it, yielding each frame.
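The methods in this post all live on a Smiler class; the snippets assume the usual imports at the top of the module (shown here for completeness):

    import cv2
    import imutils
    import numpy as np
    from imutils.face_utils import rect_to_bb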
def frame_generator(self, video_fn):
    cap = cv2.VideoCapture(video_fn)
    while True:
        # Read each frame of the video
        ret, frame = cap.read()
        # End of file, so break out of the loop
        if not ret:
            break
        yield frame
    cap.release()
Calculating the Threshold
As in the previous session, we iterate through the frames and calculate the difference between each frame and the previous one. The method then returns the threshold needed to keep just the top 5% most-changed frames:
def calc_threshold(self, frames, q=0.95):
    prev_frame = next(frames)
    counts = []
    for frame in frames:
        # Calculate the pixel difference between the current
        # frame and the previous one
        diff = cv2.absdiff(frame, prev_frame)
        non_zero_count = np.count_nonzero(diff)
        # Append the count to our list of counts
        counts.append(non_zero_count)
        prev_frame = frame
    return int(np.quantile(counts, q))
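To make the quantile step concrete, here is a toy illustration (made-up counts, far fewer frames than a real video) of how np.quantile picks the cutoff:

    import numpy as np

    counts = [1000, 5000, 20000, 80000, 120000]
    # With q=0.95, only the most-changed ~5% of frames exceed the cutoff
    print(np.quantile(counts, 0.95))  # 112000.0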
Filtering the Image Stream
Another generator: it takes an iterable of frames and a threshold, and yields each frame whose difference from the previous frame is above the supplied threshold.
def filter_frames(self, frames, threshold):
    prev_frame = next(frames)
    for frame in frames:
        # Calculate the pixel difference between the current
        # frame and the previous one
        diff = cv2.absdiff(frame, prev_frame)
        non_zero_count = np.count_nonzero(diff)
        if non_zero_count > threshold:
            yield frame
        prev_frame = frame
Finding the Smiliest Image
By factoring out the methods above, we can chain the generators together and pass them into this method, which actually looks for the smiliest image. This means that (unlike the previous version) this method doesn't need to concern itself with deciding which frames to analyse.
We use the trained neural network (as a TensorFlow Lite model) to predict whether a face is smiling. Much of this structure is similar to last session: we first scan the image to find faces, then align each of those faces using a facial aligner -- this transforms the face such that the eyes are in the same location in each image. We pass each face into the neural network, which gives us a score from 0.0 to 1.0 of how likely it is to be smiling. We sum all those values up to get an overall 'smiliness' score for the frame.
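As a sketch of how the detector, aligner, and interpreter this method relies on might be set up in the class's __init__ (the attribute names match the code below, but the exact construction in the session may differ):

    import dlib
    import tensorflow as tf
    from imutils.face_utils import FaceAligner

    def __init__(self, landmarks_path, model_path):
        # dlib's HOG-based frontal face detector
        self.detector = dlib.get_frontal_face_detector()
        # 68-point facial landmark predictor, used by the aligner
        predictor = dlib.shape_predictor(landmarks_path)
        # Warps faces so the eyes sit at fixed positions; 256x256
        # output matches the network's expected input size
        self.face_aligner = FaceAligner(predictor, desiredFaceWidth=256)
        # TensorFlow Lite interpreter wrapping the trained smile model
        self.interpreter = tf.lite.Interpreter(model_path=model_path)

With that in place, the method itself: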
def find_smiliest_frame(self, frames, callback=None):
    # Allocate the tensors for TensorFlow Lite
    self.interpreter.allocate_tensors()
    input_details = self.interpreter.get_input_details()
    output_details = self.interpreter.get_output_details()

    def detect(gray, frame):
        # Detect faces within the greyscale version of the frame
        faces = self.detector(gray, 2)
        smile_score = 0
        # For each face we find...
        for rect in faces:
            (x, y, w, h) = rect_to_bb(rect)
            # Crop and resize the original face (unused below)
            face_orig = imutils.resize(frame[y:y + h, x:x + w], width=256)
            # Align the face
            face_aligned = self.face_aligner.align(frame, gray, rect)
            # Add a batch dimension: the aligner outputs 256x256x3
            # and the network expects 1x256x256x3
            face_aligned = face_aligned.reshape(1, 256, 256, 3)
            # Scale the pixel values to 0..1
            face_aligned = face_aligned.astype(np.float32) / 255.0
            # Pass the face into the input tensor for the network
            self.interpreter.set_tensor(input_details[0]['index'],
                                        face_aligned)
            # Actually run the neural network
            self.interpreter.invoke()
            # Extract the prediction from the output tensor
            pred = self.interpreter.get_tensor(
                output_details[0]['index'])[0][0]
            # Keep a sum of all 'smiliness' scores
            smile_score += pred
        return smile_score, frame

    best_smile_score = 0
    best_frame = next(frames)
    for frame in frames:
        # Convert the frame to grayscale
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Call the detector function
        smile_score, frame = detect(gray, frame)
        # Check if we have more smiles in this frame
        # than our "best" frame
        if smile_score > best_smile_score:
            best_smile_score = smile_score
            best_frame = frame
            if callback is not None:
                callback(best_frame, best_smile_score)
    return best_smile_score, best_frame
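For completeness, converting a trained Keras model into the TensorFlow Lite file loaded above might look like this, a sketch using the standard TF 2.x converter API (the filename is illustrative):

    import tensorflow as tf

    # Convert the trained Keras model to a TensorFlow Lite flatbuffer
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

    # Save it; this file becomes model_path for the Smiler class
    with open('smile_model.tflite', 'wb') as f:
        f.write(tflite_model)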
We can then chain the functions together. Note that frame_generator is called twice: calc_threshold consumes the first generator entirely, so we need a fresh one for the filtering pass:
smiler = Smiler(landmarks_path, model_path)
fg = smiler.frame_generator(args.video_fn)
threshold = smiler.calc_threshold(fg, args.quantile)
fg = smiler.frame_generator(args.video_fn)
ffg = smiler.filter_frames(fg, threshold)
smile_score, image = smiler.find_smiliest_frame(ffg)
Output
Testing it out it all works pretty well, and finds a nice snapshot from the video of smiling faces.
The full code to this is now wrapped up as a complete Python package:
Smiler
This is a library and CLI tool to extract the "smiliest" frame from a video of people.
It was developed as part of Choirless for the IBM Call for Code.
Installation
% pip install choirless_smiler
Usage
Simple usage:
% smiler video.mp4 snapshot.jpg
By default it will do a pre-scan to determine the threshold that keeps only the 5% of frames most changed from their previous frame, and consider just those. If you already know the threshold of change you want to use, you can supply it directly, e.g.

% smiler video.mp4 snapshot.jpg --threshold 480000

The first time smiler runs it will download facial landmark data and store it in ~/.smiler. The location of this data and the cache directory can be specified as arguments.
Help
% smiler -h
usage: smiler [-h] [--verbose] [--threshold THRESHOLD]
              [--landmarks-url LANDMARKS_URL] [--cache-dir CACHE_DIR]
              [--quantile QUANTILE]
              video_fn image_fn

Save thumbnail of smiliest frame in video

positional arguments:
  video_fn              filename for video to
…I hope you enjoyed the video. If you want to catch these sessions live, I stream each week at 2pm UK time on the IBM Developer Twitch channel: