
myxzlpltk


Face Mask Detection With ResNet50 and SVM + Decision Tree


Welcome! This post is a quick explanation of how I built a face mask detector that uses ResNet50 as a feature extractor and a stacking ensemble of a Support Vector Machine (SVM) and a decision tree as the classifier.

As a tribute to fellow researchers, this app is based on the research paper "A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic" by Mohamed Loey et al.


Dataset Retrieval

This application uses a dataset from Kaggle. It contains 853 images belonging to 3 classes, along with their bounding boxes in PASCAL VOC format. The classes are with_mask, without_mask, and mask_weared_incorrect. I use only the with_mask and without_mask labels. Two sample images are shown below.

Sample 1 Dataset
Sample 2 Dataset

You can access the dataset at the URL below.
https://www.kaggle.com/datasets/andrewmvd/face-mask-detection
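
To make the PASCAL VOC layout concrete, here is a minimal sketch of reading the labels and boxes from one annotation. It uses the standard library's ElementTree instead of the xmltodict package used later in this post, and the XML string, its coordinates, and the names SAMPLE_XML, KEPT_CLASSES, and read_boxes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in PASCAL VOC layout (values invented for illustration)
SAMPLE_XML = """
<annotation>
  <filename>example.png</filename>
  <object>
    <name>with_mask</name>
    <bndbox><xmin>79</xmin><ymin>105</ymin><xmax>109</xmax><ymax>142</ymax></bndbox>
  </object>
  <object>
    <name>mask_weared_incorrect</name>
    <bndbox><xmin>185</xmin><ymin>100</ymin><xmax>226</xmax><ymax>144</ymax></bndbox>
  </object>
</annotation>
"""

# Only these two labels are kept, as in the rest of the post
KEPT_CLASSES = {"with_mask", "without_mask"}

def read_boxes(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) for every kept object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        if label not in KEPT_CLASSES:
            continue  # skip mask_weared_incorrect
        bnd = obj.find("bndbox")
        coords = tuple(int(bnd.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, coords))
    return boxes

print(read_boxes(SAMPLE_XML))  # [('with_mask', (79, 105, 109, 142))]
```

Each image can contain several object entries, which is why one photo can yield more than one face crop.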

Preprocessing

Preprocessing starts by cropping each face area based on its bounding box. First, collect the names of all XML files and image files in the dataset folder.

import os

# Collect image and annotation file names from the dataset folder
img_names = []
xml_names = []
for dirname, _, filenames in os.walk('./face-mask-detection'):
  for filename in filenames:
    if filename.endswith(".xml"):
      xml_names.append(filename)
    else:
      img_names.append(filename)

print(len(img_names), "images")

Then crop every image to each of its bounding boxes and read the corresponding label.

import xmltodict
from matplotlib import pyplot as plt
from skimage.io import imread

path_annotations = "face-mask-detection/annotations/"
path_images = "face-mask-detection/images/"

class_names = ['with_mask', 'without_mask']
images = []
target = []

def crop_bounding_box(img, bnd):
  # Read the corners by name instead of relying on the key order in the XML
  x1, y1, x2, y2 = (int(bnd[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
  # Crop the face, keep only the RGB channels, and copy the result
  return img[y1:y2, x1:x2, :3].copy()

for img_name in img_names:
  with open(path_annotations+img_name[:-4]+".xml") as fd:
    doc = xmltodict.parse(fd.read())

  img = imread(path_images+img_name)
  # xmltodict returns a dict for a single <object> and a list for several
  objects = doc["annotation"]["object"]
  if not isinstance(objects, list):
    objects = [objects]
  for obj in objects:
    if obj["name"] not in class_names:
      continue
    images.append(crop_bounding_box(img, obj["bndbox"]))
    target.append(obj["name"])

Based on the labels, the dataset contains 3,232 faces with a mask and 717 faces without one.
Chart of the number of faces per label
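
Because the two classes are far from balanced, it helps to know how well a classifier would score by always predicting the larger class. The short calculation below uses only the two counts stated above; the variable names are my own.

```python
# Face counts per label, as reported above
counts = {"with_mask": 3232, "without_mask": 717}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always answers with the majority label
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total)                                  # 3949
print(majority_label)                         # with_mask
print("baseline accuracy: %.3f" % baseline_accuracy)
```

About 82% of the faces are with_mask, so any accuracy figure later in this post should be read against that baseline rather than against 50%.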

Preprocessing also resizes each crop to 128×128 pixels and normalizes it with the ImageNet channel means and standard deviations.

import torch

from torchvision import transforms

# Define preprocessing
preprocess = transforms.Compose([
  transforms.ToPILImage(),
  transforms.Resize((128, 128)),
  transforms.ToTensor(),
  transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

# Apply preprocess
image_tensor = torch.stack([preprocess(image) for image in images])
image_tensor.shape
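
To see what the last two transforms do to a single pixel, the arithmetic can be reproduced in plain Python. This is only a worked illustration: ToTensor scales 8-bit values to the range 0 to 1, and Normalize subtracts the channel mean and divides by the channel standard deviation. The helper name normalize_pixel is hypothetical.

```python
# ImageNet channel statistics, the same values passed to transforms.Normalize
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb_uint8):
    """Apply the ToTensor scaling and the Normalize step to one RGB pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]           # 0..255 -> 0..1
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))

print(["%.4f" % v for v in black])
print(["%.4f" % v for v in white])
```

For the red channel this gives roughly -2.118 for a black pixel and 2.249 for a white one, which is the input range the pretrained ResNet50 weights were trained on.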

Feature Extraction

Feature extraction turns each image into a compact numeric representation by applying spatial operations (convolutions and pooling) that capture the patterns distinguishing the labels. In this application, I use ResNet50 as the feature extractor. Its last layer, a fully connected layer with 1,000 neurons (one per ImageNet class), needs to be removed.

from torchvision import models

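# Download the ImageNet-pretrained model
# (newer torchvision versions prefer weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet = models.resnet50(pretrained=True)
# Drop the final fully connected layer and keep everything up to global average pooling
resnet = torch.nn.Sequential(*(list(resnet.children())[:-1]))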

To freeze ResNet50 so that its weights stay fixed, I set requires_grad to False on every parameter.

for param in resnet.parameters():
    param.requires_grad = False

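I also need to call eval() to switch ResNet50 to inference mode. Its batch normalization layers then use their stored running statistics instead of the statistics of the current batch. Without this, the extracted features would depend on the batch and hurt accuracy; with it, ResNet50 acts purely as a fixed feature extractor.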

resnet.eval()
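
The effect of eval() on batch normalization can be checked on a tiny standalone layer. The sketch below is not part of the app; it uses a single BatchNorm1d layer and random data as stand-ins for ResNet50 and the face crops.

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 10   # features far from zero mean and unit variance

# Training mode: normalize with the statistics of this batch
bn.train()
out_train = bn(x)

# Inference mode: normalize with the stored running statistics
bn.eval()
with torch.no_grad():
    out_full = bn(x)         # all eight samples together
    out_single = bn(x[:1])   # the first sample on its own

print(out_train.mean(dim=0))                      # close to zero in every column
print(torch.allclose(out_full[:1], out_single))   # True
```

In inference mode the output for a sample no longer depends on which other samples share its batch, which is what a fixed feature extractor needs.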

As the last step, I apply ResNet50 to extract the features. It returns a vector of 2,048 features for each image.

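import numpy as np

result = np.empty((len(image_tensor), 2048))
# No gradients are needed for feature extraction
with torch.no_grad():
  for i, data in enumerate(image_tensor):
    output = resnet(data.unsqueeze(0))  # add a batch dimension: (1, 3, 128, 128)
    output = torch.flatten(output, 1)   # (1, 2048, 1, 1) -> (1, 2048)
    result[i] = output[0].numpy()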

Split Dataset

To be able to detect overfitting, I split the data into 70% training data and 30% test data. The training data is used to fit the model, and the test data is used to evaluate it on examples it has not seen.

from sklearn.model_selection import train_test_split

X, y = result, np.array(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Training data\n", np.asarray(np.unique(y_train, return_counts=True)).T)
print("Test data\n", np.asarray(np.unique(y_test, return_counts=True)).T)
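
With classes this unbalanced, a plain random split can leave the two parts with slightly different class proportions. One option, which this post does not use, is the stratify argument of train_test_split. The sketch below applies it to placeholder features and to labels that only reproduce the class counts reported earlier; the helper minority_share is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the class counts reported earlier; the features are placeholders
y_demo = np.array(["with_mask"] * 3232 + ["without_mask"] * 717)
X_demo = np.zeros((len(y_demo), 1))

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)

def minority_share(labels):
    """Fraction of without_mask samples in a label array."""
    return float(np.mean(labels == "without_mask"))

print("train: %.3f" % minority_share(y_tr))
print("test:  %.3f" % minority_share(y_te))
```

Both shares come out at about 18%, matching the full dataset, so the test scores are measured on the same class mix the model was trained on.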

Define Model Classifier

As mentioned in the introduction, the proposed model is a stacking classifier (an ensemble method) that uses an SVM and a decision tree as base learners, with logistic regression as the final estimator. In short, ensemble methods build multiple models and combine their predictions, which often gives more accurate results than any single model.

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

clf = StackingClassifier(
    estimators=[('svm', SVC(random_state=42)),
                ('tree', DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression(random_state=42),
    n_jobs=-1)
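
To show what the final estimator actually sees, the sketch below fits the same kind of stack on a small synthetic dataset and inspects the intermediate outputs. The data from make_classification, the tree depth, and the variable names are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 2,048-dimensional ResNet50 features
X_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=42)

stack = StackingClassifier(
    estimators=[("svm", SVC(random_state=42)),
                ("tree", DecisionTreeClassifier(max_depth=5, random_state=42))],
    final_estimator=LogisticRegression(random_state=42),
    cv=5)
stack.fit(X_demo, y_demo)

# Outputs of the base learners, which are the inputs of the logistic regression
meta_features = stack.transform(X_demo)
print(meta_features.shape)
print(stack.final_estimator_.coef_)   # weight given to each base-learner output
```

During fit, the base learners' outputs are produced with cross-validation, so the logistic regression is trained on predictions for samples the base learners did not see in that fold.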

Tuning Model

Tuning is the process of maximizing a model's performance without overfitting or introducing too much variance. In machine learning, this is done by selecting appropriate hyperparameters. You can use whatever tuning method you like; here is mine, a grid search over a small range of values.

from sklearn.model_selection import GridSearchCV

param_grid = {
    'svm__C': [1.6, 1.7, 1.8],
    'svm__kernel': ['rbf'],
    'tree__criterion': ['entropy'],
    'tree__max_depth': [9, 10, 11],
    'final_estimator__C': [1.3, 1.4, 1.5]
}

grid = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    scoring='accuracy',
    n_jobs=-1)

grid.fit(X_train, y_train)

print('Best parameters: %s' % grid.best_params_)
print('Accuracy: %.2f' % grid.best_score_)

Based on the tuning process, the best hyperparameters and the corresponding cross-validated accuracy are:

Best parameters: {'final_estimator__C': 1.3, 'svm__C': 1.6, 'svm__kernel': 'rbf', 'tree__criterion': 'entropy', 'tree__max_depth': 11}
Accuracy: 0.98
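
For a sense of how much work this grid search does, the number of candidates can be counted directly from the parameter grid used above. The fold count assumes GridSearchCV's default of 5-fold cross-validation, since cv was not set.

```python
from itertools import product
from math import prod

# Same grid as in the tuning step
param_grid = {
    'svm__C': [1.6, 1.7, 1.8],
    'svm__kernel': ['rbf'],
    'tree__criterion': ['entropy'],
    'tree__max_depth': [9, 10, 11],
    'final_estimator__C': [1.3, 1.4, 1.5]
}

n_candidates = prod(len(values) for values in param_grid.values())
cv_folds = 5                      # default when cv is not passed
n_fits = n_candidates * cv_folds  # one stacking fit per candidate and fold

# Every combination the search will try
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]

print(n_candidates)   # 27
print(n_fits)         # 135
print(candidates[0])
```

Each of those 135 fits trains a full stacking classifier, which itself cross-validates its base learners, so widening the grid gets expensive quickly.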

Create Final Model

Finally, I can build the final model with the best hyperparameters. Hopefully it does not overfit.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

final_clf = StackingClassifier(
    estimators=[('svm', SVC(C=1.6, kernel='rbf', random_state=42)),
                ('tree', DecisionTreeClassifier(criterion='entropy', max_depth=11, random_state=42))],
    final_estimator=LogisticRegression(C=1.3, random_state=42),
    n_jobs=-1)

final_clf.fit(X_train, y_train)
y_pred = final_clf.predict(X_test)

print('Accuracy score : ', accuracy_score(y_test, y_pred))
print('Precision score : ', precision_score(y_test, y_pred, average='weighted'))
print('Recall score : ', recall_score(y_test, y_pred, average='weighted'))
print('F1 score : ', f1_score(y_test, y_pred, average='weighted'))

I then evaluate the model on the test data using accuracy, precision, recall, and F1 score. The results are:

Accuracy score :  0.9721518987341772
Precision score :  0.9719379890530496
Recall score :  0.9721518987341772
F1 score :  0.9717932606523529

Looks pretty good! Check out the confusion matrix below. If it looks biased, please leave a comment 😁.
Confusion Matrix
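
The weighted metrics printed above can be derived from a confusion matrix by hand, which also explains why the weighted recall is identical to the accuracy. The counts below are hypothetical and are not the results of this model; they only illustrate the arithmetic.

```python
# Hypothetical counts: rows are true classes, columns are predicted classes
labels = ["with_mask", "without_mask"]
cm = [[90, 10],
      [5, 45]]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(labels))) / total

weighted_precision = 0.0
weighted_recall = 0.0
for i in range(len(labels)):
    support = sum(cm[i])                    # true samples of class i
    predicted = sum(row[i] for row in cm)   # samples predicted as class i
    precision = cm[i][i] / predicted
    recall = cm[i][i] / support
    # average='weighted' weights each class by its share of true samples
    weighted_precision += precision * support / total
    weighted_recall += recall * support / total

print("accuracy:           %.4f" % accuracy)
print("weighted recall:    %.4f" % weighted_recall)
print("weighted precision: %.4f" % weighted_precision)
```

Weighting each class's recall by its support cancels the support out, leaving correct predictions divided by all samples. For an unbalanced dataset, the per-class recall of the smaller class is therefore the more telling number.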

Deploy Real App

This step is optional. If you are interested, first export the model. Only the trained stacking classifier needs to be exported, so that you can load it again in another program.

import pickle

pkl_filename = 'face_mask_detection.pkl'
with open(pkl_filename, 'wb') as file:
  pickle.dump(final_clf, file)
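
Loading the model back is the mirror image of the export. The round trip below is a minimal sketch that uses a tiny logistic regression and a temporary folder as stand-ins for the trained stacking classifier and the real file location; the data and variable names are invented.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny stand-in model; in the app this would be final_clf
X_small = np.array([[0.0], [0.2], [0.8], [1.0]])
y_small = np.array(["without_mask", "without_mask", "with_mask", "with_mask"])
model = LogisticRegression().fit(X_small, y_small)

# Export, as in the step above, but into a temporary folder
path = os.path.join(tempfile.mkdtemp(), "face_mask_detection.pkl")
with open(path, "wb") as file:
    pickle.dump(model, file)

# Load again, as another program would
with open(path, "rb") as file:
    restored = pickle.load(file)

same = bool((restored.predict(X_small) == model.predict(X_small)).all())
print(same)  # True
```

Two things to keep in mind: only unpickle files you trust, and load the model with the same scikit-learn version that saved it. The loaded classifier also expects 2,048-dimensional ResNet50 features, so the other program has to repeat the preprocessing and feature extraction steps.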

The process is simple, but first take a look at the diagram below.
Flowchart Program

The important thing to remember is that you need to implement your own face detection model and crop each detected face before classifying it. For an example program, check out my GitHub repository.
