Xiao Ling

Posted on • Originally published at dynamsoft.com

How to Enhance Passport MRZ Detection in Python by Correcting Image Orientation

Passport Machine Readable Zone (MRZ) detection is sensitive to the orientation of the passport. If the passport is not right side up, the detection rate drops. This article shows how to improve the MRZ detection rate on rotated images with Python, using edge detection, perspective transformation, and face detection to correct the orientation of the passport.

Installation

pip install mrz-scanner-sdk document-scanner-sdk dlib mediapipe retina-face opencv-python
  • mrz-scanner-sdk: Dynamsoft MRZ Scanner SDK for MRZ detection. A valid license key is required to use the SDK. You can get a free trial license from here.
  • document-scanner-sdk: Dynamsoft Document Scanner SDK for edge detection and perspective transformation.
  • dlib: An open-source library that provides an accurate and efficient face detection algorithm.
  • mediapipe: A Google-developed, open-source, cross-platform framework designed for rapid, real-time face detection.
  • retina-face: A deep-learning-based face detector for Python that also returns facial landmarks.
  • opencv-python: Used to display images and draw lines.

Passport Edge Detection and Perspective Transformation

Let's get started with a passport image taken in the correct orientation.

passport image

The following Python code detects the MRZ area successfully:

import argparse
import sys

import cv2
import mrzscanner
import numpy as np

def scanmrz():
    parser = argparse.ArgumentParser(description='Scan MRZ info from a given image')
    parser.add_argument('filename')
    parser.add_argument('-l', '--license', default='', type=str, help='Set a valid license key')
    args = parser.parse_args()
    try:
        filename = args.filename
        license = args.license

        # set license
        if license == '':
            mrzscanner.initLicense("LICENSE-KEY")
        else:
            mrzscanner.initLicense(license)

        scanner = mrzscanner.createInstance()
        scanner.loadModel(mrzscanner.load_settings())

        image = cv2.imread(filename)
        if image is None:
            # cv2.imread returns None when the file cannot be read
            raise ValueError('Failed to load image: ' + filename)

        results = scanner.decodeMat(image)
        for result in results:
            print(result.text)
            x1 = result.x1
            y1 = result.y1
            x2 = result.x2
            y2 = result.y2
            x3 = result.x3
            y3 = result.y3
            x4 = result.x4
            y4 = result.y4

            # draw the MRZ quadrilateral (np.int0 was removed in NumPy 2.0)
            points = np.array([(x1, y1), (x2, y2), (x3, y3), (x4, y4)], dtype=np.int32)
            cv2.drawContours(image, [points], 0, (0, 255, 0), 2)
            cv2.putText(image, result.text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 2)

        cv2.imshow("MRZ Detection", image)
        cv2.waitKey(0)

    except Exception as err:
        print(err)
        sys.exit(1)

if __name__ == "__main__":
    scanmrz()

passport image

If the image is rotated at a significant angle, MRZ detection may fail.

rotated passport image

To address this issue, we can use edge detection and perspective transformation to correct the orientation of the passport.

rotated passport mrz detection

Here are the steps:

  1. Initialize the document scanner:

    import docscanner
    doc_scanner = docscanner.createInstance()
    doc_scanner.setParameters(docscanner.Templates.color)
    
  2. Detect the edges of the passport:

    results = doc_scanner.detectMat(image)
    result = results[0]
    x1 = result.x1
    y1 = result.y1
    x2 = result.x2
    y2 = result.y2
    x3 = result.x3
    y3 = result.y3
    x4 = result.x4
    y4 = result.y4
    
  3. Rectify the passport image:

    rectified_document = doc_scanner.normalizeBuffer(image, x1, y1, x2, y2, x3, y3, x4, y4)
    rectified_document = docscanner.convertNormalizedImage2Mat(rectified_document)
    
  4. Detect the MRZ area from the rectified passport image:

    def detect_mrz(image, scanner):
        s = ""
        results = scanner.decodeMat(image)
        for result in results:
            # print(result.text)
            s += result.text + '\n'
            x1 = result.x1
            y1 = result.y1
            x2 = result.x2
            y2 = result.y2
            x3 = result.x3
            y3 = result.y3
            x4 = result.x4
            y4 = result.y4
    
            cv2.drawContours(image, [np.array([(x1, y1), (x2, y2), (x3, y3), (x4, y4)], dtype=np.int32)], 0, (0, 255, 0), 2)
            cv2.putText(image, result.text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 2)
    
        cv2.imshow("MRZ Detection", image)
    
    detect_mrz(rectified_document, mrz_scanner)
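
Step 2 indexes `results[0]` directly, which raises an `IndexError` when no document edge is found. A minimal sketch of a guard, assuming the result objects expose the `x1`..`y4` attributes used above; `first_quad` is a hypothetical helper name, and the `SimpleNamespace` object only stands in for a real detection result:

```python
from types import SimpleNamespace

def first_quad(results):
    """Return the corners of the first detection as a flat tuple, or None."""
    if not results:
        return None
    r = results[0]
    return (r.x1, r.y1, r.x2, r.y2, r.x3, r.y3, r.x4, r.y4)

# Stand-in for a detection result (not produced by the SDK).
fake = SimpleNamespace(x1=10, y1=20, x2=300, y2=25, x3=295, y3=200, x4=12, y4=190)
print(first_quad([fake]))
print(first_quad([]))
```

When `first_quad` returns `None`, one option is to skip rectification and pass the original image on to the next step.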
    

Rotating Images Based on Facial Orientation

After perspective transformation, the image may be oriented in one of four directions: 0 degrees, 90 degrees, 180 degrees, or 270 degrees.

rotated passport

If you run the code above, you will find only the 0-degree orientation allows for normal MRZ detection. Thus, we aim to rotate the other three orientations to this correct angle. Considering that the orientation of the face on the passport is consistent with that of the Machine-Readable Zone, we can use face detection to rotate the image accordingly.
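
To reproduce this observation, the four orientations can be generated from one upright image and passed to the MRZ scanner one at a time. A minimal sketch using NumPy's `np.rot90`, which rotates counterclockwise for positive `k`; `four_orientations` is a hypothetical helper, and the tiny array stands in for a real rectified image:

```python
import numpy as np

def four_orientations(image):
    """Map counterclockwise rotation angle -> rotated copy of the image."""
    # np.rot90 returns a view; ascontiguousarray gives each copy its own
    # contiguous memory, which image libraries generally expect.
    return {k * 90: np.ascontiguousarray(np.rot90(image, k)) for k in range(4)}

# A 2x3 "image" with 3 channels stands in for a rectified passport photo.
upright = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
views = four_orientations(upright)
for angle, rotated in views.items():
    print(angle, rotated.shape)
```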

Numerous face detection algorithms exist, each with varying levels of performance. In this article, we will compare the effectiveness of three prominent algorithms: Dlib, MediaPipe, and RetinaFace.
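
The snippets in the following sections time each detector with `time.time()`. A small sketch of a reusable timer, assuming each detector call can be wrapped in a callable; `timed` is a hypothetical helper, and `time.perf_counter()` is swapped in here because it is a monotonic, high-resolution clock:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in practice fn would wrap a face detector call.
result, elapsed = timed(sum, range(1000))
print(result, f"{elapsed:.6f}s")
```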

Dlib

  1. Download the pre-trained model from here.
  2. Unzip the file and put it in the same folder as the Python script.
  3. Create the Dlib face detector:

    import dlib
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
    
  4. Detect the faces from the rectified passport image:

    img = cv2.imread(filename)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    start_time = time.time()
    faces = detector(gray)
    end_time = time.time()
    print("Elapsed Time:", end_time - start_time)
    

dlib face detection

The dlib face detection algorithm is typically trained on datasets where faces are upright or near-upright. The features learned by the classifier assume that the faces in the images will be oriented in a specific way, usually right side up. When a face is rotated significantly (like upside-down or tilted at 90 degrees), the learned features may not match well, making it difficult for the algorithm to detect the face.

MediaPipe

  1. Download the pre-trained model from here. At present, only BlazeFace (short-range) is available, which is a lightweight model for detecting single or multiple faces.

  2. Put the model in the same folder as the Python script.

  3. Create the MediaPipe face detector:

    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision
    
    mp_face_detection = mp.solutions.face_detection
    mp_drawing = mp.solutions.drawing_utils
    
    base_options = python.BaseOptions(model_asset_path='blaze_face_short_range.tflite')
    options = vision.FaceDetectorOptions(base_options=base_options)
    detector = vision.FaceDetector.create_from_options(options)
    
  4. Detect the faces from the rectified passport image:

    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img)
    
    start_time = time.time()
    detection_result = detector.detect(image)
    end_time = time.time()
    print("Elapsed Time:", end_time - start_time)
    

mediapipe face detection

Compared to Dlib, MediaPipe is faster and more accurate. However, it still falls short of our requirements because it fails to locate some facial landmarks correctly.
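
To check where the landmarks land, the keypoints can be converted to pixel coordinates and drawn on the image. A minimal sketch, assuming each entry in `detection_result.detections` carries `keypoints` whose `x` and `y` are normalized to the 0..1 range; `to_pixels` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins rather than real MediaPipe output:

```python
from types import SimpleNamespace

def to_pixels(keypoints, width, height):
    """Scale normalized keypoints to integer pixel coordinates inside the image."""
    points = []
    for kp in keypoints:
        x = min(max(int(kp.x * width), 0), width - 1)
        y = min(max(int(kp.y * height), 0), height - 1)
        points.append((x, y))
    return points

# Stand-ins for keypoints of one detection.
fake_keypoints = [SimpleNamespace(x=0.25, y=0.5), SimpleNamespace(x=1.0, y=1.0)]
pixels = to_pixels(fake_keypoints, width=640, height=480)
print(pixels)
```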

RetinaFace

RetinaFace is a deep learning-based face detection model aimed at identifying faces in images with high accuracy. Let's explore whether it meets our objectives.

import cv2
from retinaface import RetinaFace

img = cv2.imread(filename)
# detect_faces accepts an image path or a NumPy array
obj = RetinaFace.detect_faces(img_path=img)

if isinstance(obj, dict):
    for key in obj:
        identity = obj[key]

        facial_area = identity["facial_area"]
        facial_img = img[facial_area[1]: facial_area[3],
                            facial_area[0]: facial_area[2]]

        landmarks = identity["landmarks"]
        left_eye = landmarks["left_eye"]
        right_eye = landmarks["right_eye"]
        nose = landmarks["nose"]
        mouth_right = landmarks["mouth_right"]
        mouth_left = landmarks["mouth_left"]

        cv2.rectangle(img, (facial_area[0], facial_area[1]),
                        (facial_area[2], facial_area[3]), (0, 255, 0), 2)
        cv2.circle(img, (int(left_eye[0]), int(
            left_eye[1])), 2, (255, 0, 0), 2)
        cv2.circle(img, (int(right_eye[0]), int(
            right_eye[1])), 2, (0, 0, 255), 2)
        cv2.circle(img, (int(nose[0]), int(nose[1])), 2, (0, 255, 0), 2)
        cv2.circle(img, (int(mouth_left[0]), int(
            mouth_left[1])), 2, (0, 155, 255), 2)
        cv2.circle(img, (int(mouth_right[0]), int(
            mouth_right[1])), 2, (0, 155, 255), 2)

cv2.imshow(filename, img)

retina face detection
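
Many passports carry a second, smaller ghost portrait, so the loop above may visit more than one face. A minimal sketch of keeping only the largest one, assuming the `facial_area` layout `[x1, y1, x2, y2]` implied by the slicing above; `largest_face` is a hypothetical helper, and the sample dictionary is hand-written rather than real detector output:

```python
def largest_face(faces):
    """Return the entry with the largest bounding-box area, or None."""
    if not isinstance(faces, dict) or not faces:
        return None

    def area(identity):
        x1, y1, x2, y2 = identity["facial_area"]
        return max(0, x2 - x1) * max(0, y2 - y1)

    return max(faces.values(), key=area)

# Hand-written stand-in shaped like the dictionary iterated over above.
sample = {
    "face_1": {"facial_area": [400, 60, 440, 110], "landmarks": {}},
    "face_2": {"facial_area": [50, 80, 210, 290], "landmarks": {}},
}
print(largest_face(sample)["facial_area"])
```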

RetinaFace takes the longest time for face detection, but it is the most accurate. It correctly identifies facial landmarks in all four directions, which we can use to rotate the image.

def rotate(img, left_eye, right_eye, nose):

    nose_x, nose_y = nose
    left_eye_x, left_eye_y = left_eye
    right_eye_x, right_eye_y = right_eye

    if (nose_y > left_eye_y) and (nose_y > right_eye_y):
        return img # nose below both eyes: already upright
    elif (nose_y < left_eye_y) and (nose_y < right_eye_y):
        return cv2.flip(img, flipCode=-1) # nose above both eyes: rotate 180 degrees
    elif (nose_x < left_eye_x) and (nose_x < right_eye_x):
        transposed = cv2.transpose(img)
        return cv2.flip(transposed, flipCode=0) # nose left of both eyes: rotate 90 degrees counterclockwise
    else:
        transposed = cv2.transpose(img)
        return cv2.flip(transposed, flipCode=1) # nose right of both eyes: rotate 90 degrees clockwise
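
The same decision can be written with NumPy alone, which makes the rotation direction explicit. A sketch under the assumption that landmarks are `(x, y)` pixel pairs as in the function above; `upright_turns` and `rotate_np` are hypothetical names, and `np.rot90` (counterclockwise for positive `k`) is swapped in for the transpose-and-flip calls:

```python
import numpy as np

def upright_turns(left_eye, right_eye, nose):
    """Number of counterclockwise quarter turns that make the face upright."""
    nose_x, nose_y = nose
    if nose_y > left_eye[1] and nose_y > right_eye[1]:
        return 0  # nose below both eyes: already upright
    if nose_y < left_eye[1] and nose_y < right_eye[1]:
        return 2  # nose above both eyes: upside down
    if nose_x < left_eye[0] and nose_x < right_eye[0]:
        return 1  # nose left of both eyes: top of the head points right
    return 3      # otherwise: top of the head points left

def rotate_np(img, left_eye, right_eye, nose):
    """Rotate img in quarter turns so that the face ends up upright."""
    turns = upright_turns(left_eye, right_eye, nose)
    return np.ascontiguousarray(np.rot90(img, turns))

# Synthetic check: turn a small array 90 degrees clockwise, then restore it.
upright = np.arange(20).reshape(4, 5)
sideways = np.rot90(upright, -1)
# Landmarks as they would appear in the sideways image (eyes right of the nose).
restored = rotate_np(sideways, (3, 1), (3, 3), (2, 2))
print(np.array_equal(restored, upright))
```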

Combining Document Detection and Retina Face Detection for MRZ Detection

We can now combine the above steps to detect the MRZ area in rotated passport images.

import argparse
import mrzscanner
import sys
import numpy as np
import cv2
import docscanner
import time
import face_retina

def detect_mrz(image, scanner):
    ...

def detect_doc(image, scanner):
    ...

    return mat

def scanmrz():
    parser = argparse.ArgumentParser(description='Scan MRZ info from a given image')
    parser.add_argument('filename')
    parser.add_argument('-l', '--license', default='', type=str, help='Set a valid license key')
    args = parser.parse_args()
    try:
        filename = args.filename
        license = args.license

        # set license
        if  license == '':
            defaultLicense = "LICENSE-KEY"
            mrzscanner.initLicense(defaultLicense)
            docscanner.initLicense(defaultLicense)
        else:
            mrzscanner.initLicense(license)
            docscanner.initLicense(license)

        mrz_scanner = mrzscanner.createInstance()
        mrz_scanner.loadModel(mrzscanner.load_settings())
        doc_scanner = docscanner.createInstance()
        doc_scanner.setParameters(docscanner.Templates.color)

        image = cv2.imread(filename)
        copy = image.copy()
        copy = detect_doc(copy, doc_scanner)
        copy = face_retina.detect(copy)
        detect_mrz(copy, mrz_scanner)

        cv2.imshow("Original", image)


    except Exception as err:
        print(err)
        sys.exit(1)

if __name__ == "__main__":
    scanmrz()
    cv2.waitKey(0)

passport mrz detection in any orientation

Source Code

https://github.com/yushulx/python-mrz-scanner-sdk/tree/main/examples/enhanced
