DEV Community


Social Distancing Analyzer using OpenCV and YOLO

Sherwyn D'souza (sherwyn11) ・ 8 min read


Introduction

Social distancing is deliberately increasing the physical space between people to avoid spreading illness. Staying at least six feet away from other people lessens your chances of contracting COVID-19. We can use OpenCV and YOLO to monitor and analyze whether people are maintaining social distancing.

Techniques and tools used

I used Python for this project, along with OpenCV, NumPy and imutils.

Theory

A little theory won’t hurt :)

OpenCV

If you don't know what OpenCV is: OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library, mainly aimed at real-time computer vision. It was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to use and modify the code.
The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms.
For more info Click Here.

YOLO

YOLO (You Only Look Once) is a clever convolutional neural network (CNN) for doing object detection in real time. The algorithm applies a single neural network to the full image, divides the image into regions, and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
YOLO is popular because it achieves high accuracy while also being able to run in real time. The algorithm “only looks once” at the image in the sense that it requires only one forward propagation pass through the neural network to make predictions. After non-max suppression (which makes sure the object detection algorithm detects each object only once), it outputs the recognized objects together with their bounding boxes.
We will use both OpenCV and YOLO extensively in this project.
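To make the non-max suppression step concrete, here is a minimal greedy NMS sketch in NumPy. It only illustrates the idea and is not OpenCV's implementation of cv2.dnn.NMSBoxes; `iou` and `greedy_nms` are hypothetical names, and boxes are assumed to be [x, y, width, height] lists, the format this project passes to NMSBoxes. The default thresholds (0.5 for score, 0.3 for overlap) mirror the values used later in main.py.

```python
import numpy as np


def iou(box, others):
    """Intersection-over-union of one [x, y, w, h] box with an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    # Width and height of the overlap, clipped at 0 when the boxes don't intersect.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / np.maximum(union, 1e-9)


def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.3):
    """Return the indices of the boxes kept by greedy non-max suppression (hypothetical helper)."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    # Discard weak detections, then visit the rest from highest score to lowest.
    order = [int(i) for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        if not order:
            break
        # Suppress every remaining box that overlaps the kept box too much.
        overlaps = iou(boxes[best], boxes[order])
        order = [i for i, o in zip(order, overlaps) if o <= iou_threshold]
    return keep
```

For example, greedy_nms([[100, 100, 50, 100], [105, 102, 50, 100], [300, 300, 40, 80]], [0.9, 0.8, 0.7]) returns [0, 2]: the second box overlaps the first with an IoU of about 0.79, so it is treated as a duplicate detection of the same person and dropped.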

Overview

We will use YOLO for object detection.
Once the objects (people) are detected, we draw a bounding box around each of them.
Using the centroids of the boxes, we then measure the distances between them.
For the distance measure, we use the Euclidean distance.
A box is colored RED if unsafe and GREEN if safe.
We also count the number of people who are unsafe because they are not maintaining social distancing.
Already interested? Let's get started with the fun part…

Project

First, let's see the project structure:
[Image: project directory structure]

For the input video (video.mp4), click here. You can also download the YOLOv3 weights, configuration file and COCO names from here:
YOLOv3 weights — Click here
YOLOv3 cfg — Click here
COCO names — Click here

Once that is done, open up constants.py (create it if it doesn't exist yet) and add the following lines of code:

YOLOV3_LABELS_PATH = './yolov3/coco.names'
YOLOV3_CFG_PATH = './yolov3/yolov3.cfg'
YOLOV3_WEIGHTS_PATH = './yolov3/yolov3.weights'
VIDEO_PATH = './videos/video.mp4'
OUTPUT_PATH = './output/output.avi'
SAFE_DISTANCE = 60

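Wait… What did I just copy?
Don't worry! This file just contains the paths of the YOLO weights, the cfg file, the COCO names, the input video and the output video, plus SAFE_DISTANCE, the minimum distance to be maintained. The paths are relative (to the folder you run main.py from), and SAFE_DISTANCE is measured in pixels between box centers, not in feet or meters.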
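Because these paths are relative, a missing file tends to show up later as a confusing OpenCV error. A small check like the one below can help. It is a sketch: `check_project_paths` is a hypothetical helper, not part of the original project, and it also creates the output folder on the assumption that cv2.VideoWriter may fail to open OUTPUT_PATH if that folder doesn't exist.

```python
import os


def check_project_paths(input_paths, output_path):
    """Return the input files that are missing, and create the output folder if needed.

    Hypothetical helper: call it with the constants from constants.py before
    loading YOLO.
    """
    missing = [path for path in input_paths if not os.path.isfile(path)]
    output_dir = os.path.dirname(output_path)
    if output_dir:
        # exist_ok=True makes this a no-op when the folder is already there.
        os.makedirs(output_dir, exist_ok=True)
    return missing
```

In main.py you could call it right after the imports, e.g. `missing = check_project_paths([YOLOV3_LABELS_PATH, YOLOV3_CFG_PATH, YOLOV3_WEIGHTS_PATH, VIDEO_PATH], OUTPUT_PATH)`, and stop with `raise SystemExit('Missing files: {}'.format(missing))` when the list is not empty.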

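Now onto the main part. Open up the main.py file. First, let's make the necessary imports. We also define two more constants, LABELS (the class names read from coco.names) and COLORS (a random color for each class, seeded so it is the same on every run), which we will be using later.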

import numpy as np
import imutils
import time
import cv2
import os
import math

from itertools import chain 
from constants import *

# Read the 80 COCO class names, one per line; 'with' closes the file afterwards.
with open(YOLOV3_LABELS_PATH) as f:
    LABELS = f.read().strip().split('\n')

np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype='uint8')

Next, we load in the YOLO model using the configuration and weights we downloaded before. The readNetFromDarknet function helps us to do so.

print('Loading YOLO from disk...')

neural_net = cv2.dnn.readNetFromDarknet(YOLOV3_CFG_PATH, YOLOV3_WEIGHTS_PATH)
layer_names = neural_net.getLayerNames()
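# getUnconnectedOutLayers() returns 1-based layer indices shaped (N, 1) in older
# OpenCV releases and (N,) in newer ones; flattening handles both.
layer_names = [layer_names[i - 1] for i in np.array(neural_net.getUnconnectedOutLayers()).flatten()]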

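layer_names now contains the names of YOLO's output layers (YOLOv3 has three, one for each detection scale). These are the layers whose outputs we will ask for in the forward pass.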

Now, we use OpenCV's VideoCapture class to read the input video stream.

vs = cv2.VideoCapture(VIDEO_PATH)
writer = None
(W, H) = (None, None)

try:
    if(imutils.is_cv2()):
        prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT
    else:
        prop = cv2.CAP_PROP_FRAME_COUNT
    total = int(vs.get(prop))
    print('Total frames detected are: ', total)
except Exception as e:
    print(e)
    total = -1

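We also set the dimensions of the video frame (W, H) to (None, None) initially; they are filled in once the first frame is read. After this, we read OpenCV's CAP_PROP_FRAME_COUNT property (CV_CAP_PROP_FRAME_COUNT on OpenCV 2, hence the imutils.is_cv2() check) to get the number of frames in the input video. This is wrapped in a try/except: if the frame count cannot be read, we print the error and set total to -1.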

We then read each frame of the input video stream.

while True:
    (grabbed, frame) = vs.read()

    if not grabbed:
        break

    if W is None or H is None:
        H, W = (frame.shape[0], frame.shape[1])

    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    neural_net.setInput(blob)

    start_time = time.time()
    layer_outputs = neural_net.forward(layer_names)
    end_time = time.time()

OpenCV's read function helps us do that easily. What is a frame, you ask? It is simple! As the name suggests, a frame is one still image (one shot) of the video, and all the frames played one after another make up the video. In OpenCV, a frame is a NumPy array of shape (height, width, 3): every pixel holds three numbers, one each for Blue, Green and Red (BGR), and each number is a pixel value between 0 and 255. So a 4 × 4 image has 16 pixels and 48 values in total.
We use a while loop to go over all the frames of the input video. If a frame is not grabbed, we break out of the loop, as this usually means we have reached the end of the video. On the first frame we also update H and W from (None, None) to the frame's height and width. Next, we create a blob from the frame. OpenCV stores colors in BGR (Blue, Green, Red) order, so we pass swapRB=True to swap the R and B channels and get the RGB order the network expects. blobFromImage also resizes the frame to 416 × 416 and multiplies every value by 1/255, so that each value lies between 0 and 1, which is the input range the YOLOv3 model expects.
In OpenCV's dnn module, a blob is simply the preprocessed network input: a 4-D array of shape (N, C, H, W), here (1, 3, 416, 416). (This is unrelated to the image-processing meaning of a blob as a group of connected pixels.) We then give the blob to the model with setInput and perform a single forward pass of YOLO with forward, timing it with time.time().
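To see what blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False) produces, here is a NumPy-only approximation. It is a sketch, not OpenCV's implementation: `blob_from_image_sketch` is a hypothetical name, and it resizes with nearest-neighbour sampling instead of OpenCV's interpolated resize, so individual values can differ slightly from the real blob. The shape, channel order and value range should match.

```python
import numpy as np


def blob_from_image_sketch(frame_bgr, size=(416, 416), scale=1 / 255.0):
    """Rough NumPy stand-in for cv2.dnn.blobFromImage(frame, scale, size, swapRB=True, crop=False).

    Assumption: nearest-neighbour resizing replaces OpenCV's interpolated resize.
    """
    out_w, out_h = size  # OpenCV sizes are given as (width, height)
    in_h, in_w = frame_bgr.shape[:2]
    # Resize without keeping the aspect ratio, as with crop=False.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    resized = frame_bgr[rows][:, cols]
    # swapRB=True: reverse the channel axis, turning BGR into RGB.
    rgb = resized[:, :, ::-1]
    # Scale 0..255 pixel values into the 0..1 range.
    scaled = rgb.astype(np.float32) * scale
    # (H, W, C) -> (C, H, W), then add a batch axis -> (1, C, H, W).
    return scaled.transpose(2, 0, 1)[np.newaxis, ...]
```

For a 480 × 640 frame the result has shape (1, 3, 416, 416): one image, three channels, 416 × 416 pixels. A pure-blue BGR frame ends up in the last (blue) channel of the RGB blob.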

For every detection, YOLO outputs a set of values: they tell us which class the object belongs to and give us the detected object's bounding box.

    boxes = []
    confidences = []
    classIDs = []

    for output in layer_outputs:
        for detection in output:

            # detection = [center_x, center_y, width, height, objectness, 80 class scores]
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            # Keep only confident detections of class 0 ('person' in coco.names).
            if confidence > 0.5 and classID == 0:
                # Box values are relative (0..1); scale them to the frame size in pixels.
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype('int')

                # Convert the box center to its top-left corner.
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)

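We loop over every output in layer_outputs and every detection in each output. Each detection is a vector: the first four elements describe the box, the fifth is an objectness score, and the rest are the scores for each class (80 classes from the COCO names). We take the class with the highest score as the detected class and use that score as the confidence. We keep a detection only if its confidence is above 0.5 and its classID is 0, which is "person" in the COCO names, since we are only interested in detecting people. The first 4 elements of the detection array give [X_center_of_box, Y_center_of_box, Width_of_box, Height_of_box] as fractions of the image size, so we scale them to our frame dimensions in pixels and then convert the center to the top-left corner (x, y) that OpenCV's drawing functions expect.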
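The per-detection logic can be pulled out into a small function and tested on its own. This sketch mirrors the loop above; `decode_detection` is a hypothetical name, and the 85-element layout (4 box values, 1 objectness score, 80 COCO class scores) is the YOLOv3/COCO case used in this article.

```python
import numpy as np

PERSON_CLASS_ID = 0  # 'person' is the first entry in coco.names


def decode_detection(detection, frame_w, frame_h, conf_threshold=0.5):
    """Turn one YOLO output row into ([x, y, w, h], confidence), or None if it isn't a confident person."""
    scores = detection[5:]  # skip the 4 box values and the objectness score
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    if confidence <= conf_threshold or class_id != PERSON_CLASS_ID:
        return None
    # Box values are relative (0..1); scale them to pixels.
    box = detection[0:4] * np.array([frame_w, frame_h, frame_w, frame_h])
    center_x, center_y, width, height = box.astype(int)
    # OpenCV's drawing functions want the top-left corner, not the center.
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return [x, y, int(width), int(height)], confidence
```

For a 640 × 480 frame, a person detection centered at (0.5, 0.5) with relative size 0.25 × 0.5 becomes the pixel box [240, 120, 160, 240].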

Then we apply non-max suppression and start drawing the bounding boxes:

    # Non-max suppression: score threshold 0.5, overlap (IoU) threshold 0.3.
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)

    if len(idxs) > 0:
        unsafe = []
        count = 0
        kept = list(idxs.flatten())

        for i in kept:

            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            # Centroid of box i.
            centeriX = boxes[i][0] + (boxes[i][2] // 2)
            centeriY = boxes[i][1] + (boxes[i][3] // 2)

            color = [int(c) for c in COLORS[classIDs[i]]]
            text = '{}: {:.4f}'.format(LABELS[classIDs[i]], confidences[i])

            # Compare box i with every other box kept by NMS.
            for j in kept:
                if j == i:
                    continue
                centerjX = boxes[j][0] + (boxes[j][2] // 2)
                centerjY = boxes[j][1] + (boxes[j][3] // 2)

                # Euclidean distance between the two centroids, in pixels.
                distance = math.sqrt(math.pow(centerjX - centeriX, 2) + math.pow(centerjY - centeriY, 2))

                if distance <= SAFE_DISTANCE:
                    # Join the two people who are too close with a red line.
                    cv2.line(frame, (centeriX, centeriY), (centerjX, centerjY), (0, 0, 255), 2)
                    unsafe.append([centerjX, centerjY])
                    unsafe.append([centeriX, centeriY])

            # Match the (x, y) pair exactly; checking X and Y separately could wrongly
            # flag a box that merely shares one coordinate with an unsafe box.
            if [centeriX, centeriY] in unsafe:
                count += 1
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
            else:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

            cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Draw the counter once per frame, after every box has been processed.
        cv2.rectangle(frame, (50, 50), (450, 90), (0, 0, 0), -1)
        cv2.putText(frame, 'No. of people unsafe: {}'.format(count), (70, 70), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 3)

We use non-max suppression (cv2.dnn.NMSBoxes, with a score threshold of 0.5 and an overlap threshold of 0.3) to discard weak and overlapping bounding boxes. Then we calculate the distance between the centroid of the current box and the centroids of all the other detected boxes, using the Euclidean distance. Below is the formula for Euclidean distance.
$$d = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
where (x_i, y_i) and (x_j, y_j) are the centroids of boxes i and j.
We compare each distance with the SAFE_DISTANCE constant we defined earlier in constants.py. Note that both are measured in pixels, not in feet or meters. If two centroids are within SAFE_DISTANCE of each other, we join them with a red line and mark both boxes as unsafe. Next, we use OpenCV's rectangle function to draw each box with the dimensions we received from the model: red if the box is unsafe, green otherwise. Finally, we display the number of unsafe people using OpenCV's putText function.
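The nested loop compares every pair of boxes twice (i against j and j against i). For reference, here is a vectorized NumPy sketch of the same rule that computes all pairwise Euclidean distances at once. `find_unsafe` is a hypothetical helper, not part of the original code, and the threshold is in pixels, like SAFE_DISTANCE.

```python
import numpy as np


def find_unsafe(centers, safe_distance):
    """Return (unsafe_mask, close_pairs) for a list of (x, y) box centroids.

    unsafe_mask[k] is True when centroid k is within safe_distance of another one;
    close_pairs lists each too-close pair (i, j) once, with i < j.
    """
    pts = np.asarray(centers, dtype=float).reshape(-1, 2)
    # Broadcasting gives an (N, N, 2) array of coordinate differences.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    close = dist <= safe_distance
    np.fill_diagonal(close, False)  # a person is never "too close" to themselves
    unsafe_mask = close.any(axis=1)
    rows, cols = np.nonzero(np.triu(close))
    close_pairs = [(int(i), int(j)) for i, j in zip(rows, cols)]
    return unsafe_mask, close_pairs
```

For example, centroids (100, 100) and (130, 140) are sqrt(30² + 40²) = 50 pixels apart, which is within SAFE_DISTANCE = 60, so both would be flagged, while a third centroid at (400, 400) stays safe.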

Now we write the processed frames back into a video:

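    if writer is None:
        # Create the writer lazily, once the first frame's size is known.
        fourcc = cv2.VideoWriter_fourcc(*'MJPG')
        writer = cv2.VideoWriter(OUTPUT_PATH, fourcc, 30, (frame.shape[1], frame.shape[0]), True)

        if total > 0:
            # Rough estimate: time of the first forward pass times the frame count.
            elap = (end_time - start_time)
            print('Single frame took {:.4f} seconds'.format(elap))
            print('Estimated total time to finish: {:.4f} seconds'.format(elap * total))

    writer.write(frame)

print('Cleaning up...')
# writer is still None if no frame was ever read.
if writer is not None:
    writer.release()
vs.release()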

OpenCV's VideoWriter class does this for us. It stores the output video at OUTPUT_PATH, which we defined earlier in constants.py, using the MJPG codec at 30 frames per second. The release() calls then close the output file and the input video.

Output

Phew!… Now that the coding part is over, time to see the fruits of our effort.
Go ahead and run the main.py file as follows.
python main.py

Once the program has finished, check your output folder and open the output.avi file.
It should look something like this…

[Image: sample frame from output.avi]

Impressive, right?

Limitations and Future Scope

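Although this project is cool, it has a few limitations:

This project does not take the camera perspective into account: the same real-world gap between two people spans fewer pixels far from the camera than close to it.
It does not use proper camera calibration, so SAFE_DISTANCE is a pixel threshold and the measured distances do not correspond accurately to real-world distances.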

I will work on these limitations in the future.

End notes

You can find the entire code for this article here.
Leave a ⭐ on the repo and a ❤️ on this article if you found it useful. Thank you :)

Discussion

Pushpak Chhajed (pushpak1300)

Awesome! :o

masterhareesh

where is the constant.py file bruh!!!

Sherwyn D'souza (Author)

I've attached a file directory structure image above. So create a constants.py file accordingly and then add this:

YOLOV3_LABELS_PATH = './yolov3/coco.names'
YOLOV3_CFG_PATH = './yolov3/yolov3.cfg'
YOLOV3_WEIGHTS_PATH = './yolov3/yolov3.weights'
VIDEO_PATH = './videos/video.mp4'
OUTPUT_PATH = './output/output.avi'
SAFE_DISTANCE = 60
Kodumuri sai Krishna (kodumuri369)

can you share limitations and importance of the project

Sherwyn D'souza (Author)

Limitations:
This project does not take into account the camera perspective.
It does not leverage a proper camera calibration (distances are not measured accurately).

Importance:
Social distancing is deliberately increasing the physical space between people to avoid spreading illness. Staying at least six feet away from other people lessens your chances of contracting COVID-19. We can use OpenCV and YOLO to monitor and analyze whether people are maintaining social distancing. It could be used simply to keep track of whether people are maintaining social distancing.