Rose Day

Real-Time and Still Image Classification

In the past few months I have been working with the Intel® Movidius™ Neural Compute Stick (NCS) to do some rapid prototyping with deep neural nets for person detection in still images and real-time video streams. To work with this device, I have an Intel® NUC set up with Ubuntu 16.04 Desktop (64-bit) and a USB camera. With that setup in place, I wanted to share some things I learned as I went through this process.

Powered by the Intel® Movidius™ Myriad™ 2 vision processing unit (VPU), the NCS includes an array of 12 VLIW vector processors called SHAVE processors that accelerate neural networks by running parts of the network in parallel. Once the stick is connected to a host machine, the Neural Compute API (NCAPI) is used to initialize and open an NCS device, load the firmware onto the device, and accept neural net graph files and instructions to execute inferences. The steps are detailed a bit more below:

NCS Device Setup

(1) Connect to a host machine

(2) Initialize and open NCS device
This section of the code enumerates attached devices, and the Python program quits if none is found; this is why the device must already be connected in step one. Once a device is found, a handle to it is retrieved, the device is opened, and the handle is returned.

# Open the first enumerated NCS device
# Return: opened device handle
def open_ncs_device():

    # Look for enumerated NCS device(s); quit program if none found
    devices = mvnc.EnumerateDevices()
    if len( devices ) == 0:
        print( "No devices found" )
        quit()

    # Get a handle to the first enumerated device and open it
    device = mvnc.Device( devices[0] )
    device.OpenDevice()

    return device

(3) Load the firmware to the device on first launch

(4) Accept neural net graph files
In this step, the compiled graph file for the DNN model is loaded onto the NCS device. The graph used in this example, '../caffe/SSD_MobileNet/graph', detects people using class 15 with a 75% confidence threshold. The graph file is read into a buffer, loaded onto the NCS, and then returned.

# Load a graph file onto the NCS device
# Parameters: self and enumerated device
# Return: graph file for NCS
def load_graph(self, device):

    # Read the graph file into a buffer
    with open( self.ARGS.graph, mode='rb' ) as f:
        blob = f.read()

    # Load the graph buffer into the NCS
    graph = device.AllocateGraph( blob )

    return graph

Now that the graph is loaded onto the NCS, the instructions to execute inferences on it can be implemented.

Instructions to Execute Inferences

(5) Pre-process images
The images are first pre-processed before inference is done. This happens in three steps: resizing the image to the network's input dimensions, converting the image from RGB to BGR for OpenCV, and applying mean subtraction and scaling to center the data. Once this has been completed, the image can be returned and used for inference.
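As a sketch, those three steps might look like the following. NumPy stands in for the OpenCV resize and channel-swap calls here, and the mean and scale values are the ones commonly paired with this kind of MobileNet-SSD graph, so treat them as illustrative rather than definitive — they must match whatever the graph was trained with:

```python
import numpy as np

def pre_process_image(frame, dim=(300, 300), mean=127.5, scale=0.00789):
    # Resize to the network's input dimensions
    # (nearest-neighbour index sampling as a stand-in for cv2.resize)
    rows = np.linspace(0, frame.shape[0] - 1, dim[1]).astype(int)
    cols = np.linspace(0, frame.shape[1] - 1, dim[0]).astype(int)
    img = frame[rows][:, cols]

    # Reverse the channel order (swap between RGB and BGR)
    img = img[:, :, ::-1]

    # Mean subtraction and scaling to center the data around zero;
    # the NCS expects half-precision input
    img = (img.astype(np.float16) - mean) * scale
    return img
```

With these values, a zeroed pixel maps to roughly (0 − 127.5) × 0.00789 ≈ −1.006, so the data ends up centered in a small range around zero.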

(6) Inference
Once all the steps above have been completed, inference can be run on the pre-processed image; bounding boxes can then be overlaid on the frame, and detection classes and scores printed when a detection belongs to the target class (15: person).
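The graph returns its result as a flat tensor; in the NCSDK SSD examples the layout (assumed here) is a detection count in the first slot, followed by seven values per detection: image id, class id, confidence, and the normalized box corners. A minimal sketch of filtering that output for class 15 at 75% confidence — the actual mvnc calls are shown as comments since they require the device to be attached:

```python
import numpy as np

CLASS_PERSON = 15
CONF_THRESHOLD = 0.75

# On the device, the output tensor would come from:
#   graph.LoadTensor( img.astype( np.float16 ), 'user object' )
#   output, userobj = graph.GetResult()

def filter_detections(output, frame_w, frame_h):
    # First value is the detection count; each detection then
    # occupies seven slots, starting at index 7
    num_detections = int(output[0])
    boxes = []
    for i in range(num_detections):
        base = 7 + i * 7
        _, class_id, conf, x1, y1, x2, y2 = output[base:base + 7]
        # The NCS can emit non-finite values in unused slots, so skip them
        if not np.isfinite(conf):
            continue
        if int(class_id) == CLASS_PERSON and conf >= CONF_THRESHOLD:
            # Scale the normalized corners back to pixel coordinates
            boxes.append((int(x1 * frame_w), int(y1 * frame_h),
                          int(x2 * frame_w), int(y2 * frame_h), float(conf)))
    return boxes
```

Each returned tuple holds the pixel-space box corners plus the confidence, which is enough to overlay rectangles and labels with OpenCV.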

Adapting Real-Time Image Processing to Still Images

After first completing this overall process with real-time image processing, I re-evaluated the code to determine what changes needed to be made to evaluate still images for a project I was working on. The first thing I noticed was that one of the original arguments had to change from video to image:

# Video 
parser.add_argument( '-v', '--video', type=int,
    default=0,
    help="Index of your computer's V4L2 video device. \
    ex. 0 for /dev/video0" )

# Image 
parser.add_argument( '-i', '--image', type=str,
    default='../images/*.jpg',
    help="Image path" )

This change allowed images to be read in from the images folder. After that, a few more changes were required to adapt the code from a camera stream to still images. One major section to revisit was how the NCS device received images: I used glob to read in images from the image path and iterate over every image in that folder, so that each one could be pre-processed and inferred upon with the code above.

images = glob.glob('../images/*.jpg')
for image in images:
    frame = cv2.imread(image) 
    img = cam.pre_process_image(frame)
    cam.infer_image(graph, img, frame)

Learning how to switch between real-time streaming and still images was a useful exercise that has aided a project I am working on, in which I need to clean a large image dataset based on whether or not the images contain people.
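That clean-up step could be sketched like this; `sort_images`, the folder names, and the detection callback are all hypothetical stand-ins for whatever the real detection code (such as `infer_image` above) reports back:

```python
import glob
import os
import shutil

def sort_images(src_pattern, dst_person, dst_empty, contains_person):
    """Move each matching image into one of two folders, based on a
    per-image detection callback that returns True when a person is found."""
    os.makedirs(dst_person, exist_ok=True)
    os.makedirs(dst_empty, exist_ok=True)
    for path in glob.glob(src_pattern):
        dst = dst_person if contains_person(path) else dst_empty
        shutil.move(path, os.path.join(dst, os.path.basename(path)))
```

Passing the detection step in as a callback keeps the file-sorting logic separate from the NCS code, so the same loop works whether the check comes from the NCS or from a stub during testing.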

References

Intel Optimized Packages for the Intel Distribution for Python
OpenVino
Intel Movidius NCS
Real-Time Person Detection Repo
Caffe MobileNet-SSD
Cover image sourced from Wallpaper Cave

Top comments (6)

Ben Halpern

It’s amazing how much more accessible this stuff has become.

Rose Day

Agreed! It has been very interesting to learn about and get into. Amazing what can be done with it.

Ben Halpern

A lot of this post still goes over my head though 😄

Rose Day

I feel the same still at times. It has been a lot to absorb. Very interesting but deep topic to get into. I have been learning much as I go along but there is still so much more to learn!

Harish Garg

How is it performance wise? What kind of specs are for your setup?

Rose Day

Hello! I have an Intel NUC that contains an Intel Core i5 running Ubuntu 16.04 (64-bit) with a 1 TB drive. With this, I have been using the Intel® Movidius™ Neural Compute Stick and a GeChic touchscreen. Performance-wise, I am not sure what I can really say. For still images, it takes a few seconds to run when I have only a few images; I have not run it on a massive dataset yet. With a live camera feed it is relatively quick, a second or so, to detect a person and save a screenshot. The only issue I had was with too much light pollution, which made it hard for the camera to recognize that a person was present.