For the past few months I have been using the Intel® Movidius™ Neural Compute Stick (NCS) to rapidly prototype deep neural nets for person detection in still images and real-time video streams. My setup is an Intel® NUC running 64-bit Ubuntu 16.04 Desktop with a USB camera. Here I want to share some of what I have learned along the way...
Powered by the Intel® Movidius™ Myriad™ 2 vision processing unit (VPU), the NCS includes an array of 12 VLIW vector processors called SHAVE processors, which accelerate neural networks by running parts of the network in parallel. Once the NCS is connected to a host machine, the Neural Compute API (NCAPI) is used to initialize and open the device, load the firmware onto it, and pass it neural net graph files and instructions to execute inferences. The steps are described in more detail below:
(1) Connect to a host machine
(2) Initialize and open NCS device
This section of the code enumerates the attached devices, and the Python program quits if none is found, which is why the device must be connected in step one. Once a device is found, the code gets a handle to it, opens it, and returns the handle.
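    import mvnc.mvncapi as mvnc  # NCSDK v1 Python API

    # Function name is illustrative; the original excerpt shows only the body
    def open_ncs_device(self):
        # Look for enumerated NCS device(s); quit program if none found.
        devices = mvnc.EnumerateDevices()
        if len( devices ) == 0:
            print( "No devices found" )
            quit()

        # Get a handle to the first enumerated device and open it
        device = mvnc.Device( devices[0] )
        device.OpenDevice()

        return device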
(3) Load the firmware to the device on first launch
(4) Accept neural net graph files
In this step the graph file, generated from the DNN model, is loaded onto the NCS device. The graph used in this example is '../caffe/SSD_MobileNet/graph', which is used to detect people (class 15) at a confidence threshold of 75%. The file is read into a buffer, the buffer is loaded onto the NCS, and the resulting graph object is returned.
    # Load a graph file onto the NCS device
    # Parameters: self and enumerated device
    # Return: graph file for NCS
    def load_graph(self, device):
        # Read the graph file into a buffer
        with open( self.ARGS.graph, mode='rb' ) as f:
            blob = f.read()

        # Load the graph buffer into the NCS
        graph = device.AllocateGraph( blob )

        return graph
Now that the graph is loaded onto the NCS, the instructions to run inferences on it can be implemented.
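As a rough sketch of what executing an inference looks like with the NCSDK v1 NCAPI used above, the helper below queues one pre-processed image with `LoadTensor` and reads the result back with `GetResult`. The function name `run_inference` and the `FakeGraph` stand-in are hypothetical, added only so the call sequence can be tried without a stick attached. With real hardware you would pass the graph returned by `AllocateGraph` and an image already converted to float16.

```python
def run_inference(graph, img, tag="frame"):
    """Send one pre-processed image to the NCS and return the raw output.

    Assumes `graph` exposes the NCSDK v1 calls LoadTensor and GetResult,
    and that `img` has already been converted to float16.
    """
    # Queue the image for inference; `tag` is an arbitrary user object
    # that is handed back together with the result.
    graph.LoadTensor(img, tag)
    # Block until the result for the queued image is ready.
    output, returned_tag = graph.GetResult()
    return output, returned_tag


class FakeGraph:
    """Hypothetical stand-in for a real mvnc graph, for dry runs only."""

    def LoadTensor(self, tensor, user_obj):
        self._pending = (tensor, user_obj)
        return True

    def GetResult(self):
        _tensor, user_obj = self._pending
        # A real graph returns the network output; this returns a fixed
        # list so the call sequence can be exercised.
        return [0.0], user_obj


output, tag = run_inference(FakeGraph(), img=[[0.0]], tag="demo")
print(output, tag)
```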
(5) Pre-process images
Each image is pre-processed before inference in three steps: resize the image to the network's input size, convert the channel order between BGR and RGB (OpenCV reads images as BGR), and apply mean subtraction and scaling to center the data. The pre-processed image is then returned and used for inference.
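The three pre-processing steps could be sketched roughly as follows. This is not the project's own function: the default `dim`, `mean`, and `scale` values are assumptions chosen for illustration, and the nearest-neighbour resize is a pure NumPy stand-in for `cv2.resize` so the sketch runs without OpenCV installed.

```python
import numpy as np


def pre_process_image(frame, dim=(300, 300), mean=127.5, scale=1 / 127.5,
                      swap_channels=True):
    """Resize, reorder channels, then center and scale an HxWx3 uint8 image.

    `dim` is (width, height). The default dim, mean and scale are assumed
    values for illustration, not taken from the original project.
    """
    height, width = frame.shape[:2]

    # Step 1: nearest-neighbour resize (stand-in for cv2.resize).
    rows = np.arange(dim[1]) * height // dim[1]
    cols = np.arange(dim[0]) * width // dim[0]
    img = frame[rows][:, cols]

    # Step 2: reverse the channel axis, which swaps BGR and RGB.
    if swap_channels:
        img = img[:, :, ::-1]

    # Step 3: mean subtraction and scaling, done in float16 because
    # NCSDK v1 takes half-precision input tensors.
    img = img.astype(np.float16)
    img = (img - np.float16(mean)) * np.float16(scale)
    return img


# Usage: a dummy 480x640 frame shrunk to 300x300.
dummy = np.zeros((480, 640, 3), dtype=np.uint8)
print(pre_process_image(dummy).shape)
```

With these assumed values, pixel intensities in 0..255 end up roughly in the range -1..1.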
Once all of the steps above are complete, bounding boxes can be overlaid on the image, and the detection class and score can be printed for each detection that belongs to a specific class (15: person).
After getting this process working with real-time video, I re-evaluated the code to determine what changes were needed to process still images for a project I was working on. The first change was to replace one of the original arguments, swapping video for image:
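The filtering described here, keeping only class 15 detections at or above 75% confidence, might look something like the sketch below. It assumes the raw network output has already been parsed into tuples of `(class_id, score, x1, y1, x2, y2)` with coordinates normalized to 0..1. That tuple layout and the name `filter_people` are hypothetical, not taken from the original code.

```python
PERSON_CLASS = 15   # class index for "person", as used in the text
MIN_SCORE = 0.75    # 75% confidence threshold, as used in the text


def filter_people(detections, frame_width, frame_height,
                  class_id=PERSON_CLASS, min_score=MIN_SCORE):
    """Keep detections of one class at or above a score threshold.

    Each detection is assumed to be (class_id, score, x1, y1, x2, y2)
    with box coordinates normalized to 0..1. Returns a list of
    (score, (left, top, right, bottom)) with the box in pixels.
    """
    kept = []
    for cls, score, x1, y1, x2, y2 in detections:
        if int(cls) != class_id or score < min_score:
            continue
        # Scale to pixel coordinates and clamp to the frame.
        left = min(max(round(x1 * frame_width), 0), frame_width)
        top = min(max(round(y1 * frame_height), 0), frame_height)
        right = min(max(round(x2 * frame_width), 0), frame_width)
        bottom = min(max(round(y2 * frame_height), 0), frame_height)
        kept.append((score, (left, top, right, bottom)))
    return kept


# Usage: only the first detection is a confident person.
sample = [
    (15, 0.90, 0.25, 0.50, 0.75, 1.00),
    (15, 0.50, 0.00, 0.00, 1.00, 1.00),
    (7, 0.99, 0.00, 0.00, 1.00, 1.00),
]
for score, box in filter_people(sample, frame_width=200, frame_height=100):
    print("person", score, box)
```

The returned pixel boxes can then be handed to whatever drawing routine overlays them on the frame.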
    # Video
    parser.add_argument( '-v', '--video', type=int,
                         default=0,
                         help="Index of your computer's V4L2 video device. \
                               ex. 0 for /dev/video0" )

    # Image
    parser.add_argument( '-i', '--image', type=str,
                         default='../images/*.jpg',
                         help="Image path" )
This change allowed images to be read from the images folder. Some further changes were then needed to adapt the code from a camera stream to still images. The main one was how images reach the NCS device: I used glob to collect every image on the images path and iterate over them, so that each image is pre-processed and run through inference with the code above.
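    import glob
    import cv2

    images = glob.glob('../images/*.jpg')

    for image in images:
        frame = cv2.imread(image)
        # cv2.imread returns None when a file cannot be read; skip it
        if frame is None:
            continue
        img = cam.pre_process_image(frame)
        cam.infer_image(graph, img, frame)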
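If both modes need to live in one script rather than swapping one argument for the other, the two options could be kept side by side. The sketch below is one possible arrangement, not what the original code does: `build_parser` and `pick_source` are hypothetical names, and falling back to camera 0 when neither option is given is an assumption.

```python
import argparse


def build_parser():
    """Parser that accepts either a camera index or an image pattern."""
    parser = argparse.ArgumentParser(description="Person detection on the NCS")
    # Only one input source may be given at a time.
    source = parser.add_mutually_exclusive_group()
    source.add_argument('-v', '--video', type=int, default=None,
                        help="Index of your computer's V4L2 video device. "
                             "ex. 0 for /dev/video0")
    source.add_argument('-i', '--image', type=str, default=None,
                        help="Image path, ex. ../images/*.jpg")
    return parser


def pick_source(args):
    """Return ('image', pattern) or ('video', index); camera 0 by default."""
    if args.image is not None:
        return ('image', args.image)
    return ('video', args.video if args.video is not None else 0)


# Usage: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(['-i', '../images/*.jpg'])
print(pick_source(args))
```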
Learning how to switch between real-time streaming and still images was a useful experience, and it has helped with a project in which I need to clean a large image dataset based on whether or not the images contain people.
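For the dataset-cleaning use mentioned above, one simple approach is to copy each image into a "people" or "no_people" folder depending on the detection result. The sketch below only illustrates that idea. `sort_images` and the `contains_person` callback are hypothetical, and in practice the callback would wrap the pre-processing and inference steps from earlier in this post.

```python
import glob
import os
import shutil


def sort_images(pattern, out_dir, contains_person):
    """Copy images matching `pattern` into out_dir/people or out_dir/no_people.

    `contains_person` is a caller-supplied function that takes an image
    path and returns True when a person was detected in that image.
    Returns a dict with the number of images copied into each folder.
    """
    counts = {"people": 0, "no_people": 0}
    for path in sorted(glob.glob(pattern)):
        label = "people" if contains_person(path) else "no_people"
        dest = os.path.join(out_dir, label)
        os.makedirs(dest, exist_ok=True)
        # Copy rather than move, so the source dataset stays untouched.
        shutil.copy2(path, dest)
        counts[label] += 1
    return counts
```

Copying instead of moving keeps the original dataset intact, which makes it easy to re-run the sort with a different confidence threshold.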