TL;DR: This post shows how to use Azure Video Indexer, the Computer Vision API, and the Custom Vision service to extract key frames and detect custom image tags in indexed videos.
All code for the tutorial can be found in the notebook below. This code can be extended to support almost any image classification or object detection task.
aribornstein/AzureVideoIndexerVisualBrandDetection
The tutorial requires an Azure subscription; however, everything can be achieved using the free tier. If you are new to Azure, you can get a free subscription here.
Create your Azure free account today | Microsoft Azure
What is Azure Video Indexer?
Azure Video Indexer automatically extracts metadata (such as spoken words, written text, faces, speakers, celebrities, emotions, topics, brands, and scenes) from video and audio files. Developers can then access the data within their application or infrastructure, make it more discoverable, and use it to create new over-the-top (OTT) experiences and monetization opportunities.
Use the Video Indexer API - Azure Media Services
Often, we wish to extract useful tags from video content. These tags are often the differentiating factor for successful engagement on social media services such as Instagram, Facebook, and YouTube.
This tutorial will show how to use Azure Video Indexer, Computer Vision API, and Custom Vision service to extract key frames and custom tags. We will use these Azure services to detect custom brand logos in indexed videos.
Step #1 Download A Sample Video with the pyTube API
The first step is to download a sample video to be indexed. We will download an episode of Azure Mythbusters on Azure Machine Learning by my incredible co-worker Amy Boyd, using the open source pyTube API!
Installation:
pyTube can be installed with pip
!pip install pytube3 --upgrade
Code:
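from pytube import YouTube
from pathlib import Path

# Download the first available stream; download() returns the local file path
video2Index = YouTube('https://www.youtube.com/watch?v=ijtKxXiS4hE').streams[0].download()
# Use the file name without its extension as the video name
video_name = Path(video2Index).stem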
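The upload step later in this post notes that the video name must be unique, so re-running the notebook on the same file can collide with an earlier upload. The following is a minimal sketch, not part of the original notebook, that appends a UTC timestamp to the file stem. The helper name `unique_video_name` and the timestamp format are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def unique_video_name(video_path, now=None):
    """Return the file stem plus a UTC timestamp suffix (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return "{}-{}".format(Path(video_path).stem, now.strftime("%Y%m%dT%H%M%S"))


# Example with a fixed timestamp so the result is reproducible.
example = unique_video_name(
    "downloads/Azure Mythbusters.mp4",
    now=datetime(2020, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(example)
```

In the notebook, this could replace the plain stem with something like `video_name = unique_video_name(video2Index)`.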
Step #2 Create An Azure Video Indexer Instance
Navigate to https://www.videoindexer.ai/ and follow the instructions to create an account.
For the next steps, you will need your Video Indexer:
- Subscription Key
- Location
- Account Id
These can be found on the account settings page of the Video Indexer website pictured above. For more information, see the documentation below. Feel free to comment below if you get stuck.
Use the Video Indexer API - Azure Media Services
Step #3 Use the Unofficial Video Indexer Python Client to Process our Video and Extract Key Frames
To interact with the Video Indexer API, we will use the unofficial Python client.
Installation:
pip install video-indexer
Code:
- Initialize Client:
from video_indexer import VideoIndexer

vi = VideoIndexer(vi_subscription_key='SUBSCRIPTION_KEY',
                  vi_location='LOCATION',
                  vi_account_id='ACCOUNT_ID')
- Upload Video:
video_id = vi.upload_to_video_indexer(
    input_filename=video2Index,
    video_name=video_name,  # must be unique
    video_language='English')
- Get Video Info
info = vi.get_video_info(video_id, video_language='English')
- Extract Key Frame Ids
keyframes = []
for shot in info["videos"][0]["insights"]["shots"]:
    for keyframe in shot["keyFrames"]:
        keyframes.append(keyframe["instances"][0]['thumbnailId'])
- Get Keyframe Thumbnails
for keyframe in keyframes:
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
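The thumbnail loop above fetches each key frame but does not keep it. The following is a minimal sketch of caching the thumbnails on disk so that later steps can reuse them without calling Video Indexer again. The helper `save_thumbnails` and its `fetch` callable are hypothetical, and the `.jpg` extension is an assumption about the thumbnail format; the demo uses a stand-in fetch function instead of the real client.

```python
import tempfile
from pathlib import Path


def save_thumbnails(keyframe_ids, fetch, out_dir):
    """Fetch each keyframe once and write it to out_dir/<id>.jpg.

    `fetch` is a hypothetical callable that takes a thumbnail id and returns
    the image bytes; the .jpg extension is an assumption about the format.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for keyframe_id in dict.fromkeys(keyframe_ids):  # de-duplicate, keep order
        path = out_dir / "{}.jpg".format(keyframe_id)
        if not path.exists():  # skip frames downloaded in an earlier run
            path.write_bytes(fetch(keyframe_id))
        saved[keyframe_id] = path
    return saved


# Demo with a stand-in fetch function that returns fake bytes.
with tempfile.TemporaryDirectory() as tmp:
    demo = save_thumbnails(["a", "b", "a"], lambda k: k.encode(), tmp)
    demo_names = sorted(p.name for p in demo.values())
print(demo_names)
```

With the client from this step, `fetch` could be `lambda k: vi.get_thumbnail_from_video_indexer(video_id, k)`.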
Step #4 Use the Azure Computer Vision API to Extract Popular Brands from Key Frames
Out of the box, Azure Video Indexer uses optical character recognition and the audio transcript generated by speech-to-text to detect references to popular brands.
Now that we have extracted the key frames, we will use the Computer Vision API to extend this functionality and check whether the key frames contain any known brands.
Brand detection - Computer Vision - Azure Cognitive Services
- First, we need to create a Computer Vision resource. There is a free tier that can be used for the demo; it can be created by following the instructions in the documentation link below. Once done, you should have a Computer Vision subscription key and endpoint.
Create a Cognitive Services resource in the Azure portal - Azure Cognitive Services
After we have our Azure Computer Vision subscription key and endpoint, we can use the client SDK to evaluate our video’s keyframes:
Installation:
pip install --upgrade azure-cognitiveservices-vision-computervision
Code:
- Initialize Computer Vision Client
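from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Placeholders: replace with the key and endpoint of your Computer Vision resource
subscription_key = "Computer Vision Subscription Key"
endpoint = "Computer Vision Endpoint"
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))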
- Send Keyframe To Azure Computer Vision Service to Detect Brands
import io
import time

timeout_interval, timeout_time = 5, 10.0
image_features = ["brands"]
for index, keyframe in enumerate(keyframes):
    # Pause every few requests to stay under the request limit
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)
    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)
    # Analyze with Azure Computer Vision
    cv_results = computervision_client.analyze_image_in_stream(img_stream, image_features)
    print("Detecting brands in keyframe {}: ".format(keyframe))
    if len(cv_results.brands) == 0:
        print("No brands detected.")
    else:
        for brand in cv_results.brands:
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format(
                brand.name, brand.confidence * 100,
                brand.rectangle.x, brand.rectangle.x + brand.rectangle.w,
                brand.rectangle.y, brand.rectangle.y + brand.rectangle.h))
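The fixed pause in the loop above can be factored into a small generator so that every detection loop in this post can share it. This is a minimal sketch, not part of the original notebook: `throttled` and its parameters are hypothetical names, and the `sleep` argument exists only so the pause can be stubbed out when experimenting. Unlike the loop above, it skips the pause before the very first item.

```python
import time


def throttled(items, every=5, pause=10.0, sleep=time.sleep):
    """Yield items, pausing `pause` seconds after every `every` items."""
    for index, item in enumerate(items):
        # index 0 is skipped so the first request is sent immediately
        if index and index % every == 0:
            sleep(pause)
        yield item


# Demo: record the pauses instead of actually sleeping.
pauses = []
seen = list(throttled(range(12), every=5, pause=10.0, sleep=pauses.append))
print(len(seen), pauses)
```

In the notebook, the loop header could then become `for keyframe in throttled(keyframes):`, with the `if index % timeout_interval == 0` block removed.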
Azure Computer Vision API — General Brand Detection
Quickstart: Computer Vision client library - Azure Cognitive Services
Step #5 Use the Azure Custom Vision Service to Extract Custom Logos from Keyframes
The Azure Computer Vision API can detect many of the world's most popular brands, but sometimes a brand may be more obscure. In this last section, we will use the Custom Vision Service to train a custom logo detector that detects the Azure Developer Relations mascot, Bit, in the keyframes extracted by Video Indexer.
This tutorial assumes you know how to train a Custom Vision Service object detection model for brand detection. If not, check out the documentation below for a tutorial.
Instead of deploying to mobile, however, we will use the Python client API for the Azure Custom Vision Service. All the information you’ll need can be found in the settings menu of your Custom Vision project.
Installation:
pip install azure-cognitiveservices-vision-customvision
Code:
- Initialize Custom Vision Service Client
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

prediction_threshold = .8
prediction_key = "Custom Vision Service Key"
custom_endpoint = "Custom Vision Service Endpoint"
project_id = "Custom Vision Service Model ProjectId"
published_name = "Custom Vision Service Model Iteration Name"
# Pass the service endpoint here, not the published iteration name.
# Newer SDK releases may expect (endpoint, credentials) instead; check your installed version.
predictor = CustomVisionPredictionClient(prediction_key, endpoint=custom_endpoint)
- Use Custom Vision Service Model to Predict Key Frames
import io
import time

timeout_interval, timeout_time = 5, 10.0
for index, keyframe in enumerate(keyframes):
    # Pause every few requests to stay under the request limit
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)
    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)
    # Analyze with the Azure Custom Vision Service
    cv_results = predictor.detect_image(project_id, published_name, img_stream)
    # Keep only predictions above the confidence threshold
    predictions = [pred for pred in cv_results.predictions if pred.probability > prediction_threshold]
    print("Detecting brands in keyframe {}: ".format(keyframe))
    if len(predictions) == 0:
        print("No custom brands detected.")
    else:
        for brand in predictions:
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format(
                brand.tag_name, brand.probability * 100,
                brand.bounding_box.left, brand.bounding_box.top,
                brand.bounding_box.width, brand.bounding_box.height))
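The bounding boxes printed above are expressed as fractions of the image size (values between 0 and 1), not pixels. To draw or crop a detection, they need to be scaled by the key frame's width and height. The following is a minimal sketch of that conversion; `to_pixel_box` is a hypothetical helper, and the 640x360 image size in the demo is only an illustrative assumption.

```python
def to_pixel_box(left, top, width, height, image_width, image_height):
    """Convert a normalized (0-1) box to integer pixel corners (x1, y1, x2, y2)."""
    x1 = round(left * image_width)
    y1 = round(top * image_height)
    x2 = round((left + width) * image_width)
    y2 = round((top + height) * image_height)
    # Clamp to the image in case the model predicts slightly outside it.
    return (max(0, x1), max(0, y1), min(image_width, x2), min(image_height, y2))


# Demo: a box covering the middle half of the width, on a 640x360 frame.
box = to_pixel_box(0.25, 0.5, 0.5, 0.25, image_width=640, image_height=360)
print(box)
```

With a prediction from the loop above, the first four arguments would come from `brand.bounding_box.left`, `.top`, `.width`, and `.height`.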
Conclusion
And there we have it! I am able to find all the frames in my video that contain either the Microsoft brand or the Cloud Advocacy Bit logo.
Next Steps
You now have all you need to extend the Azure Video Indexer Service with your own custom computer vision models. Below is a list of additional resources that will help you take your integration with Video Indexer to the next level.
Offline Computer Vision
In a production system, you might see request throttling when sending a large number of requests. In this case, the Azure Computer Vision service can be run in an offline container.
How to install and run containers - Computer Vision - Azure Cognitive Services
Additionally, the Custom Vision model can be run locally as well.
Tutorial - Deploy Custom Vision classifier to a device using Azure IoT Edge
Video Indexer + Zoom Media
Azure-Samples/media-services-video-indexer
Creating an Automated Video Processing Flow in Azure
Creating an automated video processing flow in Azure
About the Author
Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies and computational medicine. As an Open Source Engineer at Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.