Video understanding, or the ability to extract insights from video, is valuable across many industries and applications. It improves content analysis and management by automatically generating metadata, categorizing content, and making videos searchable. These insights also drive decision-making, enhance user experiences, and improve operational efficiency across diverse sectors.
Google’s Gemini 1.5 model brings significant advancements to this field. Beyond its improvements in language processing, the model can handle an enormous input context of up to 1 million tokens. Gemini 1.5 is also trained as a multimodal model, natively processing text, images, audio, and video. This combination of varied input types and a large context window opens up new possibilities for processing long videos effectively.
In this article, we will dive into how Gemini 1.5 can be leveraged for generating valuable video insights, transforming the way we understand and utilize video content across different domains.
Getting Started
Table of contents
- What is Gemini 1.5
- Prerequisites
- Installing dependencies
- Setting up the Gemini API key
- Setting up the environment variables
- Importing the libraries
- Initializing the project
- Saving uploaded files
- Generating insights from videos
- Upload a video to the Files API
- Get File
- Response Generation
- Delete File
- Combining the stages
- Creating the interface
- Creating the Streamlit app
What is Gemini 1.5
Google’s Gemini 1.5 represents a significant leap forward in AI performance and efficiency. Building upon extensive research and engineering innovations, this model features a new Mixture-of-Experts (MoE) architecture, enhancing both training and serving efficiency. Available in public preview, Gemini 1.5 Pro and 1.5 Flash offer an impressive 1 million token context window through Google AI Studio and Vertex AI.
Source: Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra (blog.google)
The 1.5 Flash model, the newest addition to the Gemini family, is the fastest and most optimized for high-volume, high-frequency tasks. It is designed for cost-efficiency and excels in applications such as summarization, chat, image and video captioning, and extracting data from extensive documents and tables. With these advancements, Gemini 1.5 sets a new standard for performance and versatility in AI models.
Prerequisites
Installing dependencies
- Create and activate a virtual environment by executing the following commands.
python -m venv venv
source venv/bin/activate  # for Ubuntu/macOS
venv\Scripts\activate  # for Windows
- Install the google-generativeai, streamlit, and python-dotenv libraries using pip. Note that google-generativeai requires Python 3.9 or later.
pip install google-generativeai streamlit python-dotenv
Setting up the Gemini API key
To access the Gemini API and begin working with its functionalities, you can acquire a free Google API key by registering with Google AI Studio. Google AI Studio provides a user-friendly, visual interface for interacting with the Gemini API. Within it, you can engage with generative models through an intuitive UI and, if desired, generate an API key for programmatic access.
Follow the steps to generate a Gemini API key:
- Click the link (https://aistudio.google.com/app) to go to Google AI Studio, or find it with a quick Google search.
- Accept the terms of service and click Continue.
- Click the Get API key link in the sidebar, then click the Create API key in new project button to generate the key.
- Copy the generated API key.
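Optionally, you can verify the key before wiring it into the app. Here is a minimal sketch (our own addition, assuming google-generativeai is already installed): it configures the client with the key and lists the models the key can access.
import google.generativeai as genai

# Paste the key you copied above; a valid key prints the available model names.
genai.configure(api_key="AIzaSy......")
for model in genai.list_models():
    print(model.name)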
Setting up the environment variables
Begin by creating a new folder for your project. Choose a name that reflects the purpose of your project.
Inside your new project folder, create a file named .env. This file will store your environment variables, including your Gemini API key.
Open the .env file and add the following code to specify your Gemini API key:
GEMINI_API_KEY=AIzaSy......
Note that the variable is named GEMINI_API_KEY so it matches the os.getenv("GEMINI_API_KEY") call used later in the code.
Importing the libraries
To get started with your project and ensure you have all the necessary tools, you need to import several key libraries as follows.
import os
import time
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv
- google.generativeai as genai: Imports the Google Generative AI library for interacting with the Gemini API.
- streamlit as st: Imports Streamlit for creating web apps.
- from dotenv import load_dotenv: Loads environment variables from a .env file.
Initializing the project
To set up your project, you need to configure the API key and create a directory for temporary file storage for uploaded files.
Define the media folder and configure the Gemini API key by initializing the necessary settings. Add the following code to your script:
MEDIA_FOLDER = 'medias'

def __init__():
    # Create the media directory if it doesn't exist
    if not os.path.exists(MEDIA_FOLDER):
        os.makedirs(MEDIA_FOLDER)
    # Load environment variables from the .env file
    load_dotenv()
    # Retrieve the API key from the environment variables
    api_key = os.getenv("GEMINI_API_KEY")
    # Configure the Gemini API with your API key
    genai.configure(api_key=api_key)
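Note that __init__ here is an ordinary module-level function, not a class constructor; as the final code shows, it is called once at the bottom of the script, before the Streamlit app runs.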
Saving uploaded files
To store uploaded files in the media folder and return their paths, define a method called save_uploaded_file and add the following code to it.
def save_uploaded_file(uploaded_file):
    """Save the uploaded file to the media folder and return the file path."""
    file_path = os.path.join(MEDIA_FOLDER, uploaded_file.name)
    with open(file_path, 'wb') as f:
        f.write(uploaded_file.read())
    return file_path
Generating insights from videos
Generating insights from videos involves several crucial stages, including uploading, processing, and response generation.
1. Upload a video to the Files API
The Gemini API accepts video file formats directly. The File API supports files up to 2 GB in size and allows storage of up to 20 GB per project. Uploaded files remain available for 2 days and cannot be downloaded from the API.
video_file = genai.upload_file(path=video_path)
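Because of the 2 GB per-file limit, it can help to fail fast on oversized files before uploading. A minimal sketch (the helper name and constant are our own, derived from the limit quoted above):
import os

MAX_FILE_BYTES = 2 * 1024 ** 3  # 2 GB, per the File API limit above

def check_upload_size(video_path):
    """Raise early instead of letting an oversized upload fail remotely."""
    size = os.path.getsize(video_path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"{video_path} is {size} bytes; the File API accepts files up to 2 GB.")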
2. Get File
After uploading a file, you can verify that the API has successfully received it by using the files.get method. This lets you look up files uploaded to the File API that are associated with the Cloud project linked to your API key. Only the file name (and, by extension, the URI) is a unique identifier.
import time

while video_file.state.name == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

if video_file.state.name == "FAILED":
    raise ValueError(video_file.state.name)
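The library also exposes a listing call, which is handy for checking what is currently stored under your project; a short sketch:
# Print the name and URI of every file stored under this project.
for f in genai.list_files():
    print(f.name, f.uri)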
3. Response Generation
After the video has been uploaded, you can make GenerateContent requests that reference the File API URI.
# Create the prompt.
prompt = "Describe the video. Provide the insights from the video."

# Set the model to Gemini 1.5 Flash.
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

# Make the LLM request.
print("Making LLM inference request...")
response = model.generate_content([prompt, video_file],
                                  request_options={"timeout": 600})
print(response.text)
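The prompt is ordinary text, so you can steer the model toward the kind of insight you need. For example (our own variation, not part of the original tutorial):
# Ask for timestamped highlights instead of a general description.
prompt = "List the key moments in this video with approximate timestamps."
response = model.generate_content([prompt, video_file],
                                  request_options={"timeout": 600})
print(response.text)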
4. Delete File
Files are automatically deleted after 2 days, or you can manually delete them using files.delete().
genai.delete_file(video_file.name)
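If you want to clean up everything stored under the project at once, you can combine the listing and delete calls; a small sketch (our own addition):
# Delete every file still stored under this project.
for f in genai.list_files():
    genai.delete_file(f.name)
    print(f"Deleted {f.name}")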
5. Combining the stages
Create a method called get_insights and add the following code to it. Instead of print(), use Streamlit's st.write() method so the messages appear on the web page.
def get_insights(video_path):
    """Extract insights from the video using Gemini Flash."""
    st.write(f"Processing video: {video_path}")

    st.write("Uploading file...")
    video_file = genai.upload_file(path=video_path)
    st.write(f"Completed upload: {video_file.uri}")

    while video_file.state.name == "PROCESSING":
        st.write('Waiting for video to be processed.')
        time.sleep(10)
        video_file = genai.get_file(video_file.name)

    if video_file.state.name == "FAILED":
        raise ValueError(video_file.state.name)

    prompt = "Describe the video. Provide the insights from the video."
    model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

    st.write("Making LLM inference request...")
    response = model.generate_content([prompt, video_file],
                                      request_options={"timeout": 600})

    st.write('Video processing complete')
    st.subheader("Insights")
    st.write(response.text)

    genai.delete_file(video_file.name)
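As an optional polish (not in the original code), you can wrap the long-running call in Streamlit's st.spinner so users see a busy indicator while the model works:
# Show a spinner while waiting on the model.
with st.spinner("Making LLM inference request..."):
    response = model.generate_content([prompt, video_file],
                                      request_options={"timeout": 600})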
Creating the interface
To streamline the process of uploading videos and generating insights within a Streamlit app, you can create a method named app. This method will provide an upload button, display the uploaded video, and generate insights from it.
def app():
    st.title("Video Insights Generator")

    uploaded_file = st.file_uploader("Upload a video file", type=["mp4", "avi", "mov", "mkv"])

    if uploaded_file is not None:
        file_path = save_uploaded_file(uploaded_file)
        st.video(file_path)
        get_insights(file_path)
        if os.path.exists(file_path):  ## Optional: remove the uploaded file from the temporary location
            os.remove(file_path)
Creating the Streamlit app
To create a complete and functional Streamlit application that allows users to upload videos and generate insights using the Gemini 1.5 Flash model, combine all the components into a single file named app.py.
Here is the final code:
import os
import time
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv

MEDIA_FOLDER = 'medias'

def __init__():
    if not os.path.exists(MEDIA_FOLDER):
        os.makedirs(MEDIA_FOLDER)
    load_dotenv()  ## load all the environment variables
    api_key = os.getenv("GEMINI_API_KEY")
    genai.configure(api_key=api_key)

def save_uploaded_file(uploaded_file):
    """Save the uploaded file to the media folder and return the file path."""
    file_path = os.path.join(MEDIA_FOLDER, uploaded_file.name)
    with open(file_path, 'wb') as f:
        f.write(uploaded_file.read())
    return file_path

def get_insights(video_path):
    """Extract insights from the video using Gemini Flash."""
    st.write(f"Processing video: {video_path}")

    st.write("Uploading file...")
    video_file = genai.upload_file(path=video_path)
    st.write(f"Completed upload: {video_file.uri}")

    while video_file.state.name == "PROCESSING":
        st.write('Waiting for video to be processed.')
        time.sleep(10)
        video_file = genai.get_file(video_file.name)

    if video_file.state.name == "FAILED":
        raise ValueError(video_file.state.name)

    prompt = "Describe the video. Provide the insights from the video."
    model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

    st.write("Making LLM inference request...")
    response = model.generate_content([prompt, video_file],
                                      request_options={"timeout": 600})

    st.write('Video processing complete')
    st.subheader("Insights")
    st.write(response.text)

    genai.delete_file(video_file.name)

def app():
    st.title("Video Insights Generator")

    uploaded_file = st.file_uploader("Upload a video file", type=["mp4", "avi", "mov", "mkv"])

    if uploaded_file is not None:
        file_path = save_uploaded_file(uploaded_file)
        st.video(file_path)
        get_insights(file_path)
        if os.path.exists(file_path):  ## Optional: remove the uploaded file from the temporary location
            os.remove(file_path)

__init__()
app()
Running the application
Execute the following command to run the application.
streamlit run app.py
You can open the link provided in the console to see the output.
Thanks for reading this article!
If you enjoyed it, please click the heart button ♥ and share it to help others find it!
The full source code for this tutorial can be found here,