adriens

Posted on Jun 28, 2022

📊 "GitHub InFocus" speech data analysis w. videogrep 🎞️

#datascience #fun #devops #github

🙋 About

GitHub recently published "Propelling your DevOps to new heights | GitHub InFocus", an exciting piece of DevOps content (https://youtu.be/awQ7LFxfXWE).

Around the same time, I watched an episode of "The Download" series (hosted by @film_girl):

The Download: Maintainer Month, .NET MAUI Goes GA, Flight Simulator: Top Gun, and more - YouTube

This episode introduced videogrep:

antiboredom / videogrep

automatic video supercuts with python

Videogrep

Videogrep is a command line tool that searches through dialog in video files and makes supercuts based on what it finds. It will recognize .srt or .vtt subtitle tracks, or transcriptions that can be generated with vosk, pocketsphinx, and other tools.

Tutorial

See my blog for a short tutorial on videogrep and yt-dlp, and part 2, on videogrep and natural language processing.

Installation

Videogrep is compatible with Python versions 3.6 to 3.10.

To install:

pip install videogrep

If you want to transcribe videos, you also need to install vosk:

pip install vosk

Note: the previous version of videogrep supported pocketsphinx for speech-to-text. Vosk seems much better so I've added…


Then came the idea:

What if I analyzed "GitHub InFocus" with videogrep?

This short post walks you through my first try at videogrep: what I was able to produce and discover... and the fun I had along the way.

☝️ Note that I followed this excellent tutorial for this experiment 👇

📥 Get the video with `yt-dlp`

First, I want to get the YouTube video https://youtu.be/awQ7LFxfXWE locally, along with its auto-generated subtitles (--write-auto-sub). yt-dlp offers many formats (list them with the -F option), so you can choose the one that best fits your needs, but in our case we'll simply take the default one:

yt-dlp https://youtu.be/awQ7LFxfXWE -o propelling_your_devops.mp4 --write-auto-sub

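You are now ready for the next step: videogrep. Note that the commands below use propelling_your_devops.mp4.webm as input, because the downloaded file ended up with an extra .webm extension; check the actual name of your file.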

📊 Text analysis with `ngrams`

videogrep makes it possible (and super easy) to analyze the text of the subtitles (the downloaded .vtt files).

So, what are the most frequent groups of words (called n-grams) in the video? Let's find out!

The single-word analysis is not very interesting:

❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 1 | head -10
to 449
and 352
that 347
you 323
the 322
we 306
a 255
of 251
so 167
is 157
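Every word in that top 10 is a stopword. As an aside (the commands in this post do not do this), one way to see past them is to filter videogrep's output before reading it. The sketch below assumes the output format shown above (the n-gram, a space, then its count) and uses a small, hypothetical stopword list; NLTK and spaCy ship fuller ones.

```python
from typing import Dict

# Hypothetical, deliberately small English stopword list (an assumption,
# not something videogrep provides); swap in NLTK's or spaCy's for real use.
STOPWORDS = {
    "a", "and", "are", "be", "for", "i", "in", "is", "it", "of",
    "on", "so", "that", "the", "this", "to", "we", "you",
}


def parse_ngram_output(output: str) -> Dict[str, int]:
    """Parse lines such as 'want to 97' into {'want to': 97}."""
    counts = {}
    for line in output.strip().splitlines():
        if not line.strip():
            continue  # tolerate stray blank lines
        # The count is the last space-separated field; the n-gram is the rest.
        ngram, _, count = line.strip().rpartition(" ")
        counts[ngram] = int(count)
    return counts


def drop_stopword_ngrams(counts: Dict[str, int]) -> Dict[str, int]:
    """Keep only the n-grams that contain at least one non-stopword."""
    return {
        ngram: count
        for ngram, count in counts.items()
        if not all(word in STOPWORDS for word in ngram.split())
    }


if __name__ == "__main__":
    unigrams = """
to 449
and 352
that 347
you 323
the 322
we 306
a 255
of 251
so 167
is 157
"""
    # Every top-10 word is in STOPWORDS, so nothing survives the filter.
    print(drop_stopword_ngrams(parse_ngram_output(unigrams)))
```

Running it prints `{}`. Applied to the bigrams below, the same filter would keep "want to" and "you can" but drop "that we".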

Bigrams (2-grams) say much more about the underlying intent of the video:

❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 2 | head -7
want to 97
that we 61
you can 55
you know 54
going to 51
we have 45
we can 45

... which the 3-grams soon confirm:

❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 3 | head -9
we want to 30
you want to 20
a lot of 19
want to make 19
make sure that 18
i'm going to 17
to make sure 17
we have a 16
i want to 13
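For the curious, counting n-grams is conceptually simple: flatten the subtitle text into lowercase words and count every run of n consecutive words. The sketch below is not videogrep's implementation. It assumes a WebVTT file where each cue's text follows its `-->` timing line, strips inline tags, and counts n-grams across cue boundaries. YouTube's auto-generated files typically repeat text across overlapping cues, which this sketch does not deduplicate, so its counts would not match the ones above exactly.

```python
import re
from collections import Counter
from typing import List

TAG = re.compile(r"<[^>]+>")      # inline markup such as <c> or <00:00:01.000>
WORD = re.compile(r"[a-z0-9']+")  # lowercase words, keeping contractions like i'm


def cue_texts(vtt: str) -> List[str]:
    """Return the text of each cue in a WebVTT document."""
    texts = []
    # The header and the cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line: the cue text follows it
                texts.append(TAG.sub("", " ".join(lines[i + 1:])))
                break
    return texts


def ngram_counts(vtt: str, n: int) -> Counter:
    """Count every run of n consecutive words across the whole transcript."""
    words = WORD.findall(" ".join(cue_texts(vtt)).lower())
    return Counter(" ".join(gram) for gram in zip(*(words[i:] for i in range(n))))


if __name__ == "__main__":
    # Made-up two-cue sample, for illustration only.
    sample = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000
we want to make sure that

00:00:02.000 --> 00:00:04.000
you want to <c>ship</c>
"""
    for n in (1, 2, 3):
        print(n, ngram_counts(sample, n).most_common(3))
```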

🔬 Short analysis

With the help of n-grams, in less than a second of grepping the video's text, we discover that

"GitHub focuses its attention on what they want... and also on what you want to achieve... and on making sure of it"

👉 That first fact already tells us a lot.

☝️ It also highlights

"an inclusive approach, with a lot of 'I' and 'we'"

... which is also a pretty engaging way to onboard us onto the product they are showcasing ❣️

✂️🎞️ Cut & get shorts

Now, the fun part.

We have done a text analysis but... wouldn't it be fun to watch a movie of these grepped terms?...

⚠️ Spoiler alert: yes, it is ❕ (and it's easy) 🤣

videogrep stitches the matching clips together into supercuts. Let's make some of them.

🎯 The "Want" movie

Let's get all the sentences containing "want":

videogrep --input propelling_your_devops.mp4.webm --search 'want' --resyncsubs 0.1 --output want_sentence.mp4
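Conceptually, a sentence search keeps every subtitle cue whose text matches the query and cuts the video from that cue's start to its end. The sketch below covers only that selection step (it is not videogrep's code, and cutting and concatenating the video is left out). It assumes WebVTT input and treats the resync value as an offset in seconds added to both timestamps; the direction of that shift is an assumption.

```python
import re
from typing import List, Tuple

TIMING = re.compile(r"(\S+)\s+-->\s+(\S+)")  # "start --> end", cue settings ignored
TAG = re.compile(r"<[^>]+>")                 # inline markup such as <c>

Cue = Tuple[float, float, str]


def to_seconds(timestamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_cues(vtt: str) -> List[Cue]:
    """Return (start, end, text) for every cue of a WebVTT document."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                text = TAG.sub("", " ".join(lines[i + 1:])).strip()
                cues.append((to_seconds(match.group(1)), to_seconds(match.group(2)), text))
                break
    return cues


def sentence_clips(cues: List[Cue], query: str, resync: float = 0.0) -> List[Tuple[float, float]]:
    """Keep the whole cue span for every cue whose text matches `query` (a regex)."""
    pattern = re.compile(query, re.IGNORECASE)
    return [(start + resync, end + resync) for start, end, text in cues if pattern.search(text)]


if __name__ == "__main__":
    # Made-up sample, for illustration only.
    sample = """WEBVTT

00:00:01.000 --> 00:00:03.500
we want to ship faster

00:00:03.500 --> 00:00:06.000
with less toil
"""
    print(sentence_clips(parse_cues(sample), "want", resync=0.1))
```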

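🤪 And since "we want to" get the "want" movie too 🤣, this time --search-type fragment cuts out just the matching words rather than whole sentences: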

videogrep --input propelling_your_devops.mp4.webm --search 'we want to' --search-type fragment --resyncsubs 0.1 --output want.mp4
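A fragment search works on word-level timings instead of whole cues (YouTube's auto-generated subtitles carry per-word timestamps, which is presumably why this works on the downloaded .vtt). Assuming the words and their start/end times have already been extracted, matching a phrase becomes a sliding-window comparison, sketched below. The `Word` type and the `fragment_clips` function are hypothetical names for illustration, not videogrep's API.

```python
import re
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """One transcribed word with its timing in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def normalize(token: str) -> str:
    """Lowercase and drop punctuation so 'We,' matches 'we'."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def fragment_clips(words: List[Word], phrase: str, resync: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) spans covering only the words of `phrase`."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    n = len(target)
    clips = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [normalize(w.text) for w in window] == target:
            # Cut from the first matching word's start to the last one's end.
            clips.append((window[0].start + resync, window[-1].end + resync))
    return clips


if __name__ == "__main__":
    # Made-up word timings, for illustration only.
    words = [
        Word("So", 0.0, 0.3), Word("we", 0.3, 0.5), Word("want", 0.5, 0.8),
        Word("to", 0.8, 0.9), Word("make", 0.9, 1.2), Word("sure", 1.2, 1.5),
    ]
    print(fragment_clips(words, "we want to"))
```

This prints `[(0.3, 0.9)]`: only the three matching words are kept, whereas a sentence search would keep the entire cue around them.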

🤓 GH talking about code

When we think about GitHub services, the first thing that comes to mind is "code".

Let's make them talk about "code":

videogrep --input propelling_your_devops.mp4.webm --search 'code' --search-type fragment --resyncsubs 0.1 --output code.mp4

➰ GitHub about GitHub 😹

Last but not least, I'd love to see how GitHub talks about GitHub:

videogrep --input propelling_your_devops.mp4.webm --search 'github' --search-type fragment --resyncsubs 0.1 --output github.mp4

🧑‍🎨 Conclusion

These tools open up a very wide field for speech and video analysis... making it possible to highlight patterns and intentions, or simply to have fun.

Also, since yt-dlp can download complete channels, playlists or search results...

the possibilities are endless.

🔖 Resources

Vosk: speech recognition toolkit
yt-dlp: a youtube-dl fork with additional features and fixes
@sam_lavigne: maintainer of videogrep
"GitHub Infocus 2022 analysis" playlist on YouTube

🗞️ News

Version 2.1.1 of videogrep adds some really cool features, including (but not limited to):

Finding "non-english vtt subtitle files"
"Examples that integrate with spaCy"

Top comments (3)

adriens • Jul 1 '22

Adrien SALES

@rastadidi

🤗 I had a lot of #fun this week... and first time I saw the "videogrepped" expression appear 🤓
#github #GitHubInFocus #dowhatyoulove #videogrep #opensource #LearnByDoing @film_girl @peckjon @sam_lavigne

21:37 PM - 01 Jul 2022

adriens • Jun 28 '22

Jon Peck being "videogrepped" 🤣

adriens • Jun 28 '22

Adrien SALES

@rastadidi

💡A few weeks ago I saw a nice episode from @film_girl (The Download on @github ) about a very exciting tool called #videogrep and maintained by @sam_lavigne .
👉 As a #github & #devops 🤓 I decided to make some experiments 🧑‍🎨
dev.to/adriens/github…
#DataScience #GitHubInFocus

07:01 AM - 28 Jun 2022
