GitHub recently published "Propelling your DevOps to new heights | GitHub InFocus", an exciting piece of DevOps-related content:
This episode did introduce
Videogrep is a command line tool that searches through dialog in video files and makes supercuts based on what it finds. It will recognize
.vtt subtitle tracks, or transcriptions that can be generated with vosk, pocketsphinx, and other tools.
- The Meta Experience
- All the instances of the phrase "time" in the movie "In Time"
- All the one to two second silences in "Total Recall"
- A former press secretary telling us what he can tell us
Videogrep is compatible with Python versions 3.6 to 3.10.
pip install videogrep
If you want to transcribe videos, you also need to install vosk:
pip install vosk
Note: the previous version of videogrep supported pocketsphinx for speech-to-text. Vosk seems much better so I've added…
Then came the idea:
What if I analyzed "GitHub InFocus" with videogrep?
This short post will guide you through my first trial of videogrep: what I was able to produce and discover... and the fun I had along the way.
☝️ Note that I followed the excellent tutorial below to run this experiment 👇
First, I want to download the YouTube video
https://youtu.be/awQ7LFxfXWE locally. You can list the many available formats (
-F option) and pick the one that best fits your needs, but in our case we'll take the default one:
yt-dlp https://youtu.be/awQ7LFxfXWE -o propelling_your_devops.mp4 --write-auto-sub
Then you are ready for the next step: using videogrep.
videogrep makes it possible (and super easy) to analyze the text of the downloaded subtitles (the
vtt files).
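To give an idea of what "analyzing the text of the subtitles" means, here is a minimal sketch of how a .vtt file can be read into timed lines of text. This is not videogrep's own parser: the `Cue` type and the `parse_vtt` helper are names I made up for illustration, and the sketch assumes timestamps in the `hh:mm:ss.mmm` form.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    """One subtitle line (hypothetical helper type, not part of videogrep)."""
    start: float  # seconds
    end: float    # seconds
    text: str


# Timing line such as "00:00:01.000 --> 00:00:03.500" (hh:mm:ss.mmm assumed).
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})\.(\d{3})"
)
# Inline markup such as <c> or word-level timestamps.
TAG = re.compile(r"<[^>]+>")


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(content: str) -> List[Cue]:
    """Split the file into blank-line separated blocks and keep the timed ones."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match is None:
                continue
            parts = match.groups()
            # Everything after the timing line is the spoken text.
            raw = " ".join(TAG.sub("", item) for item in lines[i + 1:])
            text = " ".join(raw.split())
            if text:
                cues.append(Cue(to_seconds(*parts[:4]), to_seconds(*parts[4:]), text))
            break
    return cues


SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:03.500
we want to <c>make sure</c> that

00:00:03.500 --> 00:00:06.000
you can ship code faster
"""

cues = parse_vtt(SAMPLE)
for cue in cues:
    print(cue.start, cue.end, cue.text)
```

Once the subtitles are a list of (start, end, text) entries, counting words or locating a phrase in time becomes plain text processing.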
So, what are the most frequent groups of words (called
ngrams) in the video? Let's find out!
The single-word analysis is not really interesting:
❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 1 | head -10
to 449
and 352
that 347
you 323
the 322
we 306
a 255
of 251
so 167
is 157
Two-word ngrams reveal much more about the underlying intent of the video:
❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 2 | head -7
want to 97
that we 61
you can 55
you know 54
going to 51
we have 45
we can 45
... soon confirmed with the 3-ngrams:
❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 3 | head -9
we want to 30
you want to 20
a lot of 19
want to make 19
make sure that 18
i'm going to 17
to make sure 17
we have a 16
i want to 13
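To make the idea behind these counts concrete, here is a minimal sketch of n-gram counting over a transcript. It only illustrates the technique: I am not claiming this is how videogrep implements `--ngrams`, and the `tokenize` and `top_ngrams` helpers as well as the sample sentence are made up for the example.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep words, digits and apostrophes (e.g. "i'm")."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_ngrams(text: str, n: int, limit: int = 10) -> List[Tuple[str, int]]:
    """Count every run of n consecutive words and return the most frequent ones."""
    words = tokenize(text)
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams).most_common(limit)


# Made-up sentence standing in for the real subtitles.
transcript = (
    "we want to ship code and we want to make sure "
    "that you want to ship too"
)

for phrase, count in top_ngrams(transcript, 3, limit=3):
    print(phrase, count)
```

The sliding window is the whole trick: with n = 1 you mostly get filler words, while n = 2 or 3 starts to surface phrases such as "want to".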
With the help of
ngrams, in less than a second of grepping the text of the video, we discover that
"GitHub focuses its attention on what they want... and also on what you want to achieve... and make"
👉 That first fact already tells us a lot.
☝️ It also highlights
"the inclusive approach", with a lot of "I" and "we"
... which is also a pretty engaging way to onboard us onto the product they are showcasing ❣️
Now, the fun part.
You have done a text analysis, but... wouldn't it be fun to watch a movie made of these grepped terms?
⚠️ Spoiler alert : Yes it is ❕ (and it's easy) 🤣
These are called
fragments. Let's get some of them.
Let's get all the sentences containing "want"
videogrep --input propelling_your_devops.mp4.webm --search 'want' --resyncsubs 0.1 --output want_sentence.mp4
🤪 Also "we want" to get the "want" movie 🤣 :
videogrep --input propelling_your_devops.mp4.webm --search 'we want to' --search-type fragment --resyncsubs 0.1 --output want.mp4
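Conceptually, a sentence search boils down to "find the subtitle lines that match, and keep their time ranges". Here is a minimal sketch of that step, under a few assumptions: the `find_clips` helper and the sample cues are hypothetical, the `resync` parameter mimics my understanding of `--resyncsubs` (shifting the subtitle timings by a number of seconds), and fragment search is left out because it needs word-level timestamps.

```python
import re
from typing import List, Tuple

# (start in seconds, end in seconds, text), a made-up stand-in for parsed subtitles.
Cue = Tuple[float, float, str]


def find_clips(cues: List[Cue], query: str, resync: float = 0.0) -> List[Tuple[float, float]]:
    """Return the (start, end) ranges of the subtitle lines matching the query.

    `resync` shifts every range by that many seconds (assumption: this is
    roughly the role of --resyncsubs); starts are clamped at zero.
    """
    pattern = re.compile(query, re.IGNORECASE)
    clips = []
    for start, end, text in cues:
        if pattern.search(text):
            clips.append((max(0.0, start + resync), end + resync))
    return clips


cues = [
    (1.0, 3.5, "we want to make sure that"),
    (3.5, 6.0, "you can ship code faster"),
    (6.0, 8.0, "I want to show you something"),
]

clips = find_clips(cues, "want", resync=0.5)
print(clips)
```

The resulting list of time ranges is what a supercut is made of: cut each range out of the video, then glue the pieces together in order.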
What we think of most when we think about GitHub services is the "code".
Let's make them talk about "code"
videogrep --input propelling_your_devops.mp4.webm --search 'code' --search-type fragment --resyncsubs 0.1 --output code.mp4
Last but not least, I'd love to
see how GitHub talks about GitHub
videogrep --input propelling_your_devops.mp4.webm --search 'github' --search-type fragment --resyncsubs 0.1 --output github.mp4
These tools open up a very wide field for speech and video analysis... making it possible to highlight patterns and intentions, or simply to have fun.
Also, knowing that
yt-dlp makes it possible to download complete channels, playlists or search queries...
the possibilities are endless.
Vosk: speech recognition toolkit
yt-dlp: youtube-dl fork with additional features and fixes
- "GitHub Infocus 2022 analysis" playlist on YT
videogrep adds some really cool features, including (among others):
- Finding "non-english vtt subtitle files"
- "Examples that integrate with spaCy"