Dilek Karasoy for Picovoice

Day 3: How to add subtitles to YouTube videos with Python

You can add subtitles to your videos with automatic captioning in YouTube Studio. However, the automatic captions might not be accurate enough, or you may just want to build your own tool.

Here's how:
1. Extract Audio
First, extract the audio from your video content. You can accomplish this using a tool such as FFmpeg.
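
As a rough sketch of this step, FFmpeg can be driven from Python with subprocess. The file names and the choice of mono 16 kHz WAV output are assumptions made for illustration, not requirements stated in this post, and ffmpeg must already be installed and on your PATH. The helper names build_ffmpeg_command and extract_audio are hypothetical.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Build an FFmpeg command that writes only the audio track of a video."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video
        "-vn",             # drop the video stream
        "-ac", "1",        # downmix to mono (assumption, not a requirement)
        "-ar", "16000",    # resample to 16 kHz (assumption, not a requirement)
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


# Example call with hypothetical file names:
# extract_audio("video.mp4", "audio.wav")
```

Keeping the command in its own function makes it easy to print and inspect before anything is executed.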

2. Install Speech Recognition SDK
Install Leopard STT Python SDK:

pip install "pvleopard>=1.1"

Log in to (or sign up for) Picovoice Console; it is free. Grab your AccessKey and initialize Leopard:

import pvleopard

# access_key is the AccessKey string copied from Picovoice Console
leopard = pvleopard.create(access_key=access_key)

3. Transcribe Audio to Text

transcript, words = leopard.process_file(audio_path)

Leopard returns the transcript as a str, along with word-level metadata that includes timestamps and confidence:

[
    {
        "word": "it's",
        "start_sec": 8.58,
        "end_sec": 8.70,
        "confidence": 0.78
    },
    {
        "word": "important",
        "start_sec": 8.77,
        "end_sec": 9.12,
        "confidence": 0.99
    },
    ...
]
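
Since each word carries a confidence score, one way to use this metadata is to flag words that may need a manual check before you publish the subtitles. The sketch below is illustrative only: Word is a stand-in namedtuple with the same four fields shown above (so the example runs without an AccessKey), and both low_confidence_words and the 0.8 threshold are hypothetical choices.

```python
from collections import namedtuple
from typing import List, Sequence

# Stand-in for the word objects Leopard returns; same field names as above.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def low_confidence_words(words: Sequence[Word], threshold: float = 0.8) -> List[Word]:
    """Return the words whose confidence is below `threshold`."""
    return [w for w in words if w.confidence < threshold]


words = [
    Word("it's", 8.58, 8.70, 0.78),
    Word("important", 8.77, 9.12, 0.99),
]

for w in low_confidence_words(words):
    # Print the start time so the word is easy to find in the video.
    print("%.2f s: %r (confidence %.2f)" % (w.start_sec, w.word, w.confidence))
```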

4. Convert to SRT
Subtitles are stored in SRT (SubRip subtitle) format. Here's a snippet of an example .srt file:

1
00:00:08,576 --> 00:00:11,711
it's important for you to know how to mix your own colors to make your color
...
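
For reference, each SRT cue is a small block of lines: a sequence number (conventionally counted from 1), a timing line in HH:MM:SS,mmm --> HH:MM:SS,mmm form, the subtitle text, and a blank line that ends the cue. A minimal sketch that formats a single cue follows; format_cue is a hypothetical helper used only for illustration, and it expects timecodes that are already formatted.

```python
def format_cue(index: int, start: str, end: str, text: str) -> str:
    """Format one SRT cue; `start` and `end` are HH:MM:SS,mmm strings."""
    return "\n".join([
        str(index),                  # sequence number
        "%s --> %s" % (start, end),  # timing line
        text,                        # subtitle text
        "",                          # blank line terminates the cue
    ])


cue = format_cue(1, "00:00:08,576", "00:00:11,711", "it's important for you to know")
print(cue)
```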

Next, break the transcript into sections using two rules:

  1. A silence between two words (at least endpoint_sec long, 1 second by default) counts as an endpoint and closes the current section. In other words, the speaker has finished talking and someone (the same or a different person) will continue later.
  2. Each section should hold only a limited number of words (length_limit, 16 by default) to avoid crowding the screen.

Implement these two rules:

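from typing import Optional, Sequence

import pvleopard


def second_to_timecode(x: float) -> str:
    # Work in whole milliseconds so floating-point error cannot drop a millisecond
    total_ms = int(round(x * 1000))
    hour, total_ms = divmod(total_ms, 3600 * 1000)
    minute, total_ms = divmod(total_ms, 60 * 1000)
    second, millisecond = divmod(total_ms, 1000)

    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)


def to_srt(
        words: Sequence[pvleopard.Leopard.Word],
        endpoint_sec: float = 1.,
        length_limit: Optional[int] = 16) -> str:
    # Append one SRT cue covering words[start] through words[end]
    def _helper(end: int) -> None:
        lines.append("%d" % section)
        lines.append(
            "%s --> %s" %
            (
                second_to_timecode(words[start].start_sec),
                second_to_timecode(words[end].end_sec)
            )
        )
        lines.append(' '.join(x.word for x in words[start:(end + 1)]))
        lines.append('')

    lines = list()
    # SRT cue numbers conventionally start at 1
    section = 1
    start = 0
    for k in range(1, len(words)):
        # Close the section on a long silence or when it reaches the word limit
        if ((words[k].start_sec - words[k - 1].end_sec) >= endpoint_sec) or \
                (length_limit is not None and (k - start) >= length_limit):
            _helper(k - 1)
            start = k
            section += 1
    # Emit the final section (skipped when there are no words at all)
    if len(words) > 0:
        _helper(len(words) - 1)

    return '\n'.join(lines)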
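
To see how the two rules interact without running Leopard, the following sketch applies them to a few hand-made words and returns the groups as plain lists. It is a simplified restatement for illustration, not the code above: Word is a stand-in namedtuple, group_words is a hypothetical helper, and the small length_limit is chosen only to keep the example short.

```python
from collections import namedtuple
from typing import List, Optional, Sequence

# Stand-in for the word objects Leopard returns.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def group_words(
        words: Sequence[Word],
        endpoint_sec: float = 1.0,
        length_limit: Optional[int] = 16) -> List[List[str]]:
    """Split words into sections at long silences or when a section is full."""
    sections: List[List[str]] = []
    current: List[str] = []
    for k, w in enumerate(words):
        if current:
            silence = w.start_sec - words[k - 1].end_sec
            full = length_limit is not None and len(current) >= length_limit
            if silence >= endpoint_sec or full:
                sections.append(current)
                current = []
        current.append(w.word)
    if current:
        sections.append(current)
    return sections


words = [
    Word("mix", 0.0, 0.3, 0.9),
    Word("your", 0.4, 0.6, 0.9),
    Word("own", 0.7, 0.9, 0.9),
    Word("colors", 1.0, 1.5, 0.9),  # starts a new section: 3 words already collected
    Word("next", 3.0, 3.3, 0.9),    # starts a new section: 1.5 s of silence before it
]

print(group_words(words, endpoint_sec=1.0, length_limit=3))
# [['mix', 'your', 'own'], ['colors'], ['next']]
```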

5. Save the SRT File
Last but not least, save the result to an .srt file:

with open(subtitle_path, 'w') as f:
    f.write(to_srt(words))

Voila!

You can see the full article here: https://picovoice.ai/blog/how-to-create-subtitles-for-any-video-with-python/

Top comments (1)

wardah

This code gives an error for words: Sequence[pvleopard.Leopard.Word],

Sequence is not recognized. What is it? Which library do we need to import for it?