You can use YouTube Studio's automatic captioning to add subtitles to your videos. However, the captions might not be accurate enough, or you may simply want to build your own tool.
Here's how:
1. Extract Audio
First, extract the audio from your video content. You can accomplish this using a tool such as FFmpeg.
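Here is a minimal sketch of this step that calls FFmpeg from Python. It assumes FFmpeg is installed and on your PATH. The helper names and file paths are hypothetical, and the mono 16 kHz WAV output is an assumed, conservative choice. Leopard's process_file should handle common audio formats, so these settings are not a requirement.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    # -vn drops the video stream, -ac 1 downmixes to mono,
    # -ar 16000 resamples to 16 kHz (an assumed target rate)
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",
        "-ac", "1",
        "-ar", "16000",
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    # Raises subprocess.CalledProcessError if FFmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

For example, extract_audio("my_video.mp4", "my_audio.wav") writes the audio track to my_audio.wav. FFmpeg picks the WAV container from the output file extension.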
2. Install the Speech Recognition SDK
Install the Leopard speech-to-text (STT) Python SDK:
pip install "pvleopard>=1.1"
Log in to (or sign up for) the Picovoice Console; it is free. Grab your AccessKey and initialize Leopard:
import pvleopard

# Replace with the AccessKey from Picovoice Console
access_key = "${ACCESS_KEY}"
leopard = pvleopard.create(access_key=access_key)
3. Transcribe Audio to Text
# audio_path is the audio file extracted in step 1
transcript, words = leopard.process_file(audio_path)
Leopard returns the transcription as a str, along with word-level metadata. Each word carries word, start_sec, end_sec, and confidence fields, shown here in JSON form:
[
  {
    "word": "it's",
    "start_sec": 8.58,
    "end_sec": 8.70,
    "confidence": 0.78
  },
  {
    "word": "important",
    "start_sec": 8.77,
    "end_sec": 9.12,
    "confidence": 0.99
  },
  ...
]
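The sectioning step below detects pauses from the gap between one word's end_sec and the next word's start_sec. This sketch computes those gaps. It needs to run without an AccessKey, so it uses a hypothetical stand-in namedtuple, Word, with the same four fields shown above instead of pvleopard.Leopard.Word.

```python
from collections import namedtuple
from typing import List, Sequence

# Hypothetical stand-in for pvleopard.Leopard.Word, mirroring the fields above
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def pauses_between_words(words: Sequence[Word]) -> List[float]:
    # Silence (in seconds) between each word and the word that follows it
    return [
        nxt.start_sec - prev.end_sec
        for prev, nxt in zip(words, words[1:])
    ]


words = [
    Word("it's", 8.58, 8.70, 0.78),
    Word("important", 8.77, 9.12, 0.99),
]
print(pauses_between_words(words))
```

With the two example words, there is a single gap of roughly 0.07 seconds. That is far below a one-second pause, so both words would stay in the same section.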
4. Convert to SRT
SRT (SubRip Subtitle) is a widely supported format for storing subtitles. Here's a snippet of an example .srt file:
1
00:00:08,576 --> 00:00:11,711
it's important for you to know how to mix your own colors to make your color
...
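Each cue in an .srt file consists of a sequence number, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm (with a comma before the milliseconds), one or more lines of text, and a blank line separating it from the next cue. The small parser below makes that structure explicit and can be used to sanity-check a generated file. It is a hypothetical helper, not part of pvleopard, and it assumes well-formed input.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str  # "HH:MM:SS,mmm"
    end: str
    text: str


def parse_srt(srt: str) -> List[Cue]:
    cues: List[Cue] = []
    # Normalize Windows line endings, then split on the blank lines between cues
    normalized = srt.replace("\r\n", "\n").strip()
    if not normalized:
        return cues
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        index = int(lines[0])
        # Timing line looks like "00:00:08,576 --> 00:00:11,711"
        start, end = (part.strip() for part in lines[1].split("-->"))
        # Remaining lines are the subtitle text (a cue may span several lines)
        cues.append(Cue(index, start, end, "\n".join(lines[2:])))
    return cues
```

For example, parse_srt(open(subtitle_path).read()) should return one Cue per section written in step 5.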
Next, the transcription should be broken into sections:
- When there is a silence of at least a set length (endpoint_sec, 1 second by default) between two words, we treat it as an endpoint and start a new section. In other words, the speaker is done talking, and someone (the same or a different person) will continue talking later.
- Each section should contain at most a fixed number of words (length_limit, 16 by default) to avoid crowding the screen.
The following code implements these two rules:
from typing import Optional, Sequence

import pvleopard


def second_to_timecode(x: float) -> str:
    # Split seconds into the HH:MM:SS,mmm parts used by SRT
    hour, x = divmod(x, 3600)
    minute, x = divmod(x, 60)
    second, x = divmod(x, 1)
    millisecond = int(x * 1000.)
    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)


def to_srt(
        words: Sequence[pvleopard.Leopard.Word],
        endpoint_sec: float = 1.,
        length_limit: Optional[int] = 16) -> str:
    def _helper(end: int) -> None:
        # Emit one SRT cue covering words[start..end] (inclusive)
        lines.append("%d" % section)
        lines.append(
            "%s --> %s" %
            (
                second_to_timecode(words[start].start_sec),
                second_to_timecode(words[end].end_sec)
            )
        )
        lines.append(' '.join(x.word for x in words[start:(end + 1)]))
        lines.append('')

    # Nothing was transcribed, so there are no cues to write
    if len(words) == 0:
        return ''

    lines = list()
    section = 1  # SRT cues are conventionally numbered from 1
    start = 0
    for k in range(1, len(words)):
        # Close the current section on a long enough pause (an endpoint)
        # or once it already holds length_limit words
        if ((words[k].start_sec - words[k - 1].end_sec) >= endpoint_sec) or \
                (length_limit is not None and (k - start) >= length_limit):
            _helper(k - 1)
            start = k
            section += 1
    _helper(len(words) - 1)

    return '\n'.join(lines)
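One caveat with second_to_timecode: the fractional part that divmod leaves on a float can land just below the intended value, and int() truncates it, so a timestamp may come out 1 ms early. A possible alternative, sketched here, rounds to whole milliseconds once and then does all the splitting in integer arithmetic. The name second_to_timecode_ms is hypothetical; it is intended as a drop-in replacement with the same output format.

```python
def second_to_timecode_ms(x: float) -> str:
    # Round once to whole milliseconds so float error cannot drop a digit
    total_ms = int(round(x * 1000))
    hour, rem = divmod(total_ms, 3_600_000)
    minute, rem = divmod(rem, 60_000)
    second, millisecond = divmod(rem, 1000)
    # SRT uses a comma, not a dot, before the milliseconds
    return "%02d:%02d:%02d,%03d" % (hour, minute, second, millisecond)


print(second_to_timecode_ms(8.576))  # → 00:00:08,576
```

Because rounding happens before the split, a value such as 59.9996 correctly carries over to 00:01:00,000 instead of producing an out-of-range millisecond field.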
5. Save the SRT File
Last but not least, save the file:
with open(subtitle_path, 'w', encoding='utf-8') as f:
    f.write(to_srt(words))
Voila!
You can see the full article here: https://picovoice.ai/blog/how-to-create-subtitles-for-any-video-with-python/
Reader comment: This code gives an error for words: Sequence[pvleopard.Leopard.Word]. Sequence is not defined. What is it, and which library do we need to import for it?