Let's get straight to the point:
- Open a random YouTube video, for example this one about me ;-) or any video with subtitles (automatic subtitles are fine too!)
- Open up the developer console (in Chrome: F12)
- Paste the 15-line script, and you'll hear everything in Dutch (the script's default language)!
By the way, if you want another language, just type lang = 'en' in the developer console and the script will switch languages automatically!
lang = 'en' // English
lang = 'es' // Spanish
lang = 'fr' // French
lang = 'de' // German
lang = 'it' // Italian
lang = 'pt' // Portuguese
lang = 'ru' // Russian
lang = 'zh' // Chinese
lang = 'ja' // Japanese
lang = 'ko' // Korean
lang = 'ar' // Arabic
lang = 'nl' // Dutch, my native language
You can also switch to other videos; the script will keep working.
How does this short script actually work?
Thank you for your interest! It's a very simple script that consists of four parts:
- Variable declaration and initialization. I'm using global variables to keep track of the current subtitle, language, and subtitles.
- The getSubs function, which retrieves the subtitles in the desired language.
- The speak function, which speaks the right text at the right time using the Speech Synthesis API.
- The setInterval call, which executes the speak function every 0.05 seconds (50 ms).
let lastIndex = -1, lang = currentLang = "nl", ...
async function getSubs(langCode) {
...
}
const speak = async () => {
...
}
setInterval(speak, 50); // every 0.05 sec
Step 1: Retrieving the subtitles
If you type await getSubs('de') in the console (assuming you pasted the 15-line script above), you will see an array with all subtitles in German! You can ask for any language, because the YouTube API translates everything for you. Cool, right? We don't even have to do the translating ourselves.
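To give you an idea of the result: each element combines YouTube's json3 event fields with the text property the script adds. Roughly, the array looks like this (the values below are made up for illustration):
// Roughly what the returned array looks like (illustrative values):
[
  { tStartMs: 0, dDurationMs: 2360, segs: [{ utf8: "Hallo zusammen" }], text: "Hallo zusammen" },
  { tStartMs: 2360, dDurationMs: 3100, segs: [{ utf8: "willkommen!" }], text: "willkommen!" },
  // ...more events
]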
Let's break down the code into smaller parts to better understand each step:
Retrieve the caption tracks object (ct):
let ct = JSON.parse((await (await fetch(window.location.href)).text()).split('ytInitialPlayerResponse = ')[1].split(';var')[0]).captions.playerCaptionsTracklistRenderer.captionTracks;
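If that one-liner is hard to read, here is the same extraction unrolled into separate steps (a sketch with equivalent behavior; html and blob are just illustrative intermediate names):
// The one-liner above, unrolled step by step:
let html = await (await fetch(window.location.href)).text();              // fetch the current video page
let blob = html.split('ytInitialPlayerResponse = ')[1].split(';var')[0];  // cut the JSON out of the HTML
let ct = JSON.parse(blob).captions.playerCaptionsTracklistRenderer.captionTracks;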
If you are on a YouTube page, type ytInitialPlayerResponse in the dev console, and chances are you'll find an object with lots of information about this video (including subtitles: click "captions")!
I decided to fetch the video page's HTML directly and extract ytInitialPlayerResponse from it. Why? Because if you navigate to another video, the global variable ytInitialPlayerResponse does not get updated automatically. You cannot rely on it; you have to fetch the new video page and extract the object from it.
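For reference, each entry in ct is an object along these lines; the values below are illustrative, but baseUrl, vssId, and languageCode are the fields the script relies on:
// One caption track, roughly (illustrative values):
{
  baseUrl: "https://www.youtube.com/api/timedtext?...",  // where this track can be fetched
  vssId: ".de",        // "." prefix = written subtitle, "a." prefix = automatic subtitle
  languageCode: "de"
  // ...more fields
}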
Define a helper function (findCaptionUrl):
let findCaptionUrl = x => ct.find(y => y.vssId.indexOf(x) === 0)?.baseUrl;
This helper function searches the caption tracks and returns the base URL of the matching caption track.
The vssId looks very similar to the languageCode, as we will see in the following lines.
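To make the prefix convention concrete, here are a few hypothetical calls, assuming a track list like the example above:
// Hypothetical lookups (results depend on the video's caption tracks):
findCaptionUrl(".de");   // baseUrl of the written German track, or undefined if absent
findCaptionUrl("a.de");  // baseUrl of the automatically generated German track
findCaptionUrl(".");     // baseUrl of the first written track in any language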
Build the subtitles URL (url):
let firstChoice = findCaptionUrl("." + langCode);
let url = firstChoice ? firstChoice + "&fmt=json3" : (findCaptionUrl(".") || findCaptionUrl("a." + langCode) || ct[0].baseUrl) + "&fmt=json3&tlang=" + langCode;
In order of preference, we would like to retrieve:
1. a written subtitle in our language (our firstChoice)
2. any written subtitle (that we translate to our language)
3. an automatic subtitle in our language
4. any subtitle at all (that we translate to our language)
Note that the URL also includes the format (fmt=json3) and translation language (tlang) query parameters; the YouTube API handles the translation for us :-)
You can skip this detail: if we end up with an automatic subtitle in our language (case 3), the voice-over is effectively muted, because the video is already in the preferred language. This happens by coincidence: translating a track to its own language (tlang=) returns blank subtitles :-)
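For readability, here is the same URL-building logic as the one-liner above, written out as an explicit sketch (the behavior should be identical):
// The fallback chain, unrolled (not part of the 15-line script):
let url;
let firstChoice = findCaptionUrl("." + langCode);
if (firstChoice) {
  url = firstChoice + "&fmt=json3";                   // 1. written subtitle in our language
} else {
  let fallback = findCaptionUrl(".")                  // 2. any written subtitle
    || findCaptionUrl("a." + langCode)                // 3. automatic subtitle in our language
    || ct[0].baseUrl;                                 // 4. any subtitle at all
  url = fallback + "&fmt=json3&tlang=" + langCode;    // let YouTube translate it for us
}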
Fetch and process the subtitles:
return (await (await fetch(url)).json()).events.map(x => ({...x, text: x.segs?.map(x => x.utf8)?.join(" ")?.replace(/\n/g,' ')?.replace(/♪|'|"|\.{2,}|<[\s\S]*?>|\{[\s\S]*?\}|\[[\s\S]*?\]/g,'')?.trim() || ''}));
This line:
- fetches the subtitles from the previously constructed URL
- parses them as JSON
- processes the events to extract the text of each subtitle
- cleans the text by removing unnecessary characters and formatting
Let's break down the code into smaller parts to better understand each step:
Fetch the subtitles and convert them to JSON:
await (await fetch(url)).json()
This part of the line fetches the subtitles from the provided URL and converts the response into a JSON object.
Process the events:
.events.map(x => ...)
This part of the line maps over the events array of the JSON object to process each event.
Create a new object for each event:
({...x, text: ...})
For each event, a new object is created that includes all the original properties of the event (...x) and a new text property that will contain the cleaned subtitle text.
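A tiny, hypothetical event to make the spread concrete:
// Hypothetical event, before and after the map callback:
const event = { tStartMs: 0, dDurationMs: 2000, segs: [{ utf8: "Hi there" }] };
const result = { ...event, text: "Hi there" };
// result keeps tStartMs, dDurationMs, and segs, and gains the cleaned text property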
Extract and clean the subtitle text:
x.segs?.map(x => x.utf8)?.join(" ")?.replace(/\n/g,' ')?.replace(/♪|'|"|\.{2,}|<[\s\S]*?>|\{[\s\S]*?\}|\[[\s\S]*?\]/g,'')?.trim() || ''
This part of the line extracts the subtitle text from the event's segs property and performs the following operations:
- Maps over the segs array and retrieves the utf8 property of each segment.
- Joins the segments into a single string with spaces between them.
- Replaces newline characters with spaces.
- Removes music notes (♪), quotes, runs of two or more periods, and any content within angle brackets, curly braces, or square brackets.
- Trims whitespace from the beginning and end of the string.
PS: If any of these operations fail (e.g., because the segs property is undefined), an empty string is returned (|| '').
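Here is a small, made-up example of what this cleaning pipeline does to a couple of segments:
// Made-up segments, run through the same cleaning steps:
const segs = [{ utf8: "♪ Hello\nworld..." }, { utf8: "[Music]" }];
const text = segs.map(s => s.utf8).join(" ")
  .replace(/\n/g, " ")
  .replace(/♪|'|"|\.{2,}|<[\s\S]*?>|\{[\s\S]*?\}|\[[\s\S]*?\]/g, "")
  .trim();
// text === "Hello world"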
Step 2: Analyzing the speak function
Now that we have the text, we want it to be spoken! The speak function is an asynchronous function that manages dubbing the video. Its goal is to:
- find and speak the current subtitle
- synchronize the video with the voice
- check whether the video's URL or language has changed
Let's break down the code into smaller parts to better understand each step:
Update the subtitles if the video URL or language has changed:
if (location.href !== currentUrl || currentLang !== lang) (currentUrl = location.href) && (currentLang = lang) && (subs = await getSubs(lang));
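The chained && operators make this line dense. Written out more verbosely, it is equivalent to the following sketch (location.href and lang are always truthy here, so the && chain behaves like a plain block):
// Equivalent, more verbose version of the line above:
if (location.href !== currentUrl || currentLang !== lang) {
  currentUrl = location.href;   // remember which video we're dubbing
  currentLang = lang;           // remember which language we're dubbing to
  subs = await getSubs(lang);   // fetch the subtitles for the new video/language
}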
Find the current subtitle based on the video's playback time:
const currentIndex = subs.findIndex(x => x.text && x.tStartMs <= 1000 * vid.currentTime && x.tStartMs + x.dDurationMs >= 1000 * vid.currentTime);
Return early if the script can't find a subtitle (the current index is -1), or if we're still on the same subtitle as last time (the current index equals lastIndex):
if ([-1, lastIndex].includes(currentIndex)) return;
If the previous subtitle is still being spoken (voice is still defined), we pause the video:
if (voice) return vid.pause();
If there is no voice left over from the previous subtitle, we can resume video playback and create a new SpeechSynthesisUtterance with the current subtitle:
vid.play();
voice = new SpeechSynthesisUtterance(subs[(lastIndex = currentIndex)].text);
Set the language and event listeners for the SpeechSynthesisUtterance and adjust the video volume:
voice.lang = lang;
voice.onend = () => (vid.volume = baseVolume || 1) && (voice = null);
vid.volume = 0.1;
Start the speech synthesis:
speechSynthesis.speak(voice);
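Putting the snippets above back together, the whole function reads roughly like this (vid is the page's video element; voice, baseVolume, subs, lastIndex, and currentUrl come from the variable declarations at the top of the full script):
const speak = async () => {
  // Refetch subtitles when the video or the target language changes
  if (location.href !== currentUrl || currentLang !== lang)
    (currentUrl = location.href) && (currentLang = lang) && (subs = await getSubs(lang));
  // Find the subtitle that overlaps the current playback time
  const currentIndex = subs.findIndex(x => x.text
    && x.tStartMs <= 1000 * vid.currentTime
    && x.tStartMs + x.dDurationMs >= 1000 * vid.currentTime);
  if ([-1, lastIndex].includes(currentIndex)) return;   // nothing new to say
  if (voice) return vid.pause();                        // still speaking: hold the video
  vid.play();
  voice = new SpeechSynthesisUtterance(subs[(lastIndex = currentIndex)].text);
  voice.lang = lang;
  voice.onend = () => (vid.volume = baseVolume || 1) && (voice = null);
  vid.volume = 0.1;                                     // duck the original audio
  speechSynthesis.speak(voice);
};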
Conclusion:
In this tutorial, we explored the getSubs (get subtitles) and speak functions, which together synchronize video playback with the Speech Synthesis API to dub YouTube videos. You can see how simple and short such a script can be. I hope it inspires you!
Note: The quality of the dubbing may vary depending on the available voices for the Speech Synthesis API in your browser and the quality of the subtitles.