
Kevin Lewis for Deepgram

Posted on • Originally published at developers.deepgram.com

Transcribe YouTube Videos with Node.js

In this post, we will create transcripts for YouTube videos using Deepgram's Speech Recognition API. First, we will download a video and convert it to an mp3 audio file. Then, we will use Deepgram to generate a transcript. Finally, we will store the transcript in a text file and delete the media file.

We need a sample video, so I am using a Shang-Chi and The Legend of The Ten Rings teaser trailer. If that is a spoiler for you, grab another video link.

Before We Start

You will need:

  • Node.js installed on your machine - download it here.
  • A Deepgram project API key - get one here.
  • A YouTube video ID, which is the value of the v parameter in a video's URL. The one we will be using is ir-mWUYH_uo.
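
If you only have a full link, the ID can be pulled out with Node's built-in URL class. This is a minimal sketch: getYouTubeVideoId is a hypothetical helper name, and it only assumes the two most common link shapes (a watch URL with a v parameter, and a youtu.be short link).

```javascript
// Hypothetical helper: pull the video ID out of common YouTube URL shapes.
function getYouTubeVideoId(input) {
  const url = new URL(input)
  // Short links look like https://youtu.be/<id>
  if (url.hostname === 'youtu.be') {
    return url.pathname.slice(1) || null
  }
  // Standard links look like https://www.youtube.com/watch?v=<id>
  return url.searchParams.get('v')
}

console.log(getYouTubeVideoId('https://www.youtube.com/watch?v=ir-mWUYH_uo'))
console.log(getYouTubeVideoId('https://youtu.be/ir-mWUYH_uo'))
```

Both calls log ir-mWUYH_uo. Other link formats (embeds, playlists) are not handled here.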

Create a new directory and navigate to it with your terminal. Run npm init -y to create a package.json file and then install the following packages:

npm install @deepgram/sdk ffmpeg-static youtube-mp3-downloader

Create an index.js file, and open it in your code editor.

Preparing Dependencies

At the top of your file, require these four modules:

const fs = require('fs')
const YoutubeMp3Downloader = require('youtube-mp3-downloader')
const { Deepgram } = require('@deepgram/sdk')
const ffmpeg = require('ffmpeg-static')

fs is the built-in file system module for Node.js. It is used to read and write files, which we will do a few times throughout this post. ffmpeg-static includes a version of ffmpeg in our node_modules directory, and requiring it returns the path to that binary.

Initialize the Deepgram and YoutubeMp3Downloader clients:

const deepgram = new Deepgram('YOUR DEEPGRAM KEY')
const YD = new YoutubeMp3Downloader({
  ffmpegPath: ffmpeg,
  outputPath: './',
  youtubeVideoQuality: 'highestaudio',
})
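
Hardcoding the key is fine for a quick test, but you may prefer to keep it out of your source file. Below is a minimal sketch that reads it from an environment variable instead. The variable name DEEPGRAM_API_KEY and the helper getDeepgramKey are choices made for this example, not something the SDK requires.

```javascript
// DEEPGRAM_API_KEY is a variable name chosen for this sketch, not one the SDK requires.
function getDeepgramKey(env = process.env) {
  const key = env.DEEPGRAM_API_KEY
  if (!key) {
    throw new Error('Set the DEEPGRAM_API_KEY environment variable first')
  }
  return key
}

// In index.js this would replace the hardcoded string:
// const deepgram = new Deepgram(getDeepgramKey())
console.log(getDeepgramKey({ DEEPGRAM_API_KEY: 'example-key' }))
```

Failing early with a clear message is easier to debug than an authentication error returned later by the API.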

Download Video and Convert to MP3

Under the hood, the youtube-mp3-downloader package downloads the video and converts it with ffmpeg on our behalf. While doing so, it emits several events. We will use the progress event to see how far through the download we are, and the finished event, which indicates we can move on.

YD.download('ir-mWUYH_uo')

YD.on('progress', (data) => {
  console.log(data.progress.percentage + '% downloaded')
})

YD.on('finished', async (err, video) => {
  // Stop here if the download or conversion failed
  if (err) return console.error(err)

  const videoFileName = video.file
  console.log(`Downloaded ${videoFileName}`)

  // Continue on to get transcript here
})

Save and run the file with node index.js. You should see the download progress in your terminal, and the mp3 file should then appear in your project directory.

A terminal showing various percentages downloaded ending with 100%. The final log states the final filename.
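
If you prefer async/await over event listeners, the events can be wrapped in a Promise. The sketch below is one way to do that under a few assumptions: downloadAsMp3 is a hypothetical helper, the downloader is assumed to emit an error event on failure in addition to finished, and FakeDownloader is a stand-in built on Node's EventEmitter so the example runs without network access.

```javascript
const { EventEmitter } = require('events')

// Hypothetical helper: resolve with the 'finished' payload, reject on failure.
function downloadAsMp3(downloader, videoId) {
  return new Promise((resolve, reject) => {
    downloader.once('finished', (err, video) => {
      if (err) reject(err)
      else resolve(video)
    })
    downloader.once('error', reject)
    downloader.download(videoId)
  })
}

// Stand-in for the real downloader so this sketch runs without network access.
class FakeDownloader extends EventEmitter {
  download(videoId) {
    setImmediate(() => this.emit('finished', null, { file: `./${videoId}.mp3` }))
  }
}

downloadAsMp3(new FakeDownloader(), 'ir-mWUYH_uo').then((video) => {
  console.log(`Downloaded ${video.file}`)
})
```

With the real package you would pass YD in place of the fake. Because the listeners are not tied to a specific video, this approach suits one download at a time.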

Get Transcript from Deepgram

In place of the comment above, prepare and send a Deepgram transcription request:

const file = {
  buffer: fs.readFileSync(videoFileName),
  mimetype: 'audio/mp3',
}
const options = {
  punctuate: true,
}

const result = await deepgram.transcription
  .preRecorded(file, options)
  .catch((e) => console.log(e))
console.log(result)

Several options can make your transcript more useful, including diarization, which recognizes different speakers; a profanity filter, which replaces profanity with nearby terms; and punctuation. We are using punctuation in this tutorial to show you how setting options works.
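
As a rough sketch, an options object enabling all three features mentioned above might look like the following. The option names are taken from Deepgram's documented request parameters as I understand them; treat them as an assumption and confirm against the documentation for your SDK version.

```javascript
// Option names follow Deepgram's documented query parameters; check the
// docs for your SDK version before relying on them.
const options = {
  punctuate: true, // add punctuation and capitalization
  diarize: true, // label which speaker said each word
  profanity_filter: true, // filter profanity
}

console.log(options)
```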

Rerun your code and you should see a JSON object printed in your terminal.

A terminal showing the file being downloaded, and then an object containing data from Deepgram. Within the object is a results object with a channels array. Further content is omitted from the screenshot as it is nested too deeply.

Saving Transcript and Deleting Media

There is a lot of data that comes back from Deepgram, but all we want is the transcript which, with the options we provided, is a single string of text. Add the following line to access just the transcript:

const transcript = result.results.channels[0].alternatives[0].transcript
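
If the request fails, or the audio contains no speech, that property chain can throw because an intermediate value is missing. A defensive variant using optional chaining (Node.js 14 or later) is sketched below. getTranscript is a hypothetical helper, and sampleResult is a made-up, trimmed object that only mirrors the path used in this post, not a full Deepgram response.

```javascript
// Hypothetical helper: return the first transcript, or null if the shape is unexpected.
function getTranscript(result) {
  return result?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? null
}

// Trimmed, made-up response that only mirrors the path used in this post.
const sampleResult = {
  results: {
    channels: [{ alternatives: [{ transcript: 'Hello there.' }] }],
  },
}

console.log(getTranscript(sampleResult))
console.log(getTranscript(undefined))
```

The first call logs the transcript string, and the second logs null instead of throwing.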

Now that we have the string, we can create a text file with it. fs.writeFileSync is synchronous and does not take a callback, so we log a message on the next line instead:

fs.writeFileSync(`${videoFileName}.txt`, transcript)
console.log(`Wrote ${videoFileName}.txt`)

Then, if desired, delete the mp3 file:

fs.unlinkSync(videoFileName)
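
Appending .txt to the audio filename produces names ending in .mp3.txt. If you would rather swap the extension, Node's path module can do it. The sketch below uses a hypothetical transcriptPathFor helper and a placeholder file in a temporary directory, so it runs without any real audio.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: './Trailer.mp3' becomes 'Trailer.txt' in the same directory.
function transcriptPathFor(audioPath) {
  const { dir, name } = path.parse(audioPath)
  return path.join(dir, `${name}.txt`)
}

// Try it in a temporary directory with a placeholder "audio" file.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'transcript-'))
const audioPath = path.join(tmpDir, 'Trailer.mp3')
fs.writeFileSync(audioPath, 'not real audio')

const textPath = transcriptPathFor(audioPath)
fs.writeFileSync(textPath, 'Hello there.')
fs.unlinkSync(audioPath)

console.log(fs.readdirSync(tmpDir))
```

After it runs, the temporary directory contains only Trailer.txt.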

Summary

Transcribing YouTube videos has never been easier thanks to Deepgram's Speech Recognition API and the Deepgram Node SDK. Your final code should look like this:

const fs = require('fs')
const YoutubeMp3Downloader = require('youtube-mp3-downloader')
const { Deepgram } = require('@deepgram/sdk')
const ffmpeg = require('ffmpeg-static')

const deepgram = new Deepgram('YOUR DEEPGRAM KEY')
const YD = new YoutubeMp3Downloader({
  ffmpegPath: ffmpeg,
  outputPath: './',
  youtubeVideoQuality: 'highestaudio',
})

YD.download('ir-mWUYH_uo')

YD.on('progress', (data) => {
  console.log(data.progress.percentage + '% downloaded')
})

YD.on('finished', async (err, video) => {
  // Stop here if the download or conversion failed
  if (err) return console.error(err)

  const videoFileName = video.file
  console.log(`Downloaded ${videoFileName}`)

  const file = {
    buffer: fs.readFileSync(videoFileName),
    mimetype: 'audio/mp3',
  }
  const options = {
    punctuate: true,
  }

  const result = await deepgram.transcription
    .preRecorded(file, options)
    .catch((e) => console.log(e))
  const transcript = result.results.channels[0].alternatives[0].transcript

  // writeFileSync takes no callback, so log after it returns
  fs.writeFileSync(`${videoFileName}.txt`, transcript)
  console.log(`Wrote ${videoFileName}.txt`)
  fs.unlinkSync(videoFileName)
})

Check out the other options supported by the Deepgram Node SDK, and if you have any questions, feel free to reach out to us on Twitter (we are @DeepgramDevs).
