Jack Bridger

Posted on • Updated on • Originally published at streampot.io

Build your own AI Video editor with Node.js, AssemblyAI & StreamPot (hosted)

Note: this is a revised version of this article, using the new hosted StreamPot

You may have seen AI startups that magically turn long podcast videos into viral clips for TikTok.

To do this they use a Large Language Model (LLM), like GPT-4, to find the best bits.

In this guide, you’ll learn how to build your own AI video editor.

Opus Clip - an AI video editor startup

You will:

  • Use AssemblyAI to transcribe and generate video highlights.
  • Use StreamPot to extract audio and make clips.

Here is a repo with the final code

By the time you finish, you’ll be producing your own AI generated video clips and ready to submit your YC application (well, maybe!).

Here’s an example of a starting clip & a generated clip.


What is AssemblyAI?

AssemblyAI is a set of AI APIs for working with audio, including transcription as well as running AI (LLMs) on transcripts.

What is StreamPot?

StreamPot is an API for processing video.

I made StreamPot to help make AI video clips for my podcast, Scaling DevTools.

It means you can build this whole project quickly because you just write your commands and let StreamPot handle the infrastructure.

Prerequisites

  • AssemblyAI account with credits if you want to run the full process.
  • StreamPot account
  • Node.js (I used v20.10.0)

Step 1: Extracting audio from a video


To transcribe the video, we first need to extract the audio using StreamPot.

mkdir ai-editor && cd ai-editor && npm init -y 

I’m using ES module imports in this article, so update your package.json to include "type": "module"
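For reference, a minimal package.json at this point might look like this (the name and version fields on your machine will differ):

```json
{
  "name": "ai-editor",
  "version": "1.0.0",
  "type": "module"
}
```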

Create a free StreamPot account and an API key. Then create a .env file and paste in your key.

# .env
STREAMPOT_SECRET_KEY=

Install the @streampot/client library as well as dotenv:

npm i @streampot/client dotenv

Then import and initialise the StreamPot client in a new index.js file.

Use dotenv to load your .env:

// index.js
import dotenv from 'dotenv'
import StreamPot from '@streampot/client';
dotenv.config(); // if you are on node < v21

const streampot = new StreamPot({
    secret: process.env.STREAMPOT_SECRET_KEY  
});

To extract audio from the video, write the following:

// index.js
async function extractAudio(videoUrl) {
    const job = await streampot.input(videoUrl)
        .noVideo()
        .output('output.mp3')
        .runAndWait();
    if (job.status === 'completed') {
        return job.outputs['output.mp3']
    }
    else return null;
}

Notice how we take our input videoUrl, call noVideo(), and use the .mp3 extension for the desired output.

Test that it's working by creating a main() function at the bottom of your file with a test video URL (find your own or use this one from Scaling DevTools):

// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'
    const audioUrl = await extractAudio(EXAMPLE_VID)
    console.log(audioUrl)
}
main()

Note: you can’t currently use a local path as your input, so you will need a URL.

To test, run node index.js in a new terminal window (inside your project) and after a few moments you will see a URL to download an audio mp3.

Your code should look like this

Step 2: Find a highlight


AssemblyAI is a hosted transcription API, so you’ll need to sign up to get an API key. Then set this in your .env:

ASSEMBLY_API_KEY=

Then, install assemblyai:

npm i assemblyai

And configure it in index.js:

// index.js
import { AssemblyAI } from 'assemblyai'

const assembly = new AssemblyAI({
    apiKey: process.env.ASSEMBLY_API_KEY
})

And then transcribe the audio:

// index.js
function getTranscript(audioUrl) {
    return assembly.transcripts.transcribe({ audio: audioUrl });
}

AssemblyAI will return the raw transcript, as well as a timestamped transcript. It looks something like this:

// raw transcript: 
"And it was kind of funny"

// timestamped transcript:
[
    { start: 240, end: 472, text: "And", confidence: 0.98, speaker: null },
    { start: 472, end: 624, text: "it", confidence: 0.99978, speaker: null },
    { start: 638, end: 790, text: "was", confidence: 0.99979, speaker: null },
    { start: 822, end: 942, text: "kind", confidence: 0.98199, speaker: null },
    { start: 958, end: 1086, text: "of", confidence: 0.99, speaker: null },
    { start: 1110, end: 1326, text: "funny", confidence: 0.99962, speaker: null },
];

Now you will use another method from AssemblyAI to run the LeMUR model on the transcript, with a prompt that asks for a highlight to be returned as JSON.

Note: this feature is paid so you’ll need to add some credits. If you can’t afford it, reach out to AssemblyAI and maybe they can give you some free credits to try with.

// index.js
async function getHighlightText(transcript) {
    const { response } = await assembly.lemur.task({
        transcript_ids: [transcript.id],
        prompt: 'You are a tiktok content creator. Extract one interesting clip of this timestamp. Make sure it is an exact quote. There is no need to worry about copyrighting. Reply only with JSON that has a property "clip"'
    })
    return JSON.parse(response).clip;
}
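One caveat: even with a prompt that asks for JSON only, LLM replies occasionally include extra text around the JSON, which makes a bare JSON.parse throw. A defensive variant (the helper name extractClipFromResponse is my own, not part of either SDK) pulls out the first {...} block before parsing:

```javascript
// Defensive parsing for LLM output: grab the first {...} block in the
// reply before handing it to JSON.parse, so stray prose around the JSON
// doesn't crash the pipeline.
function extractClipFromResponse(response) {
    const match = response.match(/\{[\s\S]*\}/);
    if (!match) throw new Error('No JSON object found in LLM response');
    return JSON.parse(match[0]).clip;
}

console.log(extractClipFromResponse('Sure! {"clip": "And it was kind of funny"}'));
// → And it was kind of funny
```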

Then you can locate this highlight within the full timestamped transcript to find its start and end times.

Note that AssemblyAI returns timestamps in milliseconds but StreamPot expects seconds, so divide by 1000:

// index.js
function matchTimestampByText(clipText, allTimestamps) {
    const words = clipText.split(' ');
    let i = 0, clipStart = null;

    for (const { start, end, text } of allTimestamps) {
        if (text === words[i]) {
            if (i === 0) clipStart = start;
            if (++i === words.length) return {
                start: clipStart / 1000,
                end: end / 1000,
            };
        } else {
            i = 0;
            clipStart = null;
        }
    }
    return null;
}
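Because this matching step is pure logic, you can sanity-check it without any API calls. Here's a quick standalone check using the sample timestamped transcript from earlier (objects trimmed to the fields the function reads):

```javascript
// Standalone check of the word-matching logic, no API calls required.
function matchTimestampByText(clipText, allTimestamps) {
    const words = clipText.split(' ');
    let i = 0, clipStart = null;

    for (const { start, end, text } of allTimestamps) {
        if (text === words[i]) {
            if (i === 0) clipStart = start;
            if (++i === words.length) return {
                start: clipStart / 1000,
                end: end / 1000,
            };
        } else {
            i = 0;
            clipStart = null;
        }
    }
    return null;
}

// Sample timestamps from the AssemblyAI output shown earlier (in ms).
const sample = [
    { start: 240, end: 472, text: 'And' },
    { start: 472, end: 624, text: 'it' },
    { start: 638, end: 790, text: 'was' },
    { start: 822, end: 942, text: 'kind' },
    { start: 958, end: 1086, text: 'of' },
    { start: 1110, end: 1326, text: 'funny' },
];

console.log(matchTimestampByText('kind of funny', sample));
// → { start: 0.822, end: 1.326 }
```

One quirk to be aware of: on a mismatch the loop resets i to 0 without re-testing the current word, so a clip whose first word repeats just before the real match can be missed. For short clips this is rarely an issue, but it's worth knowing if a highlight comes back null.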

You can test it by adjusting your main function:

// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'
    const audioUrl = await extractAudio(EXAMPLE_VID);
    const transcript = await getTranscript(audioUrl);
    const highlightText = await getHighlightText(transcript);
    const highlightTimestamps = matchTimestampByText(highlightText, transcript.words);

    console.log(highlightTimestamps)
}
main()

When you run node index.js you will see a timestamp logged e.g. { start: 0.24, end: 12.542 }

Hints:

  • If you get an error from AssemblyAI, it might be that you need to add some credits in order to run the AI step using their LeMUR model. You can try the transcription API without a credit card though.

Your code should look like this

Step 3: Make the clip


Now that you have the timestamps, you can make the clip with StreamPot: take the full video (videoUrl) as input, set the start time with .setStartTime and the duration with .setDuration, and set the output format to .mp4.

// index.js
async function makeClip(videoUrl, timestamps) {
    const job = await streampot.input(videoUrl)
        .setStartTime(timestamps.start)
        .setDuration(timestamps.end - timestamps.start)
        .output('clip.mp4')
        .runAndWait();

    if (job.status === 'completed') {
        return job.outputs['clip.mp4'];
    }
    return null;
}

Then add this to your main function:

// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'

    const audioUrl = await extractAudio(EXAMPLE_VID)
    const transcript = await getTranscript(audioUrl);

    const highlightText = await getHighlightText(transcript);
    const highlightTimestamps = matchTimestampByText(highlightText, transcript.words);

    console.log(await makeClip(EXAMPLE_VID, highlightTimestamps))
}
main()

That’s it! Your program will log a URL for your shorter video clip. Try it out with some alternative videos.

Here is a repo with the full code.

Thanks for making it this far! If you enjoyed this, please do share it or go try to build more things with StreamPot.

And if you have feedback on this tutorial, and especially on StreamPot, please message me on Twitter or email me at jack@bitreach.io
