Daniel Fulop
Building a video editor completely on the frontend: FFmpeg, WebCodecs, WebAssembly and React.

Hey there,

In this post, I'm excited to share the work I've undertaken over the past year, addressing the challenges I encountered and highlighting the architectural decisions that shaped my passion project: keyframes.studio. It's an online video editor with a unique focus on creating content for social media platforms.

As the name Keyframes Studio implies, keyframes play a significant role in this tool. They allow you to easily zoom and pan the camera, simplifying the process of transforming horizontal videos into vertical ones.
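Conceptually, a keyframe stores the virtual camera's state (pan and zoom) at a timestamp, and the renderer interpolates between neighbouring keyframes for every frame in between. A minimal sketch of that idea (the names and the linear interpolation are my own illustration, not the app's actual code):

```typescript
// A camera keyframe: where the virtual camera is at a given time.
interface CameraKeyframe {
  timeMs: number;
  x: number;    // pan offset
  y: number;
  zoom: number; // 1 = no zoom
}

// Linearly interpolate the camera state at an arbitrary timestamp.
// Assumes keyframes are sorted by timeMs; clamps outside the range.
function cameraAt(keyframes: CameraKeyframe[], timeMs: number): CameraKeyframe {
  if (timeMs <= keyframes[0].timeMs) return keyframes[0];
  const last = keyframes[keyframes.length - 1];
  if (timeMs >= last.timeMs) return last;
  for (let i = 0; i < keyframes.length - 1; i++) {
    const a = keyframes[i], b = keyframes[i + 1];
    if (timeMs >= a.timeMs && timeMs <= b.timeMs) {
      // Blend linearly between the surrounding pair.
      const t = (timeMs - a.timeMs) / (b.timeMs - a.timeMs);
      return {
        timeMs,
        x: a.x + (b.x - a.x) * t,
        y: a.y + (b.y - a.y) * t,
        zoom: a.zoom + (b.zoom - a.zoom) * t,
      };
    }
  }
  return last; // unreachable for sorted input
}
```

In a real editor you would likely swap the linear blend for an easing curve, but the structure stays the same.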


This particular feature was missing from other similar online video editors, so I decided to build it myself. A fun but very, very painful journey had just begun.


Now, let's dive into the technicalities. What does a video editor require?

  1. An editor interface
  2. A preview renderer
  3. A final video renderer

Let's explore each of these in detail.

1. An editor interface

The implementation of the editor interface was likely the simplest among the three components. I utilized Next.js and Tailwind to construct the backbone of my video editor. The layout is quite straightforward, comprising upload forms and sidebar buttons that switch the menu. However, things got more complex when I started developing the track area.


The track items needed to be draggable, droppable, and resizable (which is how users trim each clip). Simultaneously, they had to adhere to their respective rows and display snap guidelines (similar to Figma's helper line that assists with item alignment).

Although I experimented with several libraries, none fully met my requirements due to limited documentation or missing functionality. That is, until I discovered react-moveable. This library simplified my work significantly, providing various built-in features while still offering the flexibility to listen to any event and customize it.

Global state management also forms an essential part of the editor. To ensure smooth information sharing across components and minimize re-rendering for enhanced performance, I chose zustand as the global state management solution.
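To make that concrete, here is a simplified, plain-TypeScript sketch of the kind of state shape and immutable update such a store would hold (the field names are illustrative, not from the app; in the real editor this object would live inside zustand's `create()` and the update would be passed to its `set()`):

```typescript
// Illustrative shape of the editor's global state. With zustand, this
// would live inside a store created via create<EditorState>()(...).
interface TrackItem {
  id: string;
  row: number;        // which track row the item sits on
  startMs: number;    // position on the timeline
  durationMs: number; // trimmed length of the clip
}

interface EditorState {
  items: TrackItem[];
  currentTimeMs: number;
}

// Pure, immutable update, the kind of function zustand's set() receives.
// Trimming a clip just changes its duration, clamped to be non-negative.
function trimItem(state: EditorState, id: string, newDurationMs: number): EditorState {
  return {
    ...state,
    items: state.items.map(it =>
      it.id === id ? { ...it, durationMs: Math.max(0, newDurationMs) } : it
    ),
  };
}
```

Because updates are immutable and components subscribe only to the slices they read, unrelated components skip re-rendering when a single clip is trimmed.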

2. A preview renderer

Once several items are added to the track area, naturally, you'd want to preview how your final video would look. For this, I employed the trusty Canvas. Upon page mount, I initiated a requestAnimationFrame with a render loop that manages the drawing. Each loop examines the current timestamp's track items (e.g., 00:02:234), draws each one at the suitable position on the canvas, then sets the timestamp slightly forward, and the cycle repeats.
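The loop described above can be sketched roughly like this (a sketch with my own placeholder names, not the app's actual code; each item knows how to draw itself at its local time):

```typescript
// A drawable item on the timeline.
interface RenderItem {
  startMs: number;
  durationMs: number;
  draw: (ctx: CanvasRenderingContext2D, localMs: number) => void;
}

// Pure helper: which items are visible at a given timestamp.
function activeItems(items: RenderItem[], timeMs: number): RenderItem[] {
  return items.filter(
    it => timeMs >= it.startMs && timeMs < it.startMs + it.durationMs
  );
}

// The requestAnimationFrame render loop (browser only).
function startPreview(ctx: CanvasRenderingContext2D, items: RenderItem[]) {
  let timeMs = 0;
  let lastTick = performance.now();
  const loop = (now: number) => {
    timeMs += now - lastTick; // advance the playhead by real elapsed time
    lastTick = now;
    ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
    for (const it of activeItems(items, timeMs)) {
      it.draw(ctx, timeMs - it.startMs); // draw at the item's local time
    }
    requestAnimationFrame(loop);
  };
  requestAnimationFrame(loop);
}
```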

I also employed react-moveable for the preview, allowing users to modify the position, size, and rotation of the overlays. Since the image rendered on the canvas isn't directly clickable, I added an invisible div drag area over the image for user interaction.


The metadata associated with each track item is stored in the zustand store.

Another big challenge was implementing text with animations. Canvas does not support multiline text by default, and the libraries available for this lacked customisation and some features, so I had to go ahead and do it myself. With canvas, a good rule of thumb is that you have to do everything yourself. You want to animate some words? Good luck calculating the x, y positions for every millisecond and manually rendering the text accordingly.
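Word-wrapping is the classic example of this: canvas only gives you single-line `fillText`, so you measure words yourself and break lines manually. A minimal sketch, with the measure function injected so the logic is testable (in the browser it would be `s => ctx.measureText(s).width`):

```typescript
// Break text into lines that fit maxWidth, using a caller-supplied
// measure function (in the browser: s => ctx.measureText(s).width).
function wrapText(
  measure: (s: string) => number,
  text: string,
  maxWidth: number
): string[] {
  const lines: string[] = [];
  let line = "";
  for (const word of text.split(/\s+/)) {
    const candidate = line ? line + " " + word : word;
    if (line && measure(candidate) > maxWidth) {
      lines.push(line); // current line is full, start a new one
      line = word;
    } else {
      line = candidate;
    }
  }
  if (line) lines.push(line);
  return lines;
}
```

Each returned line is then drawn with its own `fillText` call, offset vertically by the line height, and the per-millisecond animation math happens on top of that.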

3. A final video renderer

"That should be easy: call the HTMLCanvasElement.captureStream() method with a MediaRecorder and you should be good to go," said my one-year-younger self. If only he knew...

There are two problems with this approach.

  1. The captureStream function only records at about 30 FPS. You can pass 60 as an argument, but that only sets the maximum FPS, which in my case was never reached.
  2. Slower computers produce drastically worse output, which I did not find acceptable.
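For reference, the naive approach looks roughly like this (a sketch; as noted above, the 60 passed to captureStream is only a ceiling, and the browser decides the real frame rate):

```typescript
// Naive export: record the canvas directly with MediaRecorder.
// Simple, but frame rate and quality are at the browser's mercy.
function recordCanvas(canvas: HTMLCanvasElement, durationMs: number): Promise<Blob> {
  return new Promise(resolve => {
    const stream = canvas.captureStream(60); // 60 FPS maximum, not guaranteed
    const recorder = new MediaRecorder(stream, { mimeType: "video/webm" });
    const chunks: Blob[] = [];
    recorder.ondataavailable = e => chunks.push(e.data);
    recorder.onstop = () => resolve(new Blob(chunks, { type: "video/webm" }));
    recorder.start();
    // Stop after the timeline's duration; real-time recording only.
    setTimeout(() => recorder.stop(), durationMs);
  });
}
```

Note that this records in real time: a 10-minute timeline takes 10 minutes to export, and any dropped preview frame is dropped in the output too.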

So I began experimenting with FFmpeg, as ffmpeg.wasm allows you to run it directly in the browser. For those of you not familiar with FFmpeg: it is the Swiss Army knife of video processing, a very powerful CLI tool to convert and edit videos.

I wanted to create the whole video with a single ffmpeg command; however, that turned out to be vastly more complicated than expected. The main reason is that ffmpeg does not support keyframing the way I need it to. It has the zoompan filter, which is slightly different from what I needed, and the crop filter, which unfortunately does not allow dynamic widths and heights to be cropped.

After a TON of trial and error, the final rendering method became this:

  1. The user clicks export
  2. ffmpeg.wasm converts all videos to .mp4 with an increased GOP size (this vastly boosts seeking speed).
  3. ffmpeg.wasm extracts and concatenates the audio of all track items.
  4. Create a new VideoEncoder via the WebCodecs API, with its output fed to webm-muxer.
  5. Set the timestamp to 0, initiate seeking of each video item to the current timestamp, wait for the seeking to complete, create a new VideoFrame from the canvas element, and encode it with the VideoEncoder. Move 16 ms forward and repeat until the end of the video.
  6. Convert the webm video output from step 5 to mp4 and add the audio from step 3 with ffmpeg.wasm.
  7. Video is ready to be downloaded.
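Step 5 is the heart of the pipeline, so here is a condensed sketch of it (my own simplification, not the app's actual code: seekTo stands in for seeking every video element and awaiting its "seeked" event, drawFrame redraws the canvas for that timestamp, and onChunk stands in for feeding webm-muxer):

```typescript
// Pure helper: frame timestamps in microseconds for a fixed-FPS export.
function frameTimestamps(durationMs: number, fps: number): number[] {
  const frames = Math.ceil((durationMs / 1000) * fps);
  return Array.from({ length: frames }, (_, i) => Math.round((i * 1_000_000) / fps));
}

// Sketch of the seek-draw-encode loop (browser only).
async function exportVideo(
  canvas: HTMLCanvasElement,
  durationMs: number,
  seekTo: (us: number) => Promise<void>,
  drawFrame: (us: number) => void,
  onChunk: (chunk: EncodedVideoChunk, meta?: EncodedVideoChunkMetadata) => void
) {
  const encoder = new VideoEncoder({
    output: onChunk, // in the real app this feeds webm-muxer
    error: e => console.error(e),
  });
  encoder.configure({
    codec: "vp09.00.10.08", // VP9, suitable for a webm container
    width: canvas.width,
    height: canvas.height,
    framerate: 60,
  });
  for (const ts of frameTimestamps(durationMs, 60)) {
    await seekTo(ts);  // wait until every clip shows the right frame
    drawFrame(ts);
    const frame = new VideoFrame(canvas, { timestamp: ts });
    encoder.encode(frame, { keyFrame: ts === 0 });
    frame.close();     // release the frame's memory immediately
  }
  await encoder.flush(); // drain pending chunks before muxing
  encoder.close();
}
```

The key property is that the loop is decoupled from wall-clock time: the encoder waits for each seek to finish, so a slow machine just exports more slowly instead of dropping frames.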

This guarantees a smooth final video at 60 FPS. Of course, there are quite a few drawbacks here:

  1. ffmpeg.wasm has a 2 GB file size limit; larger files throw an error.
  2. Export speed varies between computers: fast computers finish exporting faster.
  3. The tab cannot be closed while exporting.

But there are also some advantages:

  1. Users do not have to upload their files.
  2. Users can potentially use the product offline.
  3. I do not have to pay for expensive infrastructure, so my prices can be lower.

Conclusion

I would be happy to answer any of your questions or hear your feedback. This architecture is nowhere near perfect, but right now it is functioning very well. The product has many more interesting aspects: for example, I am using the Google Video Intelligence API to generate keyframes with AI, and another Google service to generate subtitles.

I really hope to see more support, love, and performance improvements for wasm in the future, as I think it can be very powerful and open up a lot of possibilities for what you can achieve entirely in the browser.
