Adrian Guery for Kezios

Originally published at kezios.fr

Prompt to Video with GPT-4 and React.js - Automatic video generation

Introduction

I always strive to create engaging and informative content. I'm convinced that one of the most effective ways to teach coding is through video tutorials. However, creating high-quality video tutorials from scratch can be time-consuming and labor-intensive. That's why I've developed a program that leverages the power of GPT-4 and Remotion (a framework for creating videos programmatically using React.js) to automatically generate video tutorials from just a prompt.

Example of an output video generated from a GPT prompt:

React countdown timer video


In this article, I discuss how my program works, how I fine-tuned GPT-3 to produce the JSON format the program expects (so that it can generate detailed and accurate JSON inputs for any coding tutorial I want to turn into a video), and how I achieved the same result with GPT-4.

Creating the Program with Remotion


Remotion is a powerful library that allows developers to create videos programmatically using React. By feeding a JSON input (generated by my fine-tuned GPT-3 model or by GPT-4) to my program, I can generate a video tutorial explaining how the code works step by step.

The JSON input I use for this program has a specific structure, which includes the code snippet, the language, and an array of highlight explanations. Each highlight explanation contains a code segment to highlight and a brief explanation of what that particular code does.
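The structure described above can be checked before anything is rendered. The following is a minimal sketch, not the program's actual code: the field names (`code`, `language`, `highlightExplanations`, `highlight`, `explanation`) come from this article, while the function name `validateTutorialInput` and the error messages are hypothetical.

```javascript
// Hypothetical helper: returns a list of problems found in an input object.
// An empty list means the object matches the structure described in the article.
function validateTutorialInput(input) {
  if (typeof input !== "object" || input === null) {
    return ["input must be an object"];
  }
  const errors = [];
  if (typeof input.code !== "string" || input.code.length === 0) {
    errors.push("code must be a non-empty string");
  }
  if (typeof input.language !== "string" || input.language.length === 0) {
    errors.push("language must be a non-empty string");
  }
  if (!Array.isArray(input.highlightExplanations)) {
    errors.push("highlightExplanations must be an array");
    return errors;
  }
  input.highlightExplanations.forEach((item, index) => {
    if (typeof item?.highlight !== "string" || typeof item?.explanation !== "string") {
      errors.push(`highlightExplanations[${index}] needs string highlight and explanation`);
      return;
    }
    // A highlight that does not occur in the code cannot be located on screen.
    if (typeof input.code === "string" && !input.code.includes(item.highlight)) {
      errors.push(`highlightExplanations[${index}].highlight not found in code`);
    }
  });
  return errors;
}

// Usage: this input has a highlight that is missing from the code.
const errors = validateTutorialInput({
  code: "const a = 1;",
  language: "javascript",
  highlightExplanations: [{ highlight: "const b", explanation: "Not in the code." }],
});
console.log(errors);
```

Checking that every `highlight` really occurs in `code` is a design choice of this sketch: it catches generated inputs whose explanations point at text the video could never highlight.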

For example, here is a JSON input for a simple React countdown timer:

{
    code: "import React, { useState, useEffect } from 'react';\nfunction CountdownTimer() {\n const [seconds, setSeconds] = useState(60);\n\n useEffect(() => {\n const interval = setInterval(() => {\n setSeconds(seconds - 1);\n }, 1000);\n return () => clearInterval(interval);\n }, [seconds]);\n\n return (\n <div>\n {seconds} seconds remaining\n </div>\n );\n}",
    language: "javascript",
    highlightExplanations: [
        {
            highlight: "import React, { useState, useEffect } from 'react';",
            explanation:
                "This imports the required dependencies from the React library.",
        },
        {
            highlight: "const [seconds, setSeconds] = useState(60);",
            explanation:
                "This uses the `useState` hook to initialize the `seconds` state variable with a value of 60 and create a function to update its value, `setSeconds`.",
        },
        {
            highlight: "const interval = setInterval(() => {",
            explanation:
                "This sets up an interval that will run the anonymous function inside every 1000 milliseconds (1 second).",
        },
        {
            highlight: "setSeconds(seconds - 1);",
            explanation:
                "This decreases the value of the `seconds` state variable by 1 every time the interval runs.",
        },
        {
            highlight: "return () => clearInterval(interval);",
            explanation:
                "This is the cleanup function for the `useEffect` hook. It clears the interval before the effect runs again and when the component unmounts, to prevent memory leaks.",
        },
        {
            highlight: "{seconds} seconds remaining",
            explanation:
                "This displays the current value of the `seconds` state variable in the component's render output.",
        },
    ],
};

Upon receiving this JSON input, my program processes it and generates a video tutorial using Remotion. The video tutorial explains each part of the code snippet, as specified in the highlight explanations array.
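To illustrate what "processing" the input could involve, here is a minimal sketch that turns the highlight explanations into a timeline. It is an assumption about one possible approach, not the program's real implementation: the function name `buildTimeline`, the frame rate, and the fixed duration per step are all hypothetical.

```javascript
// Assumed values, not taken from the article.
const FPS = 30;
const SECONDS_PER_STEP = 4;

// For each highlight, find the line of the code where it starts and
// the range of frames during which its explanation would be shown.
function buildTimeline(input, fps = FPS, secondsPerStep = SECONDS_PER_STEP) {
  const framesPerStep = fps * secondsPerStep;
  return input.highlightExplanations.map((item, index) => {
    const offset = input.code.indexOf(item.highlight);
    // 1-based line number: count the lines in the text before the match.
    const line = offset === -1 ? null : input.code.slice(0, offset).split("\n").length;
    return {
      line,
      explanation: item.explanation,
      startFrame: index * framesPerStep,
      durationInFrames: framesPerStep,
    };
  });
}

const timeline = buildTimeline({
  code: "const a = 1;\nconst b = 2;\nconsole.log(a + b);",
  language: "javascript",
  highlightExplanations: [
    { highlight: "const b = 2;", explanation: "Declares b." },
    { highlight: "console.log(a + b);", explanation: "Prints the sum." },
  ],
});
console.log(timeline);
```

In a Remotion composition, values like `startFrame` and `durationInFrames` could then be passed to the `from` and `durationInFrames` props of a `<Sequence>`, one sequence per explanation.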

I wrote a more advanced article about this part:

Fine-tuning GPT-3 for JSON Input Generation


To make the process of creating video tutorials even more efficient, I fine-tuned GPT-3 to understand the specific JSON format used by my program. By training GPT-3 with numerous examples of correctly formatted JSON inputs, the model can now generate JSON inputs for any coding tutorial I want to create a video for.

This means that I can simply ask GPT-3 to generate a JSON input that my Remotion program can use to create a video tutorial.

For example, I might ask GPT-3 to create a JSON input for a tutorial on how to create a simple countdown timer in React. GPT-3 would then generate a JSON input like the one shown earlier, containing the code snippet and a highlight explanation for each part of it.

I wrote an article explaining how I fine-tuned my GPT-3 model step by step:
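As an illustration of what the training examples might look like, the sketch below writes prompt-answer pairs as JSONL lines with `prompt` and `completion` keys, the layout used by OpenAI's legacy fine-tuning for GPT-3 base models. The example pair, the separator, and the stop marker are assumptions made for this sketch; the real dataset is not shown in this article.

```javascript
// Hypothetical training pair; the actual dataset is not shown in the article.
const examples = [
  {
    request: "Create a tutorial which teaches how to log a message in JavaScript",
    answer: {
      code: "console.log('Hello');",
      language: "javascript",
      highlightExplanations: [
        { highlight: "console.log('Hello');", explanation: "Prints Hello to the console." },
      ],
    },
  },
];

// Assumed markers: a fixed separator ends every prompt and a fixed stop
// sequence ends every completion, so the model learns where answers end.
const PROMPT_SEPARATOR = "\n\n###\n\n";
const STOP_SEQUENCE = " END";

// Converts the pairs into JSONL: one {"prompt", "completion"} object per line.
function toJsonl(pairs) {
  return pairs
    .map(({ request, answer }) =>
      JSON.stringify({
        prompt: request + PROMPT_SEPARATOR,
        completion: " " + JSON.stringify(answer) + STOP_SEQUENCE,
      })
    )
    .join("\n");
}

const jsonl = toJsonl(examples);
console.log(jsonl);
```

Serializing the answer with `JSON.stringify` guarantees that every completion in the training file is itself well-formed, which matters when the goal is to teach the model a strict output format.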

Utilizing GPT-4 for JSON Input Generation by Providing Five Prompt-Answer Examples

After gaining access to the GPT-4 API, I decided to experiment with GPT-4 as a JSON generator. The most significant advantage of GPT-4 is its large context window: up to 32k tokens in the largest variant available at launch. This makes it possible to include five prompt-answer examples directly in the request, which improves the results.

As a result, the outcomes were even better than those achieved with the fine-tuned GPT-3 model, as there were fewer instances of incorrect JSON formats.
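The few-shot approach can be sketched as a list of chat messages in which each example is a user message followed by the assistant's answer. This is a sketch under assumptions: the `role`/`content` message layout is the one used by chat-style APIs such as OpenAI's Chat Completions, but the system instruction, the function names `buildMessages` and `parseAnswer`, and the placeholder examples are hypothetical. No API call is made here.

```javascript
// Builds the message list: one system instruction, then each example as a
// user/assistant pair, then the new request as the final user message.
function buildMessages(fewShotExamples, request) {
  const messages = [
    {
      role: "system",
      content:
        "Answer only with a JSON object with the keys code, language and highlightExplanations.",
    },
  ];
  for (const example of fewShotExamples) {
    messages.push({ role: "user", content: example.prompt });
    messages.push({ role: "assistant", content: JSON.stringify(example.answer) });
  }
  messages.push({ role: "user", content: request });
  return messages;
}

// Detects an incorrectly formatted answer instead of letting it crash the render.
function parseAnswer(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    return { ok: false, error: error.message };
  }
}

// Usage with a single placeholder example (the article uses five).
const messages = buildMessages(
  [
    {
      prompt: "Create a tutorial which teaches how to log a message in JavaScript",
      answer: {
        code: "console.log('Hello');",
        language: "javascript",
        highlightExplanations: [
          { highlight: "console.log('Hello');", explanation: "Prints Hello to the console." },
        ],
      },
    },
  ],
  "Create a tutorial which teaches how to create a countdown timer in React"
);
console.log(messages.length);
```

With five examples the list would hold twelve messages: the system instruction, ten example messages, and the final request. When `parseAnswer` reports a failure, the request can simply be sent again.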

Example: a button that disappears when it's clicked

We asked our fine-tuned GPT-3 (or GPT-4) model to:
Create a tutorial which teach how to create a button which disappears when it's clicked

It answered with this output:

 {
    code: "<button id='disappearing-button'>Click me!</button>\n<style>\n#disappearing-button {\n display: block;\n}\n#disappearing-button:active {\n display: none;\n}\n</style>\n<script>\n document.getElementById('disappearing-button').addEventListener('click', function() {\n this.style.display = 'none';\n });\n</script>",
    language: "html",
    highlightExplanations: [
        {
            highlight: "display: block;",
            explanation:
                "This sets the display property of the button to 'block', which makes it visible on the page.",
        },
        {
            highlight: "display: none;",
            explanation:
                "This sets the display property of the button to 'none', which makes it hidden on the page.",
        },
        {
            highlight: "document.getElementById('disappearing-button')",
            explanation:
                "This gets a reference to the button element on the page by its ID.",
        },
        {
            highlight: "addEventListener('click', function()",
            explanation:
                "This sets up a click event listener on the button element, which means that the anonymous function inside will be executed when the button is clicked.",
        },
        {
            highlight: "this.style.display = 'none';",
            explanation:
                "Inside the event listener function, 'this' refers to the element that the event listener is attached to (in this case, the button element). Setting the 'display' property of this element to 'none' makes it hidden on the page.",
        },
    ],
};

Once rendered by Remotion, this produces the following video tutorial:

button which disappears video

Conclusion

By combining the power of GPT-4 (or a fine-tuned GPT-3 model) and Remotion, I have created a program that can automatically generate video tutorials for a wide range of coding topics. It shows how an NLP model can produce visual content when paired with another program that renders its output.

As AI and machine learning continue to advance, the possibilities for automating content creation in various fields are endless. This program is just one example of how we can harness AI to revolutionize the way we create content.
