
How to Create Real-time AI Video Avatar in 7 Minutes

What if you could turn dull, static text and audio-based content into exciting videos with the help of AI? With AI video avatar generators, you can easily create high-quality videos that grab your audience's attention starting from simple text or audio.

These AI video generators can serve several purposes: they can be deployed as customer support agents in your application to deliver more engaging support and improve customer satisfaction, used as educational tools that engage students in interactive learning environments, or turned into virtual assistants that guide users through getting started with a product or tool without needing to dig through the documentation.

This tutorial will guide you through setting up and implementing real-time AI video avatars using Simli, an AI video avatar generator. Simli provides developers with a speech-to-video API for creating lip-synced AI avatars with lifelike animated characters, realistic head movements, and synchronized speech.

Following this guide, you will learn how to quickly create a video avatar from voice inputs, ready to be deployed in interactive projects. So, let's get started right away!

The complete source code for the project is available on GitHub.

Prerequisites

You should have:

  • A basic understanding of JavaScript and React.
  • Node and Node Package Manager (NPM) installed on your computer.

Before setting up the API environment and then moving on to creating a real-time AI video avatar using the Simli API, let's briefly look at the steps needed to create an AI video avatar with Simli.

Steps to Create an AI Video Avatar with Simli:

  1. Obtain the API key
  2. Choose a face ID
  3. Initialize the Simli client
  4. Call the simliClient.start() function to establish the WebRTC connection
  5. Stream audio using sendAudioData()
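
At a high level, these steps translate into just a few calls. Here is a simplified sketch (not runnable on its own; the sections below build each piece in full):

import { SimliClient } from 'simli-client';

const simliClient = new SimliClient();   // 3. create the client
simliClient.Initialize(simliConfig);     // pass the API key, face ID, and media refs
await simliClient.start();               // 4. establish the WebRTC connection
simliClient.sendAudioData(pcmChunk);     // 5. stream 16 kHz PCM16 audio chunks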

Set Up Your API Environment in Minutes

Start by signing up on Simli to retrieve your API key. For a quick sign-in, you can choose Google.
Once you’ve successfully created an account, you will be redirected to the user profile dashboard, where you can generate your API key and track your API usage.

Simli Dashboard

Click the icon above to copy your API key and store it securely. After retrieving your API key, select an avatar to display on the frontend.

Choose Your AI Avatar

Simli provides sample AI avatars that can be accessed through its available faces, with new avatars being added constantly.

Here are a few of the available faces:

Simli available faces

To get the ID for each face, copy the random text after the name. For example, the ID for Jenna will be tmp9i8bbq7c.

If you don’t want to use any available avatars, Simli has a create avatar tool that lets you create custom avatars simply by uploading images. However, this tutorial will use an existing avatar.

Now that you have the face ID and the Simli API key, let’s create a Next.js app.

Create a Next.js App

To bootstrap a Next.js application, open your terminal, cd into the directory where you would like to create the application, and run this command:

npx create-next-app@latest simli-demo

This command will prompt a few questions about configuring the Next.js application. Here's how you should respond to each question:

Next.js configuration questions

Select the responses shown above for each question, pressing Enter to confirm.

Installing Dependencies

Next, install the simli-client and standardized-audio-context packages by running this command:

npm install simli-client standardized-audio-context

SimliClient is Simli's WebRTC frontend client, a tool for integrating real-time video and audio streaming capabilities into web applications. It spares you from setting up WebRTC manually.

The standardized-audio-context package provides a consistent, cross-browser AudioContext, which is used to downsample the audio and convert it into chunks that SimliClient can process.
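
For instance, a minimal 16 kHz context can be created like this (a sketch; the tutorial's later code uses the browser's native AudioContext, which exposes the same constructor):

// standardized-audio-context mirrors the native Web Audio API,
// so the polyfilled context is created the same way
import { AudioContext } from 'standardized-audio-context';

const audioContext = new AudioContext({ sampleRate: 16000 }); // Simli expects 16 kHz audio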

Initialize the SimliClient in Your Project

In your Next.js application, navigate to the page.js file and paste the following code:

// src/app/page.js
'use client';

import { useRef } from 'react';

export default function Home() {
  // Declare video and audio refs for the avatar's media streams
  const videoRef = useRef(null);
  const audioRef = useRef(null);

  // (The Simli configuration and helper functions from the next sections
  // also live inside this component.)

  return (
    <div>
      <video ref={videoRef} autoPlay playsInline></video>
      <audio ref={audioRef} autoPlay></audio>
    </div>
  );
}

In the code above, videoRef and audioRef were created using the useRef hook to access the <video> and <audio> HTML elements in the component. The SimliClient SDK uses these refs to attach the live WebRTC video and audio streams to those elements, which render the media from the remote streams on the client side. Note that the 'use client' directive is required because the component uses React hooks.

The next step is to configure SimliClient and pass in the video and audio refs. To do so, paste the following code inside page.js:

// src/app/page.js
...
// Configure the Simli client (inside the Home component, so videoRef and audioRef are in scope)
...

import { SimliClient } from 'simli-client';

const simliClient = new SimliClient();

const simliConfig = {
  apiKey: "your api key",
  faceID: "tmp9i8bbq7c",
  handleSilence: true,
  maxSessionLength: 3600,
  maxIdleTime: 600,
  videoRef: videoRef,
  audioRef: audioRef,
};
...

This block of code creates a new instance of the SimliClient and a simliConfig object. Let’s break down each part of the simliConfig object:

  • apiKey: The unique key generated when you created your Simli account.
  • faceID: Represents the avatar face ID that will be rendered in the video stream. Simli provides different avatars; you can choose one using its face ID.
  • handleSilence: This boolean indicates whether the client should handle silent moments in the audio stream (e.g., muting or pausing the video if no audio is detected).
  • maxSessionLength: Sets the maximum session length (in seconds). Here, it's set to 1 hour (3600 seconds), limiting the duration of any single connection session.
  • maxIdleTime: Sets the maximum idle time (in seconds). The session will disconnect after 600 seconds (10 minutes) without activity.
  • videoRef and audioRef: These are references to the video and audio elements where the media streams will be displayed in the browser. SimliClient can connect the WebRTC streams directly to these elements by passing these refs.
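
Hard-coding the API key is fine for local experiments, but you may prefer to load it from an environment variable so it never lands in version control. A common Next.js pattern (assuming you create a local .env.local file; the variable name here is just an example) looks like this:

// .env.local (not committed): NEXT_PUBLIC_SIMLI_API_KEY=your-api-key
// Variables prefixed with NEXT_PUBLIC_ are inlined into the browser bundle by Next.js
const simliConfig = {
  apiKey: process.env.NEXT_PUBLIC_SIMLI_API_KEY,
  // ...the rest of the config stays the same
};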

Start Real-time Streaming with AI Video Avatar

Once you have successfully configured SimliClient, the next step is establishing the WebRTC connection.

But before that, you need to create a function that downsamples the audio to 16 kHz and breaks it into smaller pulse-code modulation (PCM) chunks. This guide uses a prerecorded MP3 file that will be sent to the SimliClient; you can download and use any audio of your choice.

Paste the following code inside the page.js file to create the downsampleAndChunkAudio function:

// src/app/page.js
...
// Downsample the audio to 16 kHz and split it into PCM chunks
...

const downsampleAndChunkAudio = async (audioUrl, chunkSizeInMs = 100) => {
  // Create an AudioContext with a target sample rate of 16 kHz
  const audioContext = new AudioContext({ sampleRate: 16000 });
  // Fetch and decode the audio file
  const response = await fetch(audioUrl);
  const arrayBuffer = await response.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  // Extract raw PCM data from the first channel (assuming mono audio for simplicity)
  const rawPCM = audioBuffer.getChannelData(0);
  // Calculate the chunk size in samples (100 ms at 16 kHz = 1600 samples)
  const chunkSizeInSamples = Math.floor((chunkSizeInMs / 1000) * 16000);
  const pcmChunks = [];
  // Loop through the raw PCM data and create chunks
  for (let i = 0; i < rawPCM.length; i += chunkSizeInSamples) {
    const chunk = rawPCM.subarray(i, i + chunkSizeInSamples);
    // Convert each float32 chunk (values in [-1, 1]) to 16-bit integer PCM
    const int16Chunk = new Int16Array(chunk.length);
    for (let j = 0; j < chunk.length; j++) {
      int16Chunk[j] = Math.max(-32768, Math.min(32767, chunk[j] * 32768));
    }
    pcmChunks.push(int16Chunk);
  }
  return pcmChunks;
};

The downsampleAndChunkAudio function takes an audio URL as an argument, downsamples the audio file to 16 kHz, and breaks it into smaller PCM chunks. This is the format required for audio sent to the SimliClient.
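
As a quick sanity check, you could call the function from any async code with a file served from the Next.js public folder (the file name here is just an example):

// Hypothetical usage: assumes an audio.mp3 file exists in public/
const pcmChunks = await downsampleAndChunkAudio('/audio.mp3', 100);
console.log(`Got ${pcmChunks.length} chunks of 1600 samples (100 ms at 16 kHz) each`);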

Next, you have to initialize SimliClient and establish the WebRTC connection. To do so, paste the following code inside the page.js file:

// src/app/page.js
...
// Initialize the Simli client and start streaming audio
...

async function initializeClient() {
  try {
    simliClient.Initialize(simliConfig);
    await simliClient.start();
    // Downsample the audio, then send the PCM chunks on a fixed interval
    // (audioUrl is the path to your audio file, e.g. an MP3 in the public folder)
    const pcmChunks = await downsampleAndChunkAudio(audioUrl);
    const interval = setInterval(() => {
      const chunk = pcmChunks.shift();
      if (chunk) simliClient.sendAudioData(chunk);
      if (!pcmChunks.length) clearInterval(interval);
    }, 120);
  } catch (error) {
    alert(error);
  }
}

The initializeClient function initializes the SimliClient with the simliConfig object declared earlier. It then calls the downsampleAndChunkAudio function to break the audio into PCM16 chunks before sending them to the Simli client.

Note: The audio data must be PCM16 with a sample rate of 16 kHz.

PCM16 is a standard audio format ideal for voice processing. Sending this format to Simli's API helps maintain synchronization between the audio and the avatar's lip movements, which enhances the viewer experience by mimicking natural real-time speech.
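
The snippets above never actually call initializeClient, so you need to trigger it somewhere. One simple option (a sketch, not the only approach) is a button in the component's JSX; starting on a user interaction also keeps browsers happy about autoplaying media:

// Inside the Home component's return, next to the <video> and <audio> elements
<button onClick={initializeClient}>Start Avatar</button>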

Render and Integrate the AI Avatar on the Frontend

Now that you have finished building the application, let's render it in the browser. To do so, open your terminal and run this command:

npm run dev

This command starts a local development server at http://localhost:3000.

Watch the application in action through this video.

You should check out this GitHub repository to explore a hands-on example of how to integrate Simli's API for building interactive AI avatars.

Conclusion

This quick guide showed how to create a real-time AI video avatar using the Simli API. While this article covered only the basics—such as sending prerecorded audio to the Simli API—Simli offers capabilities that extend far beyond this scope.

To unlock Simli's full potential, you can enhance your AI video avatars by integrating additional tools like OpenAI for language models and Deepgram or ElevenLabs for converting text to speech. These tools work seamlessly with Simli to create more engaging and interactive video experiences.
Check this tutorial for a more advanced use case of Simli. Sign up on Simli today to get started!
