DEV Community

Amnish Singh Arora


An Audio Player hook for your React App

This was my 7th week of contributing to ChatCraft, and so far I've shipped various improvements to the existing system. Every week I've tried to add new features to the app, and since we ship code quickly, it's often not possible to handle every aspect of an issue perfectly, so follow-ups have to be filed.

A few weeks ago, I used OpenAI's text-to-speech (TTS) API to let ChatCraft announce LLM responses as they were generated.

I parsed the responses into sentences, passed those chunks to the TTS API for audio generation, and ultimately played the generated audio clips in sequence.
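
The chunking step can be pictured with a minimal sketch like the one below. This is not ChatCraft's actual parser: the splitIntoSentences helper and its naive punctuation rule are assumptions made purely for illustration, and a real implementation would need to handle abbreviations, decimals, and similar cases.

```typescript
// Hypothetical helper (not from ChatCraft): splits a streaming text buffer into
// complete sentences and returns the trailing text that is not terminated yet.
type SentenceSplit = { sentences: string[]; remainder: string };

function splitIntoSentences(buffer: string): SentenceSplit {
  const sentences: string[] = [];
  // Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
  const boundary = /[.!?]+\s+/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(buffer)) !== null) {
    const end = match.index + match[0].length;
    sentences.push(buffer.slice(start, end).trim());
    start = end;
  }
  // Whatever is left stays in the buffer until more text streams in.
  return { sentences, remainder: buffer.slice(start) };
}

const split = splitIntoSentences("Hello there. How are you? I am fi");
// split.sentences holds the two finished sentences; split.remainder holds "I am fi".
```

Each finished sentence would then be handed to the TTS API, while the remainder waits for the rest of the streamed response.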

I've already written about the entire process in another post. In this post, I'll discuss one of the shortcomings of that feature, and how I updated my AudioPlayer hook to manage and orchestrate the ordered playback of audio chunks throughout the application.

Table of Contents

 1. The Problem ⁉️
 2. Implementing a global audio Queue
       2.1. Defining an Audio Player Context
       2.2. Wrapping the audio queue in a Context Provider
       2.3. Providing the context
       2.4. Defining useAudioPlayer Hook
       2.5. Using the Hook 🪝
 3. The Pull Request 📝
 4. Reviewing a PR that adds Custom Provider Classes
 5. What's Ahead 🔜

The Problem ⁉️

If you went through my post, you might have noticed that in order to preserve the order of audio chunks, I was maintaining a classic queue data structure inside my hook.

Here's the code for a quick recap:

import { useState, useEffect } from "react";

const useAudioPlayer = () => {
  // We are managing promises of audio urls instead of directly storing strings
  // because there is no guarantee when openai tts api finishes processing and resolves a specific url
  // For more info, check this comment:
  // https://github.com/tarasglek/chatcraft.org/pull/357#discussion_r1473470003
  const [queue, setQueue] = useState<Promise<string>[]>([]);
  const [isPlaying, setIsPlaying] = useState<boolean>(false);

  useEffect(() => {
    if (!isPlaying && queue.length > 0) {
      playAudio(queue[0]);
    }
  }, [queue, isPlaying]);

  const playAudio = async (audioClipUri: Promise<string>) => {
    setIsPlaying(true);
    const audioUrl: string = await audioClipUri;
    const audio = new Audio(audioUrl);
    audio.onended = () => {
      URL.revokeObjectURL(audioUrl); // To avoid memory leaks
      setQueue((oldQueue) => oldQueue.slice(1));
      setIsPlaying(false);
    };
    audio.play();
  };

  const addToAudioQueue = (audioClipUri: Promise<string>) => {
    setQueue((oldQueue) => [...oldQueue, audioClipUri]);
  };

  return { addToAudioQueue };
};

export default useAudioPlayer;

Now this works like a charm when you use the hook in just one place in the app. But imagine what would happen if I were to use this hook from some other part of the app as well.

const { addToAudioQueue } = useAudioPlayer();

const audioClipUri = textToSpeech(ttsWordsBuffer);
addToAudioQueue(audioClipUri);
  • Would it still maintain the order?
  • Leaving the order aside, is there a guarantee that only a single audio clip plays at any point in time?

The answer, as you might have guessed, is NO.

The reason is that every component that calls the useAudioPlayer hook gets its own fresh instance of the audio queue. Say you use it at 3 different places in your app: you now have 3 audio players active at the same time.

And if you push audio clips to one of them while another is already playing something, you end up in a situation similar to having 3 speakers in a room, each playing a different song.

[Image: 3 speakers]
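
The same effect can be sketched without React. In the snippet below, createAudioQueue is a hypothetical stand-in for the state that each hook call owns; it is an illustration under that assumption, not code from ChatCraft.

```typescript
// Hypothetical stand-in for per-call hook state: every call builds its own
// array, just as every component using the old hook got its own useState queue.
function createAudioQueue() {
  const queue: string[] = [];
  return {
    add: (clip: string): void => {
      queue.push(clip);
    },
    size: (): number => queue.length,
  };
}

// Two independent "players": clips added to one are invisible to the other.
const chatQueue = createAudioQueue();
const settingsQueue = createAudioQueue();
chatQueue.add("clip-1");

// One shared instance: every consumer sees and controls the same queue.
const sharedQueue = createAudioQueue();
const seenFromChat = sharedQueue;
const seenFromSettings = sharedQueue;
seenFromChat.add("clip-1");
```

Here settingsQueue never learns about the clip pushed to chatQueue, whereas both seenFromChat and seenFromSettings observe the same clip. A single shared instance is what makes it possible to stop playback from anywhere.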

This didn't prevent me from merging the existing code as I was only initializing an audio queue at one place throughout the application.

But with this design, there was no way to access and operate on the audio queue from other parts of the application, for example to stop the audio playback when a new question is asked.

The details are documented in this follow-up.

Stop playing audio when tts is turned off #391

Follow up for #357

Currently, when text to speech is turned off [image], the announcement of the response does not stop. We need to make sure any TTS audio that is playing is stopped and the audio player queue is cleared.

Implementing a global audio Queue

To fix this issue, I had to make sure that there was always only one instance of the audio queue throughout the application, while the core logic and interface would remain the same.

This had to be done in the following steps.

Defining an Audio Player Context

The first step was to define a React Context that would expose the relevant functions to operate on the audio queue.

type AudioPlayerContextType = {
  addToAudioQueue: (audioClipUri: Promise<string>) => void;
  clearAudioQueue: () => void;
};

const AudioPlayerContext = createContext<AudioPlayerContextType>({
  addToAudioQueue: () => {},
  clearAudioQueue: () => {},
});

Notice that I am also exposing a new method called clearAudioQueue now. We'll look at the implementation soon.

Wrapping the audio queue in a Context Provider

For any component in a React application to read from a context, that context first needs to be provided by one of its ancestors.

This is done with the help of a React Context Provider. I wrapped the existing audio player logic into a function that returns an AudioPlayerContext Provider exposing the functions that will be used to operate on the single Audio Queue in the app.

import { useState, useEffect, createContext, useContext, ReactNode, FC } from "react";

type AudioClip = {
  audioUrl: string;
  audioElement: HTMLAudioElement;
};

export const AudioPlayerProvider: FC<{ children: ReactNode }> = ({ children }) => {
  // We are managing promises of audio urls instead of directly storing strings
  // because there is no guarantee when openai tts api finishes processing and resolves a specific url
  // For more info, check this comment:
  // https://github.com/tarasglek/chatcraft.org/pull/357#discussion_r1473470003
  const [queue, setQueue] = useState<Promise<string>[]>([]);
  const [isPlaying, setIsPlaying] = useState<boolean>(false);
  const [currentAudioClip, setCurrentAudioClip] = useState<AudioClip | null>();

  useEffect(() => {
    if (!isPlaying && queue.length > 0) {
      playAudio(queue[0]);
    }
  }, [queue, isPlaying]);

  const playAudio = async (audioClipUri: Promise<string>) => {
    setIsPlaying(true);
    const audioUrl: string = await audioClipUri;
    const audio = new Audio(audioUrl);
    audio.preload = "auto";
    audio.onended = () => {
      URL.revokeObjectURL(audioUrl);
      setQueue((oldQueue) => oldQueue.slice(1));
      setIsPlaying(false);

      setCurrentAudioClip(null);
    };
    audio.play();
    setCurrentAudioClip({
      audioElement: audio,
      audioUrl: audioUrl,
    });
  };

  const addToAudioQueue = (audioClipUri: Promise<string>) => {
    setQueue((oldQueue) => [...oldQueue, audioClipUri]);
  };

  const clearAudioQueue = () => {
    if (currentAudioClip) {
      // Stop currently playing audio
      currentAudioClip.audioElement.pause();
      URL.revokeObjectURL(currentAudioClip.audioUrl);

      setCurrentAudioClip(null);
      setIsPlaying(false);
    }

    // Flush all the remaining audio clips
    setQueue([]);
  };

  const value = { addToAudioQueue, clearAudioQueue };

  return <AudioPlayerContext.Provider value={value}>{children}</AudioPlayerContext.Provider>;
};

You'll notice the implementation of the clearAudioQueue function, which is meant to be called from any application component to pause the current audio clip, and flush the remaining clips in the queue.

I am also managing a new piece of state now:

type AudioClip = {
  audioUrl: string;
  audioElement: HTMLAudioElement;
};

const [currentAudioClip, setCurrentAudioClip] = useState<AudioClip | null>();

This allows me to pause the currently playing audio as soon as the clearAudioQueue function is called.
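
One case the provider above does not appear to cover is a clip whose URL promise is still pending when clearAudioQueue is called: currentAudioClip is not set yet, so nothing is paused, and the clip can still start once the promise resolves. A minimal, framework-free sketch of one way to guard against that is to stamp each clip with a generation counter and skip clips from an older generation. The OrderedAudioQueue class, the injected play function, and the generation field are hypothetical names for illustration, not ChatCraft's implementation, and pausing an already playing element is left out here.

```typescript
// Hypothetical sketch, not ChatCraft's code. `play` is injected so this runs
// without a browser; in the app it could wrap playback of `new Audio(url)`.
type PlayFn = (url: string) => Promise<void>;

class OrderedAudioQueue {
  private queue: Promise<string>[] = [];
  private draining = false;
  private generation = 0; // incremented by every clear()
  private readonly play: PlayFn;

  constructor(play: PlayFn) {
    this.play = play;
  }

  // Clips are stored as promises so playback follows insertion order,
  // regardless of which URL resolves first.
  add(clip: Promise<string>): void {
    this.queue.push(clip);
    void this.drain();
  }

  // Drops queued clips and invalidates any clip still waiting on its URL.
  clear(): void {
    this.generation += 1;
    this.queue = [];
  }

  private async drain(): Promise<void> {
    if (this.draining) return; // a single loop plays everything in order
    this.draining = true;
    try {
      while (this.queue.length > 0) {
        const startedIn = this.generation;
        const next = this.queue.shift() as Promise<string>;
        // A rejected URL promise is treated as "no clip" and skipped.
        const url = await next.catch(() => null);
        // Skip failed clips and clips that were cleared while still pending.
        if (url === null || startedIn !== this.generation) continue;
        // Playback errors are swallowed so one bad clip does not stall the queue.
        await this.play(url).catch(() => undefined);
      }
    } finally {
      this.draining = false;
    }
  }
}
```

Inside a provider, the same idea could be applied by keeping the counter in a ref and comparing it after the await in playAudio, but that wiring is an assumption and is not shown in the PR.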

Providing the context

The next step is to provide the AudioPlayerContext at the root of the application.

In main.tsx

import { AudioPlayerProvider } from "./hooks/use-audio-player";

ReactDOM.createRoot(document.querySelector("main") as HTMLElement).render(
  <React.StrictMode>
    <ChakraProvider theme={theme}>
      <AudioPlayerProvider>
        <SettingsProvider>
          <CostProvider>
            <ModelsProvider>
              <UserProvider>
                <ColorModeScript initialColorMode={theme.config.initialColorMode} />
                <RouterProvider router={router} />
              </UserProvider>
            </ModelsProvider>
          </CostProvider>
        </SettingsProvider>
      </AudioPlayerProvider>
    </ChakraProvider>
  </React.StrictMode>
);

Defining useAudioPlayer Hook

After defining the context and provider, we can finally write the hook to use the AudioPlayerContext provided by the closest context provider.

const useAudioPlayer = () => useContext(AudioPlayerContext);

export default useAudioPlayer;

Using the Hook 🪝

The final step was to actually use the global audio player context wherever I needed it.

Currently, I am using it for 2 scenarios:
1. The audio for the previous message will instantly stop playing as soon as the user asks a new question.

import useAudioPlayer from "../../hooks/use-audio-player";
...
...
const { clearAudioQueue } = useAudioPlayer();
...
...
// NOTE: we strip out the ChatCraft App messages before sending to OpenAI.
const messages = chat.messages({ includeAppMessages: false });
// Clear any previous audio clips
clearAudioQueue();
const response = await callChatApi(messages, {
  functions,
  functionToCall,
});

2. Any audio clips being played via useAudioPlayer will instantly stop as soon as the user disables the TTS setting.

import useAudioPlayer from "../../hooks/use-audio-player";
...
...
const { clearAudioQueue } = useAudioPlayer();
...
...
{isTtsSupported && (
  <Tooltip
    label={settings.announceMessages ? "Text-to-Speech Enabled" : "Text-to-Speech Disabled"}
  >
    <IconButton
      type="button"
      size="lg"
      variant="solid"
      aria-label={
        settings.announceMessages ? "Text-to-Speech Enabled" : "Text-to-Speech Disabled"
      }
      icon={
        settings.announceMessages ? <MdVolumeUp size={25} /> : <MdVolumeOff size={25} />
      }
      onClick={() => {
        if (settings.announceMessages) {
          // Flush any remaining audio clips being announced
          clearAudioQueue();
        }
        setSettings({ ...settings, announceMessages: !settings.announceMessages });
      }}
    />
  </Tooltip>
)}

And that is all it takes to create and use a custom audio player hook in a React Application. I am pretty sure there are audio player hooks already published on npm, but I doubt any of them will fit my needs as I need to maintain a queue.

Let me know in the comments if you know a better approach.

The Pull Request 📝

My PR for this improvement is already up awaiting reviews before it lands.
You can look at the entire source code and a detailed explanation here.

Make AudioPlayer queue global #484

In #357, I was able to stream audio responses as the LLM response was generated and also optimize the playback.

But there were some problems with it, and follow ups were filed.

This is regarding one of those follow ups - #391.

The problem was that once an audio clip was added to the audio queue, there was no way to stop it from other parts of the application since every invocation to useAudioPlayer hook resulted in a fresh audio queue. This was pretty annoying and made the TTS feature unusable as users had to wait for the previous message announcement to finish before it started playing for the next one.

To fix this issue, I have made the Audio Player queue global, by creating and providing an AudioPlayerContext at the root of the application.

https://github.com/tarasglek/chatcraft.org/compare/amnish04/global-audio-player?expand=1#diff-1cd8b18798a1a103bfe13bef54354c1f3a3bea29a31c8eea1a0c67a3a839b811

This allowed me to expose another function from the hook called clearAudioQueue which can now be called from any application component to pause the current audio clip, and flush the remaining clips in the queue.

Currently, I am using it for 2 scenarios:

  1. The audio for previous message will instantly stop playing as soon as the user asks a new question.
  2. Any audio clips being played via useAudioPlayer will instantly stop as soon as the user disables the TTS setting.

This approach also makes the audio system more robust, as we now have a single source of truth for audio clips, and the risk of playing multiple audio clips at once is also reduced.

This fixes #391

Reviewing a PR that adds Custom Provider Classes

I also reviewed a PR from Katie this week.

Right now, we have just two supported AI providers: OpenAI and OpenRouter. But since we expect to have more in the future, it would become impractical to manage all of them in a generic ChatCraftProvider class.
Katie is working on creating a separate concrete class for each AI provider that inherits from the current ChatCraftProvider class, so that if new providers do some things differently, existing methods can be overridden and new methods can be added in their own classes.
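
The general shape of that idea can be sketched as follows. Every class, method, and header name here is hypothetical and chosen only to illustrate inheritance with overrides; none of it is taken from the PR or from the real ChatCraftProvider class.

```typescript
// Hypothetical sketch of the inheritance idea; not the code from the PR.
class BaseProviderSketch {
  readonly name: string;
  readonly apiUrl: string;

  constructor(name: string, apiUrl: string) {
    this.name = name;
    this.apiUrl = apiUrl;
  }

  // Default behaviour shared by all providers.
  buildHeaders(apiKey: string): Record<string, string> {
    return { Authorization: `Bearer ${apiKey}` };
  }
}

class OpenAiProviderSketch extends BaseProviderSketch {
  constructor() {
    super("OpenAI", "https://api.openai.com/v1");
  }
}

class OpenRouterProviderSketch extends BaseProviderSketch {
  constructor() {
    super("OpenRouter", "https://openrouter.ai/api/v1");
  }

  // A provider that does something differently overrides the shared method.
  // "X-Example-Header" is a made-up header used only for this illustration.
  buildHeaders(apiKey: string): Record<string, string> {
    return { ...super.buildHeaders(apiKey), "X-Example-Header": "demo" };
  }
}

// Callers work against the base type and get provider-specific behaviour.
const providerSketches: BaseProviderSketch[] = [
  new OpenAiProviderSketch(),
  new OpenRouterProviderSketch(),
];
```

Adding another provider would then mean adding one more subclass rather than growing conditionals inside a single generic class.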

I went through the code and left a few comments requesting changes. If the PR is still up at the time of your reading, please take a look and let us know if you can come up with a better approach!

What's Ahead 🔜

The TTS functionality I've been working on still needs some more work to make best use of the API OpenAI provides.

I'll start by letting users choose from different voices, re-announce a message, and even download the audio narration of a response.

Everything is documented in this issue:

Make TTS more flexible #400

  1. Add a Speak submenu to both human and bot messages
  2. In the Speak submenu allow one to select voice, which causes message to be spoken
  3. Once message is spoken, allow generated message to be downloaded. Can add a download to the speak submenu after speech completes. It's ugly but not sure re other options
  4. Changing voice in speech menu also changes it for the speak icon when automatic tts is on
  5. should move speak icon to the menu on the left and change it to be "options" or 3 vertical dots on mobile

In this post, I shared my approach to creating and managing an audio queue with a custom React Hook, and how I am leveraging it to stream OpenAI's LLM responses to their TTS API and ultimately play the generated clips in order.

I'll follow up about the other planned functionalities soon.

In the meantime, STAY TUNED!

Top comments (2)

Joel Sweetman

I'm currently creating an audio player for a React-based book reader that'll provide audio for any words that are clicked. This was very useful, thanks for posting!

Amnish Singh Arora

Glad it helped 😊