DEV Community

Gabriel Sena


Cloudflare AI Challenge: Audio Interaction with AudioInsight

This is a submission for the Cloudflare AI Challenge.

What I Built

I built AudioInsight, an app that processes audio, transcribes it, summarizes it, generates a title for the content, and allows users to ask questions about the related audio.

The chat and audio are stored remotely, so the user can come back later to ask new questions or listen to the audio again.

To make this app, I used several Cloudflare products: D1, R2, Workers AI, and Cloudflare Pages, along with three types of AI models: Automatic Speech Recognition, Summarization, and Text Generation. These are explained in the Journey section.

Demo

Demo Link
Original Cloudflare Pages Demo Link

My Code

GitHub: gabrielsenadev/audioinsight

AudioInsight processes audio, transcribes it, summarizes it, generates a title for the content, and allows users to ask questions about the related audio.


AudioInsight - Cloudflare AI Challenge Entry


This is an entry for the Cloudflare AI Challenge.

Live on: https://audioinsight.gabrielsena.dev/

How It Works

  1. On the application's homepage, the user uploads an audio file.
  2. We use the whisper model to transcribe the audio into text.
  3. We use the neural-chat-7b-v3-1-awq model to generate a title based on the provided content.
  4. We summarize the content with the bart-large-cnn model.
  5. After that, the user can ask questions, and we use the neural-chat-7b-v3-1-awq model to answer the user's questions.
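Steps 2 to 4 above could be sketched roughly as follows. This is a minimal illustration, not the actual AudioInsight code: the `AiRunner` interface, the `textField` helper, the `ChatSeed` type, and the prompt wording are all hypothetical, and the model identifiers and response field names (`text`, `response`, `summary`) are assumptions based on the Workers AI catalog that should be checked against the current documentation.

```typescript
// Minimal shape of the Workers AI binding used here (an assumption for this sketch).
interface AiRunner {
  run(model: string, input: Record<string, unknown>): Promise<Record<string, unknown>>;
}

// Assumed Workers AI model identifiers for the three models named above.
const WHISPER = "@cf/openai/whisper";
const SUMMARIZER = "@cf/facebook/bart-large-cnn";
const CHAT = "@hf/thebloke/neural-chat-7b-v3-1-awq";

// Hypothetical container for everything generated when a chat is created.
interface ChatSeed {
  transcript: string;
  title: string;
  summary: string;
}

// Hypothetical helper: reads a string field from a model response, defaulting to "".
function textField(result: Record<string, unknown>, key: string): string {
  const value = result[key];
  return typeof value === "string" ? value.trim() : "";
}

async function createChatSeed(ai: AiRunner, audio: Uint8Array): Promise<ChatSeed> {
  // Step 2: speech to text. The audio is passed as a plain array of byte values.
  const transcript = textField(await ai.run(WHISPER, { audio: Array.from(audio) }), "text");

  // Steps 3 and 4 depend only on the transcript, so they can run in parallel.
  const [titleResult, summaryResult] = await Promise.all([
    ai.run(CHAT, {
      messages: [
        { role: "system", content: "Reply with a short title for the text, nothing else." },
        { role: "user", content: transcript },
      ],
    }),
    ai.run(SUMMARIZER, { input_text: transcript }),
  ]);

  return {
    transcript,
    title: textField(titleResult, "response"),
    summary: textField(summaryResult, "summary"),
  };
}
```

In a Worker, `ai` would be the AI binding from the environment. Taking it as a parameter keeps the function easy to exercise with a stub.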

Under the Hood

Journey

Working with AI has always made me curious. I had been thinking about building something with AI, and this challenge was the motivation I needed. One of the AI capabilities that impresses me most is turning speech into text, so I decided to follow that path.

After some thought, I decided the app would transcribe the uploaded audio, summarize the transcript, and let the user ask questions about the audio.

I also decided to explore more of the Cloudflare ecosystem, so one of my personal requirements was to store chats and audio remotely and give the user a way to come back to them later.

After defining my requirements and goals, I started learning how AI models work and how Workers AI works. Along the way, I chose these models: speech-to-text (whisper), content summarization (bart-large-cnn), and text generation to answer questions and generate the chat title (neural-chat-7b-v3-1-awq).

In the Multiple Models and/or Triple Task Types section, I explain how I use these models and show the application flow, which explains how I combine these AI models to participate in the Additional Prize Category.

After developing the main idea, I learned how Cloudflare's database (D1) and Cloudflare R2 work. Then I implemented the ability to store users' chats and audio.
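One way this storage step might look is sketched below, assuming the audio file goes to R2 and the chat metadata goes to D1. The `chats` table, its columns, the `audio/` key prefix, and the `D1Like` / `R2Like` interfaces are hypothetical. They only mirror the `prepare(...).bind(...).run()` and `put(key, value)` calls that the D1 and R2 bindings expose, and the post does not describe the real schema.

```typescript
// Minimal shapes of the D1 and R2 bindings used here (assumptions for this sketch).
interface D1Like {
  prepare(sql: string): { bind(...values: unknown[]): { run(): Promise<unknown> } };
}
interface R2Like {
  put(key: string, value: Uint8Array): Promise<unknown>;
}

// Hypothetical record for one chat.
interface ChatRecord {
  id: string;
  title: string;
  summary: string;
  transcript: string;
}

// Hypothetical naming scheme for audio objects in the bucket.
function audioKey(chatId: string): string {
  return `audio/${chatId}`;
}

async function saveChat(db: D1Like, bucket: R2Like, chat: ChatRecord, audio: Uint8Array): Promise<void> {
  // Store the raw audio first so the database row never points at a missing object.
  await bucket.put(audioKey(chat.id), audio);

  // Then store the generated text, with the object key as a reference to the audio.
  await db
    .prepare("INSERT INTO chats (id, title, summary, transcript, audio_key) VALUES (?, ?, ?, ?, ?)")
    .bind(chat.id, chat.title, chat.summary, chat.transcript, audioKey(chat.id))
    .run();
}
```

Keeping the large binary in object storage and only its key in the database is a common split. Whether AudioInsight does exactly this is not stated in the post.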

Multiple Models and/or Triple Task Types

To create this app, I utilized three different AI model types to generate its content.

  • whisper is responsible for converting audio to text.
  • bart-large-cnn is tasked with generating a summary of the related audio content.
  • neural-chat-7b-v3-1-awq handles generating the chat title and answering questions about the related content.

When the user uploads an audio file, the chat creation process starts. Here, I combine all three models, each generating one piece of content: the audio transcription, the summary, and the chat title.

When the user asks a question, I only use the text generation AI model to answer the user's question.
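The question step could look something like the sketch below. It assumes the transcript is supplied to the model as a system message and earlier turns are replayed as history. The post does not say how the context is actually passed, so `buildQuestionMessages`, `answerQuestion`, the `TextGenerator` interface, and the prompt wording are all hypothetical, and the model identifier and `response` field are assumptions to verify against the Workers AI documentation.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Minimal shape of the Workers AI binding for text generation (an assumption for this sketch).
interface TextGenerator {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

// Assumed Workers AI identifier for the text generation model.
const CHAT_MODEL = "@hf/thebloke/neural-chat-7b-v3-1-awq";

// Builds the message list: transcript as context, then prior turns, then the new question.
function buildQuestionMessages(
  transcript: string,
  history: ChatMessage[],
  question: string,
): ChatMessage[] {
  const system: ChatMessage = {
    role: "system",
    content: `Answer questions using only this audio transcript:\n${transcript}`,
  };
  const user: ChatMessage = { role: "user", content: question };
  return [system, ...history, user];
}

async function answerQuestion(
  ai: TextGenerator,
  transcript: string,
  history: ChatMessage[],
  question: string,
): Promise<string> {
  const result = await ai.run(CHAT_MODEL, {
    messages: buildQuestionMessages(transcript, history, question),
  });
  // Fall back to an empty string if the model returns no text.
  return (result.response ?? "").trim();
}
```

Only the text generation model is involved here. The transcript and summary were already produced when the chat was created.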

Application Flow
Follow the flow below to understand how it works.

[Image: two flows showing how this application combines Cloudflare solutions, including three different AI models.]

This flow shows how I use the different AI models and Cloudflare's storage solutions to build the app.

Final words

Developing this entry helped me understand more about how the AI ecosystem works and how I can use the Cloudflare ecosystem to turn my ideas into products.

Looking ahead, I'm considering incorporating private chats and additional chat features to enhance user interaction with audio.

Thank you for this challenge!
