This is a submission for the Cloudflare AI Challenge.
What I Built
I built AudioInsight, an app that processes audio, transcribes it, summarizes it, generates a title for the content, and allows users to ask questions about the related audio.
The chat and audio are stored remotely, so the user can come back later to ask new questions or replay the audio.
To make this app, I used some products from the Cloudflare catalog, such as: Cloudflare D1, R2, Workers AI, Cloudflare Pages, and AI Models: Automatic Speech Recognition, Summarization, and Text Generation. These will be explained in the Journey section.
Demo
Demo Link
Original Cloudflare Pages Demo Link
My Code
gabrielsenadev / audioinsight
AudioInsight is a web application that processes audio, generates transcriptions, and allows users to ask questions about the related audio.
AudioInsight
AudioInsight is a full-stack application that processes audio, generates transcriptions, and allows users to ask questions about the related audio.
Its creation was motivated by participation in a dev.to challenge.
How to Install
- Start by cloning this repository:
git clone git@github.com:gabrielsenadev/audioinsight.git
- Install dependencies:
npm ci
- Configure your environment
- Run application
npm run dev
Environment Variables
This application depends on external providers for AI, storage, and the database. It was developed with minimal provider coupling, so you can easily switch if you prefer a different provider.
Cloudflare AI:
This application integrates with the Cloudflare AI ecosystem to utilize AI Models.
- CLOUDFLARE_ACCOUNT_ID
- CLOUDFLARE_API_TOKEN
Netlify Blobs:
For storing audio data, this application relies on Netlify Blobs. You will need a Netlify Site and Account.
- NETLIFY_SITE_ID
- NETLIFY_TOKEN
MongoDB:
MongoDB is used to store chats and chat messages.
- …
Journey
Working with AI is something I have been curious about. I had been thinking about building something with AI, and this challenge was enough motivation to do it. One of the most impressive AI capabilities, to me, is turning voice into text, so I decided to follow that path.
After some time thinking about what to build, I decided to process the audio, transcribe its content, summarize that content, and allow the user to ask questions about the uploaded audio.
I also decided to explore more of the Cloudflare ecosystem. So one of my personal requirements was the ability to store chats and audio remotely and give the user a way to come back to them later.
After defining my requirements and goals, I started learning about AI, how it works, and how Workers AI works. In this process, I decided to use these AI models: an audio-to-text model (whisper), a content summarization model (bart-large-cnn), and a text generation model to answer questions and generate the chat title (neural-chat-7b-v3-1-awq).
In the Multiple Models and/or Triple Task Types section, I explain how I use these models and show the application flow, which explains how I combine these AI models to participate in the Additional Prize Category.
After developing the main idea, I learned how Cloudflare databases and Cloudflare R2 work. Then I implemented the ability to store users' chats and audio.
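As a rough illustration of that storage step, here is a minimal sketch that writes the audio to an R2 bucket and the chat record to D1. It is not the code from the repository: the `chats` table, its columns, the `audio/` key prefix, and the `saveChat` helper are all hypothetical. The interfaces only describe the small subset of the R2 (`put`) and D1 (`prepare`, `bind`, `run`) bindings the sketch relies on.

```typescript
// Minimal subsets of the R2 bucket and D1 database bindings used below.
interface BucketLike {
  put(key: string, value: ArrayBuffer): Promise<unknown>;
}
interface StatementLike {
  bind(...values: unknown[]): StatementLike;
  run(): Promise<unknown>;
}
interface DatabaseLike {
  prepare(query: string): StatementLike;
}

// Hypothetical chat record; the real schema may differ.
interface NewChat {
  id: string;
  title: string;
  summary: string;
  transcription: string;
}

// Stores the audio in the bucket, then the chat row pointing at it.
async function saveChat(
  bucket: BucketLike,
  db: DatabaseLike,
  chat: NewChat,
  audio: ArrayBuffer,
): Promise<string> {
  const audioKey = `audio/${chat.id}`; // assumed key layout
  await bucket.put(audioKey, audio);
  await db
    .prepare(
      "INSERT INTO chats (id, title, summary, transcription, audio_key) VALUES (?, ?, ?, ?, ?)",
    )
    .bind(chat.id, chat.title, chat.summary, chat.transcription, audioKey)
    .run();
  return audioKey;
}
```

Writing the audio first means a chat row never points at an object that does not exist yet; if the insert fails, the only leftover is an orphaned audio object.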
Multiple Models and/or Triple Task Types
To create this app, I utilized three different AI model types to generate its content.
- whisper is responsible for converting audio to text.
- bart-large-cnn is tasked with generating a summary of the related audio content.
- neural-chat-7b-v3-1-awq handles generating the chat title and answering questions about the related content.
When the user uploads an audio file, the chat creation process starts. Here, I combine all three models, each generating one piece of content: the audio transcription, the summary, and the chat title.
When the user asks a question, I only use the text generation AI model to answer the user's question.
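The two paths above can be sketched roughly as follows. This is a simplified illustration, not the code from the repository. It assumes a Workers AI style binding with a `run(model, input)` method, and it assumes the model identifiers, the input shapes (`audio`, `input_text`, `messages`), and the output fields (`text`, `summary`, `response`) shown here. The prompts, the `createChat` and `answerQuestion` helpers, and the choice to generate the title from the transcription are hypothetical.

```typescript
// Minimal shape of the AI binding assumed here: run(model, input).
interface AiRunner {
  run(model: string, input: Record<string, unknown>): Promise<any>;
}

// Assumed model identifiers for the three models named in this post.
const WHISPER = "@cf/openai/whisper";
const BART = "@cf/facebook/bart-large-cnn";
const NEURAL_CHAT = "@hf/thebloke/neural-chat-7b-v3-1-awq";

interface ChatSeed {
  transcription: string;
  summary: string;
  title: string;
}

// Upload path: all three models, one piece of content each.
async function createChat(ai: AiRunner, audio: ArrayBuffer): Promise<ChatSeed> {
  // 1. Speech recognition turns the audio bytes into text.
  const { text } = await ai.run(WHISPER, {
    audio: Array.from(new Uint8Array(audio)),
  });
  // 2. Summary and title both depend only on the transcription,
  //    so they can be requested in parallel.
  const [summaryResult, titleResult] = await Promise.all([
    ai.run(BART, { input_text: text }),
    ai.run(NEURAL_CHAT, {
      messages: [
        { role: "system", content: "Write a short title for this transcript." },
        { role: "user", content: text },
      ],
    }),
  ]);
  return {
    transcription: text,
    summary: summaryResult.summary,
    title: String(titleResult.response).trim(),
  };
}

// Question path: only the text generation model is used.
async function answerQuestion(
  ai: AiRunner,
  transcription: string,
  question: string,
): Promise<string> {
  const { response } = await ai.run(NEURAL_CHAT, {
    messages: [
      { role: "system", content: `Answer using only this transcript:\n${transcription}` },
      { role: "user", content: question },
    ],
  });
  return response;
}
```

Passing the binding in as a parameter keeps both functions easy to test with a fake `AiRunner`, and it matches the goal of minimal provider dependency mentioned in the README.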
Application Flow
Follow the flow below to understand how it works.
This flow shows how I use the different AI models and how I used Cloudflare storage solutions to develop this app.
Final words
Developing this entry helped me understand more about how the AI ecosystem works and how I can use the Cloudflare ecosystem to turn my ideas into products.
Looking ahead, I'm considering incorporating private chats and additional chat features to enhance user interaction with audio.
Thank you for this challenge!