This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.
Introduction
Navigating mental health challenges can feel overwhelming and isolating. Whether dealing with stress, anxiety, or burnout, finding immediate support isn't always easy or affordable. What if you had a compassionate companion available 24/7, offering not only support but also personalized encouragement tailored just for you?
Meet Gynora: a culturally sensitive, AI-powered mental health companion designed to provide real-time chat support and personalized affirmations. Whether you need a listening ear or uplifting words, Gynora is here to support you every step of the way.
In this article, I'll introduce you to Gynora, its key features, the challenges behind its creation, how I integrated AssemblyAI, and how it's revolutionizing mental health care.
Key Features
1/ Full-Featured Authentication
Gynora pairs security with a smooth user experience through a robust authentication system powered by NextAuth, allowing only verified users to access the app. The system includes beautifully designed emails for account verification and password resets, enhancing both functionality and user engagement.
2/ Real-Time AI Chat Support
Gynora's chat feature uses advanced AI models to simulate a conversation with a therapist. Users can share their thoughts and receive empathetic responses, personalized to their unique situation. Conversations can be tailored to focus on solution-oriented or emotional support tones.
3/ Voice-Enabled Interactions
Gynora is voice-enabled, allowing users to have conversations entirely through audio. Powered by AssemblyAI's Universal-2 speech-to-text model, it lets users chat using their voice, making it easier to engage hands-free.
4/ Personalized Affirmations
Gynora generates affirmations tailored to your specific challenges and goals. By analyzing user input, it offers uplifting statements that resonate, helping to reinforce a positive mindset.
5/ Affirmation Subtitles
Each affirmation comes with subtitles, making it easier to follow along. These subtitles are generated after transcription by AssemblyAI's Universal-2 model, ensuring clarity and accessibility.
6/ Culturally Sensitive Responses
Acknowledging the diversity of our human experiences, Gynora provides culturally aware support & affirmations, making it an inclusive solution for users from various backgrounds.
7/ Private and Secure Conversations
Gynora ensures your privacy with end-to-end encryption and robust data protection measures. Users can feel safe knowing their conversations remain confidential.
Tech Stack
Frontend: TypeScript, Next.js
Backend: Next.js API Routes, Server Actions, Prisma
Styling: Tailwind CSS, shadcn/ui components
File Storage: Edgestore
Rate Limiting: Upstash
Authentication: NextAuth
AI Models: AssemblyAI's Universal-2, OpenAI's GPT-4o Mini, Google's Gemini 1.5 Flash, OpenAI's TTS-1
In-Browser Preview: Remotion
How I Used AssemblyAI
I had fun trying out a couple of things with AssemblyAI! Here they are:
1/ Audio Recording Component (w/ Transcript Text) (AudioRecordingModal.tsx):
I used AssemblyAI's pre-recorded transcription instead of the RealtimeTranscriber, because real-time transcription requires purchased credits.
Key Features:
- Record the user's voice
- Stop recording & send the audio to the /api/transcribe route for processing
- Transcribe the audio & return the transcript text
- Receive the transcript text & submit the form
Here's how it works:
a. Record the user's voice:
const setupAudioRecording = async () => {
  try {
    // Prefer an explicit audio input device if one is found
    const devices = await navigator.mediaDevices.enumerateDevices();
    const audioDevice = devices.find((device) => device.kind === 'audioinput');
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: audioDevice ? { deviceId: audioDevice.deviceId } : true
    });
    streamRef.current = stream;

    // Wire the stream into an AnalyserNode for audio analysis
    audioContextRef.current = new AudioContext();
    analyserRef.current = audioContextRef.current.createAnalyser();
    analyserRef.current.fftSize = 256;
    sourceRef.current = audioContextRef.current.createMediaStreamSource(stream);
    sourceRef.current.connect(analyserRef.current);

    return true;
  } catch (error) {
    console.error('Error accessing microphone :>>', error);
    toast.error('Unable to access microphone. Please check your permissions.');
    return false;
  }
};

const setupSuccess = await setupAudioRecording();
if (!setupSuccess) {
  return;
}

const mediaRecorder = new MediaRecorder(streamRef.current!);
mediaRecorderRef.current = mediaRecorder;

// Buffer audio chunks as they become available
mediaRecorder.addEventListener('dataavailable', (event) => {
  audioChunksRef.current.push(event.data);
});

// Begin capturing audio (the 'stop' handler is wired up in the next step)
mediaRecorder.start();
b. Stop recording & send the audio to the /api/transcribe route for processing:
mediaRecorder.addEventListener('stop', async () => {
  setIsProcessing(true);
  try {
    const audioBlob = new Blob(audioChunksRef.current, { type: 'audio/mp3' });
    const transcription = await transcribeAudio(audioBlob);
    onTranscriptionComplete(transcription);
    onClose();
  } catch (error) {
    console.error('Error processing audio:', error);
    toast.error('Error processing audio');
  } finally {
    setIsProcessing(false);
  }
});
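The transcribeAudio helper used above isn't shown in the component; here's a minimal sketch of what it could look like, assuming it simply wraps the recorded blob in FormData and posts it to the /api/transcribe route from the next step (the implementation is my guess, not the original code):

const transcribeAudio = async (audioBlob: Blob): Promise<string> => {
  // Send the recorded audio to the transcription route as multipart form data
  const formData = new FormData();
  formData.append('audio', audioBlob);

  const res = await fetch('/api/transcribe', {
    method: 'POST',
    body: formData
  });
  if (!res.ok) {
    throw new Error(`Transcription failed with status ${res.status}`);
  }

  // The route responds with { transcription: string }
  const { transcription } = await res.json();
  return transcription;
};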
c. Transcribe audio & return the transcript text:
import { AssemblyAI } from 'assemblyai';
import { getToken } from 'next-auth/jwt';
import { blobToFile } from '#/lib/utils';
import { NextRequest, NextResponse } from 'next/server';

const assemblyAIClient = new AssemblyAI({
  apiKey: process.env.ASSEMBLY_AI_API_KEY!
});

export const POST = async (req: NextRequest) => {
  // Only authenticated users may transcribe audio
  const token = await getToken({ req });
  if (!token) {
    return NextResponse.json({ message: 'Unauthenticated!' }, { status: 401 });
  }

  try {
    const formData = await req.formData();
    const audioFile = await blobToFile(formData.get('audio') as Blob, 'audio.mp3');

    // Upload & transcribe the audio with AssemblyAI
    const transcript = await assemblyAIClient.transcripts.transcribe({
      audio: audioFile,
      language_code: 'en',
    });
    console.log('Audio Transcript :>>', transcript);

    return NextResponse.json({ transcription: transcript.text }, { status: 200 });
  } catch (error) {
    console.error('Server Error [POST/Transcribe]:>>', error);
    return NextResponse.json({ message: 'Error transcribing audio' }, { status: 500 });
  }
};
d. Receive transcript text & submit the form:
const handleAudioRecordingComplete = (transcription: string) => {
  setInput(transcription);
  if (submitButtonRef.current) {
    submitButtonRef.current.click();
  }
};
While not real-time, this implementation provides a comfortable workaround until I purchase credits.
2/ Affirmation Subtitles (/api/createAffirmation/route.ts):
I used AssemblyAI's Universal-2 model to transcribe the affirmation audio, then retrieved its subtitles in VTT format.
Key Features:
- Transcribe audio, retrieve subtitle & save to database
- Convert saved subtitle to clickable timestamps
Here's a breakdown of its functionality:
a. Transcribe audio, retrieve subtitle & save to database:
// Generating affirmation text & audio...

const audioFile = await blobToFile(audioBlob, 'audio.mp3');

// Transcribe the generated audio, then fetch its subtitles in VTT format
const transcript = await assemblyAIClient.transcripts.transcribe({ audio: audioFile, language_code: 'en' });
const subtitleResult = await assemblyAIClient.transcripts.subtitles(transcript.id, 'vtt');

const affirmation = await prisma.affirmation.create({
  data: {
    ...data,
    content: text,
    subtitle: subtitleResult,
    audioUrl: uploadResult.url,
    user: { connect: { userId: `${token.sub}` } }
  }
});
console.log('Created Affirmation :>>', affirmation);

return NextResponse.json({ message: 'Affirmation created successfully!', data: affirmation }, { status: 201 });
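For context, subtitleResult is plain WebVTT text. Here's an illustrative (made-up) example of its shape, using the MM:SS.mmm timestamp style; the parser in the next step also handles the HH:MM:SS.mmm variant:

WEBVTT

00:00.000 --> 00:03.200
You are stronger than you think.

00:03.200 --> 00:06.800
Every challenge you face helps you grow.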
b. Convert saved subtitle to clickable timestamps:
export const parseSubtitle = (subtitleText: string) => {
  const lines = subtitleText.split('\n');
  const entries = [];
  let start = 0;
  let end = 0;
  let text = '';

  for (const line of lines) {
    // Skip WEBVTT header or empty lines
    if (line.startsWith('WEBVTT') || line.trim() === '') continue;

    // Check for timestamp lines and handle both formats
    const timeMatch = line.match(/(\d{2}):(\d{2})(?::(\d{2}))?\.(\d{3}) --> (\d{2}):(\d{2})(?::(\d{2}))?\.(\d{3})/);
    if (timeMatch) {
      if (text) {
        entries.push({ start, end, text: text.trim() });
        text = ''; // Reset text for next subtitle
      }
      // Convert timestamps based on the match
      if (timeMatch[3] && timeMatch[6]) {
        // Old format: HH:MM:SS.MS --> HH:MM:SS.MS
        start = convertToSecondsOldFormat(timeMatch[1], timeMatch[2], timeMatch[3], timeMatch[4]);
        end = convertToSecondsOldFormat(timeMatch[5], timeMatch[6], timeMatch[7], timeMatch[8]);
      } else {
        // New format: MM:SS.MS --> MM:SS.MS
        start = convertToSecondsNewFormat(timeMatch[1], timeMatch[2], timeMatch[4]);
        end = convertToSecondsNewFormat(timeMatch[5], timeMatch[6], timeMatch[8]);
      }
    } else if (line.trim()) {
      // Accumulate subtitle text
      text += line + ' ';
    }
  }

  // Add the last entry if there's text left
  if (text) {
    entries.push({ start, end, text: text.trim() });
  }

  return entries;
};
export const convertToSecondsNewFormat = (minutes: string, seconds: string, milliseconds: string): number => {
  const totalSeconds = parseInt(minutes) * 60 + parseFloat(`${seconds}.${milliseconds}`);
  return totalSeconds;
};

export const convertToSecondsOldFormat = (hours: string, minutes: string, seconds: string, milliseconds: string): number => {
  const totalSeconds = parseInt(hours) * 3600 + parseInt(minutes) * 60 + parseFloat(`${seconds}.${milliseconds}`);
  return totalSeconds;
};
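For example, convertToSecondsOldFormat('00', '01', '23', '450') computes 0 * 3600 + 1 * 60 + 23.45 = 83.45 seconds.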
// Convert subtitle to clickable timestamps
const subtitle = parseSubtitle(`${affirmation.subtitle}`);
This is what makes the subtitles clickable: when a subtitle line is clicked, the audio seeks to that line's start time.
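The seek logic itself isn't shown above, but here's a minimal sketch of how the parsed entries could be wired up, assuming a React audioRef pointing at the affirmation's <audio> element (the handler and markup are hypothetical):

const handleSubtitleClick = (start: number) => {
  // Jump the affirmation audio to the clicked line's start time
  if (audioRef.current) {
    audioRef.current.currentTime = start;
    audioRef.current.play();
  }
};

// Render each parsed entry as a clickable line
{subtitle.map((entry, index) => (
  <button key={index} onClick={() => handleSubtitleClick(entry.start)}>
    {entry.text}
  </button>
))}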
Challenges Faced
1/ Real-Time Transcription: AssemblyAI doesn't let you use the real-time transcription feature until you purchase credits, which forced me into a workaround that isn't real-time. It's only a stop-gap measure, as it makes for poorer UX; a sketch of what the real-time path could look like follows this list.
2/ AI Integration: Integrating AI services for mental health support & affirmation generation was a significant challenge. Ensuring that the AI produces high-quality output required extensive testing and fine-tuning. I also ran into rate limits while I was testing aggressively.
3/ User Experience: Creating an intuitive and user-friendly interface was crucial. I spent a considerable amount of time designing and iterating on the UI to ensure it meets users' needs while being aesthetically pleasing. This was a lot tougher for me because I didn't have the time to bring in a designer to work with me ;(
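For the curious, here's roughly what the real-time path could look like with the SDK's RealtimeTranscriber once credits are purchased. This is a sketch based on my reading of AssemblyAI's docs, not code running in Gynora:

import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLY_AI_API_KEY! });

// 16 kHz PCM is a typical sample rate for streaming speech
const transcriber = client.realtime.transcriber({ sampleRate: 16_000 });

transcriber.on('transcript', (transcript) => {
  if (transcript.message_type === 'FinalTranscript') {
    console.log('Final :>>', transcript.text);
  }
});

await transcriber.connect();

// Stream raw audio chunks from the microphone as they arrive, e.g.:
// transcriber.sendAudio(pcmChunk);

// ...and when the user stops recording:
await transcriber.close();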
Screenshots
Project Link
Link: https://dub.sh/GynoraDemo
Code Repository
Link: https://git.new/GynoraRepo
Conclusion
Gynora is redefining mental health support by making it accessible, personalized, and culturally sensitive. Whether you're seeking real-time guidance, affirmations to lift your spirits, or strategies to overcome stress, Gynora is your trusted companion.
I can't wait to see how Gynora positively impacts mental health journeys worldwide. Feel free to try it out, share feedback, and watch this space as we continue to add new features!