DEV Community

David Akim

Sophisticated Speech-to-Text Application

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I developed a Speech-to-Text application in Taipy that uses AssemblyAI's Universal-2 Speech-to-Text model. The application can:

  1. Transcribe spoken words into written text.
  2. Detect multiple speakers in an audio file and attribute what each speaker said (speaker diarization).
  3. Summarize your audio content with key takeaways.
  4. Download transcriptions as a text file.
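
Feature 4 comes down to writing the transcript text to disk so it can be served to the user. The following is a minimal sketch, assuming a plain UTF-8 .txt file is what gets downloaded; `save_transcript` and the default filename are hypothetical names, not taken from the project. In a Taipy app, the returned path could then be passed to a file download control.

```python
from __future__ import annotations

from pathlib import Path


def save_transcript(text: str, path: str | Path = "transcript.txt") -> Path:
    """Write transcript text to a UTF-8 .txt file and return its path.

    Hypothetical helper: the name and default filename are illustrative.
    """
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out
```

The same helper works for the plain transcript and for a diarized, speaker-labelled version of it.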

Demo

Link to GitHub repository

Screenshots

Transcription

Summarization

Diarization

Journey

The application was built with Taipy, a Python framework, which made integrating AssemblyAI's Speech-to-Text model seamless, since both can be used from the same Python code. AssemblyAI's comprehensive documentation simplified the implementation of the transcription and diarization features. Taipy handled the user interface, while AssemblyAI handled all of the speech-to-text processing.
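
The transcription and diarization step can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the `assemblyai` Python SDK (`aai.Transcriber`, `aai.TranscriptionConfig(speaker_labels=True)`), leaves the speech model at the SDK default instead of pinning one, and introduces hypothetical names (`Utterance`, `format_utterances`, `transcribe`) for illustration. The SDK import is deferred so the formatting helper runs without the package installed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Utterance:
    """Hypothetical offline stand-in for an SDK utterance (.speaker, .text)."""
    speaker: str
    text: str


def format_utterances(utterances: Iterable) -> str:
    """Render diarized utterances as one 'Speaker X: ...' line each."""
    return "\n".join(f"Speaker {u.speaker}: {u.text}" for u in utterances)


def transcribe(audio_path: str, api_key: str):
    """Transcribe a local file or URL with speaker labels enabled.

    Assumes the `assemblyai` package is installed; the import is deferred
    so format_utterances() can be used without it. Network access and a
    valid API key are required.
    """
    import assemblyai as aai

    aai.settings.api_key = api_key
    # speaker_labels=True turns on diarization; the speech model is left
    # at the SDK default rather than pinned explicitly.
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript
```

With a real key, something like `transcript = transcribe("meeting.mp3", "YOUR_API_KEY")` (placeholder file and key) would give the full text in `transcript.text` and a per-speaker view via `format_utterances(transcript.utterances)`. Keeping the formatting separate from the API call makes the UI-facing logic testable offline.
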
This submission also meets the criteria for the No More Monkey Business Challenge Prompt. The summarization feature uses LeMUR, AssemblyAI's framework for applying LLMs to transcripts: a custom prompt is sent to the LLM to generate a concise summary of the transcript.
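
A LeMUR summarization call with a custom prompt might look like the sketch below. It assumes the `assemblyai` SDK's `transcript.lemur.task(prompt)` method, which returns an object with a `.response` string; the prompt wording and the names `build_summary_prompt` and `summarize` are hypothetical, not the prompt the project actually used.

```python
def build_summary_prompt(max_points: int = 5) -> str:
    """Hypothetical custom prompt asking for a short summary plus takeaways."""
    return (
        "Summarize this transcript in a few sentences, then list "
        f"at most {max_points} key takeaways as bullet points."
    )


def summarize(transcript, max_points: int = 5) -> str:
    """Send the custom prompt to LeMUR for a completed transcript.

    Assumes `transcript` is an assemblyai Transcript, whose `.lemur.task()`
    runs a free-form prompt and returns an object with a `.response` string.
    """
    result = transcript.lemur.task(build_summary_prompt(max_points))
    return result.response
```

Building the prompt in its own function keeps the wording easy to tweak and to test without calling the API.
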
This is a solo submission; I did all of the Taipy and AssemblyAI work myself. It was an enjoyable learning experience. AssemblyAI makes building Speech-to-Text applications remarkably easy, and I will certainly use it again in the future.
