DEV Community

Jaykumar Patel

Podcast Companion with Real-Time Transcription Using AssemblyAI

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I built a Java-based podcast player that plays podcast episodes and also transcribes their audio with AssemblyAI’s Speech-to-Text model. The transcript is displayed alongside the episode, which makes the content accessible to people who prefer or need to read rather than listen.

StarTalk is a podcast on science, comedy, and popular culture hosted by astrophysicist Neil deGrasse Tyson and comedian Chuck Nice, with various other comic and celebrity co-hosts and frequent guests from the world of science and entertainment.

Demo

The source code is available in this GitHub repository.

A demo video is available on YouTube.

Journey

Incorporating AssemblyAI’s Speech-to-Text model, Universal-2, into my application was an enlightening experience. The application fetches podcasts from an RSS feed, plays the audio using the JLayer library, and sends the audio URL to AssemblyAI for transcription. The transcription is then displayed within the application.
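Fetching podcasts from an RSS feed needs nothing beyond the JDK. The following is a minimal sketch, not the app's actual code: it assumes a standard RSS 2.0 feed in which each `<item>` carries its audio file in an `<enclosure url="...">` element, and the names `RssEpisodes`, `Episode`, and `parse` are hypothetical.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class RssEpisodes {
    /** One episode: its title and the audio URL taken from <enclosure url="...">. */
    static final class Episode {
        final String title;
        final String audioUrl;
        Episode(String title, String audioUrl) { this.title = title; this.audioUrl = audioUrl; }
    }

    /** Parses RSS 2.0 XML, keeping only <item>s that have an <enclosure>. */
    static List<Episode> parse(InputStream rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPEs so a hostile feed cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Parsing raw bytes lets the parser honor the feed's declared encoding.
        Document doc = factory.newDocumentBuilder().parse(rssXml);

        List<Episode> episodes = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList enclosures = item.getElementsByTagName("enclosure");
            if (enclosures.getLength() == 0) continue; // no audio attached to this item
            String url = ((Element) enclosures.item(0)).getAttribute("url");
            NodeList titles = item.getElementsByTagName("title");
            String title = titles.getLength() > 0
                    ? titles.item(0).getTextContent().trim() : "(untitled)";
            episodes.add(new Episode(title, url));
        }
        return episodes;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java RssEpisodes.java <feed-url>");
            return;
        }
        // Feeds are often served behind redirects, so follow them.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0])).GET().build();
        try (InputStream body = client.send(request, HttpResponse.BodyHandlers.ofInputStream()).body()) {
            for (Episode e : parse(body)) {
                System.out.println(e.title + " -> " + e.audioUrl);
            }
        }
    }
}
```

Items without an `<enclosure>` are skipped, so the player only ever pages through episodes it can actually play.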

The application architecture follows these steps:

  1. Fetch Podcasts: Retrieve podcast metadata from an RSS feed.
  2. Play Audio: Use JLayer to stream and play audio directly within the app.
  3. Transcription Request: Send the audio URL to AssemblyAI's API for transcription.
  4. Display Transcription: Periodically poll AssemblyAI's API to get the transcription status and display the result in a JTextArea within the application.
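Steps 3 and 4 map onto AssemblyAI's asynchronous transcript endpoints: a `POST` to `https://api.assemblyai.com/v2/transcript` with an `audio_url` in the JSON body returns a transcript `id`, and `GET /v2/transcript/{id}` reports a `status` of `queued`, `processing`, `completed`, or `error`. The sketch below uses `java.net.http.HttpClient`; the class name, the `ASSEMBLYAI_API_KEY` environment variable, and the polling interval and timeout are assumptions. Its regex field lookup is a deliberate stand-in for a JSON library and is only trustworthy for the top-level `id`, `status`, and `error` strings, so the completed transcript is returned as raw JSON for a real parser (Jackson, Gson, or org.json, for instance) to read the `text` field from.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssemblyAiTranscriber {
    private static final String TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript";
    private final HttpClient http = HttpClient.newHttpClient();
    private final String apiKey;

    AssemblyAiTranscriber(String apiKey) { this.apiKey = apiKey; }

    /** JSON body {"audio_url": "..."}; escapes quotes and backslashes in the URL. */
    static String requestBody(String audioUrl) {
        String escaped = audioUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"audio_url\": \"" + escaped + "\"}";
    }

    /** Naive lookup of the first "name": "value" string pair; NOT a JSON parser. */
    static String stringField(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"")
                .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Sends a request and fails fast on non-2xx responses (bad key, bad URL, outages). */
    private String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IllegalStateException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    /** Step 3: submit the episode's audio URL and return the transcript id. */
    String submit(String audioUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(TRANSCRIPT_URL))
                .header("authorization", apiKey)
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(audioUrl)))
                .build();
        String body = send(request);
        String id = stringField(body, "id");
        if (id == null) throw new IllegalStateException("no transcript id in: " + body);
        return id;
    }

    /** Step 4: poll until "completed" (returns the raw JSON) or "error" (throws). */
    String awaitCompletion(String id, Duration interval, Duration timeout) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(TRANSCRIPT_URL + "/" + id))
                .header("authorization", apiKey)
                .GET()
                .build();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            String body = send(request);
            String status = stringField(body, "status");
            if ("completed".equals(status)) return body;
            if ("error".equals(status)) {
                throw new IllegalStateException("transcription failed: " + stringField(body, "error"));
            }
            Thread.sleep(interval.toMillis()); // still "queued" or "processing"
        }
        throw new IllegalStateException("timed out waiting for transcript " + id);
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("ASSEMBLYAI_API_KEY"); // assumed variable name
        if (apiKey == null || args.length == 0) {
            System.err.println("usage: ASSEMBLYAI_API_KEY=... java AssemblyAiTranscriber.java <audio-url>");
            return;
        }
        AssemblyAiTranscriber client = new AssemblyAiTranscriber(apiKey);
        String id = client.submit(args[0]);
        System.out.println(client.awaitCompletion(id, Duration.ofSeconds(5), Duration.ofMinutes(30)));
    }
}
```

In a Swing app, `awaitCompletion` belongs on a background thread (for example inside a `SwingWorker`), with the `JTextArea` updated from `done()`; blocking the event dispatch thread while polling would freeze the whole UI.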

Key Features:

  • Single Podcast Per Page: The UI is designed to show one podcast per page with navigation buttons for easy browsing.
  • Playback Controls: Play and stop buttons, plus a seek bar.
  • Transcription: The transcript is fetched and displayed alongside the podcast, so listeners can read along as they listen.
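The one-podcast-per-page layout reduces to an index into the episode list that the navigation buttons move and clamp. A hypothetical `EpisodePager` class, kept independent of Swing so it is easy to test, might look like this:

```java
import java.util.List;

/** Tracks which episode is on screen when the UI shows exactly one podcast per page. */
final class EpisodePager<T> {
    private final List<T> episodes;
    private int index = 0;

    EpisodePager(List<T> episodes) {
        if (episodes.isEmpty()) throw new IllegalArgumentException("no episodes to show");
        this.episodes = List.copyOf(episodes); // defensive, immutable copy
    }

    T current() { return episodes.get(index); }
    boolean hasNext() { return index < episodes.size() - 1; }
    boolean hasPrevious() { return index > 0; }

    /** Moves forward one page; stays on the last page at the end of the list. */
    T next() { if (hasNext()) index++; return current(); }

    /** Moves back one page; stays on the first page at the start of the list. */
    T previous() { if (hasPrevious()) index--; return current(); }

    /** Page label such as "2 / 10". */
    String label() { return (index + 1) + " / " + episodes.size(); }
}
```

Wiring it up could be `nextButton.addActionListener(e -> show(pager.next()))`, refreshing `nextButton.setEnabled(pager.hasNext())` after every move; `show` here is a hypothetical method that stops the current playback and redraws the page.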

Future Improvements:

  • Enhanced Seek Functionality: Implementing accurate seeking to give users finer control over playback.
  • Error Handling: Improving error handling for network requests and transcription polling.
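One route to accurate seeking is to seek by MP3 frame index rather than by byte offset. Every MPEG-1 Layer III frame holds 1152 samples (576 for MPEG-2/2.5 Layer III), so at 44.1 kHz a frame lasts 1152 / 44100 s ≈ 26.12 ms regardless of bitrate, which keeps the mapping correct for variable-bitrate files too. The helper below (`Mp3Seek` is a hypothetical name) converts a seek-bar position into a starting frame; it assumes the sample rate and total duration are already known, for example from the first frame header and the feed's metadata.

```java
/** Converts seek-bar positions into MP3 frame indices (hypothetical helper). */
final class Mp3Seek {
    /** Samples per frame for MPEG-1 Layer III; MPEG-2/2.5 Layer III uses 576. */
    static final int MPEG1_L3_SAMPLES_PER_FRAME = 1152;

    /** Duration of one frame in milliseconds. */
    static double frameMillis(int sampleRate, int samplesPerFrame) {
        return 1000.0 * samplesPerFrame / sampleRate;
    }

    /** Index of the frame containing targetMillis (integer math avoids rounding drift). */
    static int frameForMillis(long targetMillis, int sampleRate, int samplesPerFrame) {
        if (targetMillis <= 0) return 0;
        return (int) (targetMillis * sampleRate / (1000L * samplesPerFrame));
    }

    /** Maps a slider value in [0, sliderMax] onto a position in an episode of totalMillis. */
    static long millisForSlider(int sliderValue, int sliderMax, long totalMillis) {
        return (long) sliderValue * totalMillis / sliderMax;
    }
}
```

JLayer's `AdvancedPlayer` provides a `play(int start, int end)` method that takes frame indices, so a seek could close the current player, open a fresh stream, and call `play(startFrame, Integer.MAX_VALUE)` on a new `AdvancedPlayer`. Skipping still reads through the earlier frames, so for a streamed episode the bytes before the target are downloaded anyway.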
