amanda hernandez
Deepgram x DEV Hackathon - Treehouse - Translating Audio Files

Overview of My Submission

Given speech in one language, this UI/API generates a translation as either text or audio.

This project focused on building out the API by stitching together three services: the Deepgram API for speech to text (STT), the Google Translate API for translation, and the WellSaid Labs API for text to speech (TTS).
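To make the three-stage flow concrete, here is a minimal sketch of how the stages might chain together. It is not the code from the repository: the function names, the `TranslationResult` type, and the stand-in stages are all hypothetical, and the real Deepgram, Google Translate, and WellSaid Labs calls are deliberately left out so the sketch runs without API keys.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationResult:
    transcript: str                # text recognised from the input audio
    translation: str               # transcript rendered in the target language
    audio: Optional[bytes] = None  # synthesized speech, if a TTS stage was supplied


def translate_audio(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Optional[Callable[[str], bytes]] = None,
    target_language: str = "en",
) -> TranslationResult:
    """Chain STT -> translation -> optional TTS (hypothetical helper)."""
    transcript = transcribe(audio)
    translation = translate(transcript, target_language)
    # The TTS stage is optional so the pipeline can return text only.
    speech = synthesize(translation) if synthesize is not None else None
    return TranslationResult(transcript, translation, speech)


# Stand-in stages (assumptions for illustration, not real API clients).
def fake_transcribe(audio: bytes) -> str:
    return "hola mundo"


def fake_translate(text: str, target: str) -> str:
    return {"hola mundo": "hello world"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = translate_audio(
    b"<audio bytes>", fake_transcribe, fake_translate, fake_synthesize
)
print(result.translation)
```

Passing the stages in as plain callables keeps each service swappable: a real client for any of the three providers could be dropped in without touching the pipeline function.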

Applications built on this API can help users understand audio in another language that does not already come with captions or a translation.

Future Considerations:
UI

  • The goal of this project was to focus on the API rather than the UI. With more time, I'd prefer to swap out this Next.js UI in favor of a Google Chrome extension that listens to whatever audio is playing in a browser tab and initiates the translation from the extension.

API

  • Google Translate can translate into many languages other than English, but for now the target language defaults to English until we can access WellSaid Labs voice actors in other languages.
  • Detecting the voice style in the input audio would be another awesome improvement, so that we could select a voice actor that "matches" the speaker. For now we default to one WellSaid voice actor, when we could take advantage of the 50+ voices that are available.
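As a rough illustration of the defaults described in the two points above, the fallback could look something like the sketch below. The `choose_voice` helper and the voice catalog are hypothetical; only the English default and the "Alana B." voice name come from this post.

```python
from typing import Dict, Tuple

DEFAULT_LANGUAGE = "en"     # current default target language
DEFAULT_VOICE = "Alana B."  # the single voice used for now


def choose_voice(
    target_language: str, voices_by_language: Dict[str, str]
) -> Tuple[str, str]:
    """Return (language, voice), falling back to English when the
    requested language has no voice actor (hypothetical helper)."""
    voice = voices_by_language.get(target_language)
    if voice is None:
        # No voice for this language yet: fall back to the English default.
        fallback = voices_by_language.get(DEFAULT_LANGUAGE, DEFAULT_VOICE)
        return DEFAULT_LANGUAGE, fallback
    return target_language, voice


# Hypothetical catalog containing only the current English voice.
catalog = {"en": "Alana B."}
print(choose_voice("fr", catalog))
```

Matching a voice to the speaker's style would replace the simple dictionary lookup with some kind of similarity search, which this sketch does not attempt.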

Submission Category:

Accessibility Advocates

Link to Code on GitHub

https://github.com/jumpmanda/treehouse

Additional Resources / Info

An Example
(Please note that the audio generation feature flag is disabled, so the example below shows the translation part of this app but not the audio generation part.)
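For readers curious what such a flag might look like, here is a minimal sketch of an environment-variable toggle. The variable name `TREEHOUSE_ENABLE_TTS` and the helper are assumptions for illustration; the repository may implement the flag differently.

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "TREEHOUSE_ENABLE_TTS"  # hypothetical variable name


def audio_generation_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is explicitly switched on.

    Defaults to off, so the text-to-speech step is skipped unless
    someone opts in.
    """
    env = os.environ if env is None else env
    value = env.get(FLAG_NAME, "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}


# With the flag unset, only transcription and translation would run.
print(audio_generation_enabled({}))
```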

In this screenshot, we have uploaded an audio clip in Spanish (taken from an interview with Selena Quintanilla). Below the button, we display the text that was transcribed with Deepgram and translated with Google.
[Screenshot: Treehouse UI with steps on how to use the service]
Here is the output from the server logs:
[Screenshot: Output logs from the API]

Sample Audio with WellSaid Labs Synthetic Voice (Alana B.) hosted here:

http://www.sndup.net/y5qj
