Given speech in one language, use this UI/API to generate a translation, either as text or as audio.
This project focused on building the API, which stitches together the Deepgram API for STT (speech to text), the Google Translate API, and the WellSaid Labs API for TTS (text to speech).
Applications built on this API can help users understand audio in another language that does not already come with captions or a translation.
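The three-step flow above (speech to text, translation, text to speech) can be sketched as a single async function. This is a minimal illustration under stated assumptions, not the project's actual code. The `SpeechToText`, `Translator`, and `TextToSpeech` interfaces and the `translateSpeech` function are hypothetical stand-ins for the real Deepgram, Google Translate, and WellSaid Labs clients. They are injected as parameters so the ordering is visible without depending on any SDK. A hypothetical `generateAudio` flag lets the TTS step be skipped, similar to the feature flag that disables audio generation in the demo.

```typescript
// Hypothetical service interfaces. The real project calls Deepgram (STT),
// Google Translate, and WellSaid Labs (TTS); these shapes are assumptions
// used only to show how the three steps are chained.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface Translator {
  translate(text: string, targetLanguage: string): Promise<string>;
}

interface TextToSpeech {
  synthesize(text: string, voiceId: string): Promise<Uint8Array>;
}

interface PipelineOptions {
  targetLanguage: string; // e.g. "en", the project's current default
  voiceId: string;        // a single voice for now (hypothetical ID)
  generateAudio: boolean; // hypothetical flag; when false, TTS is skipped
}

interface PipelineResult {
  transcript: string;
  translation: string;
  audio?: Uint8Array;     // present only when generateAudio is true
}

async function translateSpeech(
  audio: Uint8Array,
  stt: SpeechToText,
  translator: Translator,
  tts: TextToSpeech,
  options: PipelineOptions,
): Promise<PipelineResult> {
  // 1. Speech to text in the source language.
  const transcript = await stt.transcribe(audio);

  // 2. Translate the transcript into the target language.
  const translation = await translator.translate(transcript, options.targetLanguage);

  // 3. Optionally synthesize the translation, gated by the flag.
  if (!options.generateAudio) {
    return { transcript, translation };
  }
  const speech = await tts.synthesize(translation, options.voiceId);
  return { transcript, translation, audio: speech };
}
```

Injecting the services keeps the orchestration testable with stubs; in a real deployment each interface would wrap the corresponding provider's HTTP API.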
- The goal of this project was to focus on the API rather than the UI. With more time, I'd replace this Next.js UI with a Google Chrome extension that listens to whatever audio is playing in a browser tab and starts the translation from the extension.
- Google Translate can translate into many languages other than English, but for now the target language defaults to English until we have access to WellSaid Labs voice actors in other languages.
- Detecting the voice style of the input audio would be another great improvement, so that we could select a voice actor that "matches" the voices in the input. For now we default to a single WellSaid voice actor, even though 50+ voices are available.
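The language and voice constraints above could be expressed as a small selection step. The following is a hypothetical sketch, not the project's code: the `Voice` shape, the voice IDs, and the `resolveTargetLanguage` and `pickVoice` helpers are assumptions. It generalizes the current behaviour by honouring a requested target language only when a matching voice exists and otherwise falling back to English, then picking the first voice for that language.

```typescript
// Hypothetical voice catalog entry. Real voice data would come from
// WellSaid Labs; the IDs and language codes used here are placeholders.
interface Voice {
  id: string;
  name: string;
  language: string; // language code, e.g. "en"
}

const DEFAULT_LANGUAGE = "en";

// Returns the requested language only if at least one voice speaks it;
// otherwise falls back to the default (English).
function resolveTargetLanguage(requested: string, voices: Voice[]): string {
  return voices.some((v) => v.language === requested) ? requested : DEFAULT_LANGUAGE;
}

// Picks the first voice for the language, or undefined if none exists.
// Voice-style matching could later replace this with a similarity ranking.
function pickVoice(language: string, voices: Voice[]): Voice | undefined {
  return voices.find((v) => v.language === language);
}
```

With an English-only catalog, a request for Spanish resolves to `"en"`, which matches the README's current default.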
(Please note that I have a feature flag disabled, so the example below shows the translation part of this app but not the audio generation part.)
In this screenshot, we have uploaded an audio clip (taken from an interview with Selena Quintanilla) that is originally in Spanish. Below the button, we display the text transcribed with Deepgram and translated with Google Translate.
Here is the output from the server logs:
Sample audio with a WellSaid Labs synthetic voice (Alana B.) is hosted here: