Overview
Decifer is a cross-platform mobile app that generates transcripts either from a voice recording or from an uploaded audio file.
Try out the app: https://play.google.com/store/apps/details?id=com.souvikbiswas.deepgram_transcribe
Typically, using the Deepgram API would require you to maintain a server, but I have made this project totally serverless. Read on to learn how.
Here's a brief demo of the entire app in action:
Submission Category:
Analytics Ambassadors
Link to Code on GitHub
The entire app is open source - try it out and feel free to contribute to this project 😉 :
Blog post about this project: https://dev.to/sbis04/decifer-generate-transcripts-with-ease-5hl3
Try out the app: https://appdistribution.firebase.dev/i/a57e37b2fda28351
Project Description
The primary features of the app are as follows:
- Generate transcripts from audio recordings and audio files using the Deepgram API.
- Cloud sync to keep transcripts in sync across multiple devices using the same account.
- Transcript confidence map view.
- Export as PDF and share with anyone.
Architecture
I'm using a totally serverless architecture for this project 🤯, so let's have a look at how it works:
The mobile app is created using Flutter which is integrated with Firebase. I have used Firebase Cloud Functions to deploy the backend code required for communicating with the Deepgram API.
Firebase Cloud Functions lets you run backend code in a serverless architecture.
I have deployed the following function to Firebase:
const functions = require("firebase-functions");
const {Deepgram} = require("@deepgram/sdk");

exports.getTranscription = functions.https.onCall(async (data, context) => {
  try {
    const deepgram = new Deepgram(process.env.DEEPGRAM_API_KEY);
    const audioSource = {
      url: data.url,
    };
    // Request a pre-recorded transcription with punctuation and utterances.
    const response = await deepgram.transcription.preRecorded(audioSource, {
      punctuate: true,
      utterances: true,
    });
    console.log(response.results.utterances.length);
    // Collect the confidence score of each utterance.
    const confidenceList = [];
    for (let i = 0; i < response.results.utterances.length; i++) {
      confidenceList.push(response.results.utterances[i].confidence);
    }
    // Convert the transcript to WebVTT for synchronized playback in the app.
    const webvttTranscript = response.toWebVTT();
    const finalTranscript = {
      transcript: webvttTranscript,
      confidences: confidenceList,
    };
    const finalTranscriptJSON = JSON.stringify(finalTranscript);
    console.log(finalTranscriptJSON);
    return finalTranscriptJSON;
  } catch (error) {
    console.error(`Unable to transcribe. Error ${error}`);
    throw new functions.https.HttpsError("aborted", "Could not transcribe");
  }
});
The getTranscription function takes an audio URL, generates the transcript along with the respective confidences using the Deepgram API, and returns the data as a JSON string that can be parsed within the app.
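On the receiving end, the returned JSON string just needs to be decoded back into its two fields. Here's a minimal sketch of that step in plain JavaScript - the field names `transcript` and `confidences` match the function above, but the sample payload and the `parseTranscription` helper are illustrative, not taken from the app:

```javascript
// Illustrative payload in the same shape the function returns
// (the cue text and confidence values here are made up).
const payload = JSON.stringify({
  transcript: "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nHello world",
  confidences: [0.98, 0.91],
});

// Hypothetical helper: decode the JSON string returned by getTranscription.
function parseTranscription(jsonString) {
  const data = JSON.parse(jsonString);
  return {
    transcript: data.transcript, // WebVTT text, used for playback sync
    confidences: data.confidences, // one confidence value per utterance
  };
}

const result = parseTranscription(payload);
console.log(result.confidences.length); // 2
```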
App screens
The Flutter application consists of the following pages/screens:
- Login Page
- Register Page
- Dashboard Page
- Record Page
- Upload Page
- Transcription Page
The Login and Register pages handle user authentication. Authentication is required to give each user a unique account, which is needed for storing the generated transcripts in Firestore and enabling cloud sync.
The Dashboard Page displays a list of all the transcripts currently present on the user's account. It also has two buttons - one for navigating to the Record Page and the other for navigating to the Upload Page.
The Record Page lets you record audio using the device microphone and then transcribe it using Deepgram. You always have the option to re-record if you think the last recording wasn't good.
From the Upload Page, you can choose any audio file present on your device and generate a transcript of it.
Transcription Page is where the entire transcript can be viewed. It has an audio-transcript synchronized playback that highlights the text transcript part with respect to the audio that is playing.
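The synchronized highlighting can be driven by the WebVTT cues the Cloud Function returns: given the current playback position, find the cue whose time range covers it. Here's a minimal sketch in plain JavaScript - the `parseCues` and `findActiveCue` helpers and the sample cues are hypothetical, not the app's actual implementation:

```javascript
// Convert a WebVTT "HH:MM:SS.mmm" timestamp to seconds.
function toSeconds(ts) {
  const [h, m, s] = ts.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Pull the cue timings and text out of a WebVTT string.
function parseCues(webvtt) {
  const cues = [];
  const lines = webvtt.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const match = lines[i].match(
        /(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})/);
    if (match) {
      cues.push({
        start: toSeconds(match[1]),
        end: toSeconds(match[2]),
        text: lines[i + 1] || "",
      });
    }
  }
  return cues;
}

// Return the index of the cue covering the playback time, or -1.
function findActiveCue(cues, time) {
  return cues.findIndex((c) => time >= c.start && time < c.end);
}

const sample =
  "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nHello there\n\n" +
  "00:00:02.500 --> 00:00:05.000\nGeneral Kenobi";
const cues = parseCues(sample);
console.log(findActiveCue(cues, 3.0)); // 1 (the second cue is active)
```

The active cue index is all the UI needs in order to highlight the matching transcript segment as the audio plays.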
You can also see a confidence map for each part of the transcript (it shows how accurate that part of the transcription is - darker means higher confidence).
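One simple way to get the "darker is higher confidence" effect is to map each confidence value to the opacity of the highlight color. This is only a sketch of the idea - `confidenceToOpacity` is a hypothetical helper, and the app's actual coloring logic may differ:

```javascript
// Map a 0-1 confidence value to an opacity in [min, max],
// clamping out-of-range input. Higher confidence -> more opaque (darker).
function confidenceToOpacity(confidence, min = 0.2, max = 1.0) {
  const c = Math.min(1, Math.max(0, confidence));
  return min + c * (max - min);
}

console.log(confidenceToOpacity(1.0)); // 1 (fully opaque)
console.log(confidenceToOpacity(0.0)); // 0.2 (faint, but still visible)
```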
You can also easily print the generated transcript or share it with anyone as a PDF.
Deepgram
Overview of my Deepgram dashboard (completed the mission, Get a Transcript via API or SDK):
Usage analytics of the Deepgram API:
Log of one of the API calls for transcribing from audio:
Let me know in the comments what you think about the project 🙂