DEV Community

Ethan Zerad

The 16th Day of the 100 Days of Code challenge

Hey everyone!

Today was a fun vacation day, but I still got 2 hours of coding in. I finally fixed a small problem that was driving me crazy yesterday (honestly, a super simple fix, I don't know if I should be ashamed). I wanted to emit audio chunks from my React app to my Flask API, which uses SocketIO, and I just couldn't do it. I'm using the React-Mic library to record the user. It generates blobs in WebM format, and I was getting frustrated trying to convert them to WAV so that I could create a WAV file from the audio data.

Little did I know, this wasn't too important, since I could have simply converted the final blob and saved it as a WAV file in the API. I learned that I need to explore way more options, such as doing the work in the API (which is where most things should happen) and minimizing complexity on the front end.
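For anyone curious, here's a minimal sketch of what a server-side conversion could look like. It assumes `ffmpeg` is installed on the machine running the API and that the final WebM blob has already been saved to disk. The function names and file paths are just made up for illustration, not what my project actually uses. The 16 kHz mono settings match what Whisper resamples audio to internally.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list that converts src (e.g. WebM) to WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file (container is auto-detected)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of output channels (1 = mono)
        str(dst),                 # output format is inferred from ".wav"
    ]


def convert_to_wav(src, dst):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Calling `convert_to_wav("recording.webm", "recording.wav")` in the API would then leave the front end with nothing to do except send the blob.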

I'm currently exploring approaches to real-time transcription, since that's what my project is about (real-time transcription with Whisper), and I'm wondering how to handle buffering, accuracy, latency, and formatting.

I'm approaching this step by step, focusing on buffering first. I didn't even have a list before this; I just made it while writing this post. For me, this sort of thing is the goal, even if nobody reads it.
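To make the buffering idea concrete, here's a rough sketch of one possible approach: collect incoming audio and hand out fixed-length windows that overlap a little, so words cut at a window boundary show up whole in the next one. This is only an assumption about how it could work, not something I've settled on. It also assumes the audio has already been decoded to raw 16-bit PCM, since WebM chunks can't be sliced at arbitrary byte offsets. The class name and default values are hypothetical.

```python
from typing import List


class RollingAudioBuffer:
    """Collects raw PCM bytes and hands out fixed-size windows with overlap."""

    def __init__(self, sample_rate=16000, sample_width=2, window_s=5.0, overlap_s=1.0):
        if not 0 <= overlap_s < window_s:
            raise ValueError("overlap_s must be >= 0 and smaller than window_s")
        bytes_per_second = sample_rate * sample_width
        self.window_bytes = int(window_s * bytes_per_second)
        # How far the buffer advances after each window is handed out.
        self.step_bytes = int((window_s - overlap_s) * bytes_per_second)
        self._data = bytearray()

    def add(self, chunk: bytes) -> List[bytes]:
        """Append a chunk and return every full window that is now available."""
        self._data.extend(chunk)
        windows = []
        while len(self._data) >= self.window_bytes:
            windows.append(bytes(self._data[:self.window_bytes]))
            # Drop only the non-overlapping part so the tail is reused.
            del self._data[:self.step_bytes]
        return windows
```

Each window returned by `add` could then be passed to the transcription step, while the overlap stays in the buffer for the next window.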

A big part of my project is Whisper, the speech recognition library by OpenAI (which I love!). The possibilities are endless with such a library, and it's super fun to witness its growth. Anyway, for those who don't know, there are a few models of different sizes (neural networks trained on enormous datasets) that we can use. If I use the large one, execution times go beyond the 20-second mark. The tiny one optimized for English runs great, but a lot of accuracy is sacrificed. I'm super curious to learn about strategies for running the library's functions in quicker and more efficient ways.
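One way to put numbers on this trade-off is the real-time factor: processing time divided by the length of the audio. Anything above 1 means the model can't keep up with live audio. Here's a small sketch of how I might measure it. The helper names are hypothetical, and the 5-second clip length in the example below is an assumption, since I haven't measured that part carefully yet.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio duration; <= 1 means real-time capable."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


def keeps_up(processing_s, audio_s):
    """True if the audio is processed at least as fast as it is recorded."""
    return real_time_factor(processing_s, audio_s) <= 1.0
```

For example, if a 5-second clip took 20 seconds to process, `real_time_factor(20.0, 5.0)` would be 4.0, which is far too slow for live use. With Whisper, the timing itself could come from something like `timed(model.transcribe, "clip.wav")`, where `model` was loaded with `whisper.load_model("tiny.en")`.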

I'm super early into my research process, so I'll be updating you (or myself) on how it goes.

That's it for today, happy coding everyone!
