Rafael Milewski

EchoSense: Your Pocket-Sized Companion for Smarter Meetings

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text and No More Monkey Business.

What I Built

I developed EchoSense, a portable hardware device that captures spoken content in settings like meetings, classes, brainstorming sessions, and conferences. It features a web interface with real-time transcriptions of everything echoing through its microphone. Users can ask questions about the discussion or generate summaries in real time, making it an invaluable tool for live events.

The device operates on a modest 40 MHz SoC with 4 MB of RAM. It’s lightweight, efficient, and can run on a tiny lithium battery, making it highly portable.

Tech Used

  • Vue, TypeScript, shadcn/ui
  • ESP32, Rust, Espressif IoT Development Framework (IDF)
  • WebSocket, SendGrid, AssemblyAI

Demo

Since this is a hardware device, providing a link to a demo isn’t feasible. However, I’ve recorded a video showcasing it in action, along with instructions on how to build one yourself.

Here is the GitHub repository with the source code:

GitHub: milewski/echosense-challenge

Making sense of echoes and delivering insights

EchoSense

Portable device for real-time audio transcription and interactive summaries.

This is the main repository for my submission to the AssemblyAI Challenge.

  • Esp32: The firmware source code for the ESP32-S3-Zero device.
  • Frontend: The UI that communicates with the device via WebSocket.

Each subfolder includes instructions for running the project locally.


For a more detailed overview, including screenshots, you can read the submission sent to the challenge here:

https://dev.to/milewski/echosense-your-pocket-sized-companion-for-smarter-meetings-3i71

Screenshots

[Screenshots: demo-1, demo-2, demo-3]


Journey

When powered on, the device automatically connects to the configured Wi-Fi network and requests a temporary token from AssemblyAI, valid for one hour. It establishes a real-time transcription WebSocket connection and generates a local network URL, displayed as a QR code on the OLED screen.
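
Below is a minimal TypeScript sketch of that token exchange, assuming AssemblyAI's v2 real-time temporary-token endpoint. The firmware performs this step in Rust on the device; the sketch only illustrates the request and response shape, and `apiKey` stands in for the key injected at build time.

```typescript
// Sketch only: the device does this in Rust. TypeScript is used here to show
// the shape of the temporary-token request against the v2 real-time endpoint.

interface TokenResponse {
  token: string;
}

async function fetchTemporaryToken(apiKey: string): Promise<string> {
  const response = await fetch("https://api.assemblyai.com/v2/realtime/token", {
    method: "POST",
    headers: {
      authorization: apiKey,
      "content-type": "application/json",
    },
    // Token lifetime in seconds: one hour, matching the behaviour described above.
    body: JSON.stringify({ expires_in: 3600 }),
  });

  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }

  const { token } = (await response.json()) as TokenResponse;
  return token;
}
```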

The QR code directs users to the device’s IP address, where a web server runs on port 80. The server hosts a Vue.js-based interface, with all assets (CSS, JS, images) inlined into a single minified and mangled HTML file.

This optimization ensures minimal memory usage—essential in a resource-constrained environment where every byte counts.
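
One way to produce such a single-file bundle is Vite with the vite-plugin-singlefile package. The repository's actual build configuration may differ, so treat this as an illustrative setup rather than the project's exact config.

```typescript
// vite.config.ts — illustrative single-file build, assuming vite-plugin-singlefile.
// Goal: every asset inlined into one minified HTML file the ESP32 can serve.
import { defineConfig } from "vite";
import vue from "@vitejs/plugin-vue";
import { viteSingleFile } from "vite-plugin-singlefile";

export default defineConfig({
  plugins: [vue(), viteSingleFile()],
  build: {
    // Inline assets of any size and keep CSS in the same file, so the output
    // is a single index.html with no additional requests.
    assetsInlineLimit: 100_000_000,
    cssCodeSplit: false,
  },
});
```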

As the user speaks, audio is streamed in ~500ms chunks, sampled at 16000Hz in PCM16 format, via the WebSocket connection to AssemblyAI. Transcriptions are returned and displayed live to any user who scans the QR code. Simultaneously, the audio is saved locally on the device’s SD card for further use.
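
At 16,000 samples per second and 2 bytes per PCM16 sample, a 500 ms chunk works out to roughly 16 KB. The hedged sketch below assumes AssemblyAI's v2 real-time WebSocket endpoint; on the device this logic lives in Rust and is fed by the microphone driver.

```typescript
// Sketch of the streaming step, assuming the v2 real-time WebSocket API.
// The constants show why each ~500 ms chunk is roughly 16 KB of raw audio.

const SAMPLE_RATE = 16_000;     // samples per second
const BYTES_PER_SAMPLE = 2;     // PCM16: 2 bytes per sample
const CHUNK_MS = 500;           // ~500 ms of audio per message
const CHUNK_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS) / 1000; // 16,000 bytes

function openTranscriptionSocket(token: string): WebSocket {
  const socket = new WebSocket(
    `wss://api.assemblyai.com/v2/realtime/ws?sample_rate=${SAMPLE_RATE}&token=${token}`,
  );

  socket.onmessage = (event) => {
    const message = JSON.parse(event.data as string);
    // Partial and final transcripts carry the text shown live in the web UI.
    if (message.text) {
      console.log(message.message_type, message.text);
    }
  };

  return socket;
}

// Each ~500 ms slice of captured audio is forwarded as one message.
// Base64-encoded audio in a JSON envelope is one format the v2 API accepts.
function sendChunk(socket: WebSocket, pcm16Chunk: Uint8Array): void {
  if (pcm16Chunk.byteLength !== CHUNK_BYTES) {
    console.warn(`Expected ${CHUNK_BYTES} bytes, received ${pcm16Chunk.byteLength}`);
  }

  let binary = "";
  for (const byte of pcm16Chunk) {
    binary += String.fromCharCode(byte);
  }
  socket.send(JSON.stringify({ audio_data: btoa(binary) }));
}
```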

The following diagrams illustrate this functionality:

[Diagrams: diagram-1, diagram-2]


Prompts Qualification

My submission qualifies for two prompts:

  • Really Rad Real-Time
  • No More Monkey Business

Incomplete Features

  • The SD card was initially intended to store recordings and later attach them to emails. However, I realized that the file sizes would grow too large, exceeding email attachment limits. To address this, a backend would be required to receive the files and convert them from raw PCM16 to MP3. Since this wasn't the main focus of the challenge, I left the feature unfinished, as it would require building and hosting a backend; a rough sketch of that conversion step follows this list.

  • Currently, there’s no way to configure Wi-Fi, API keys, or recording options via the web UI; all keys are injected at compile time. Ideally, users would set up the device over a local Wi-Fi connection between their phone and the device, but that setup would require additional work.

  • I had planned to design and 3D print a case, possibly as a cube, to align with names like MeetingBox or MetaCube. Unfortunately, I didn’t have time to complete this, so the prototype was built and presented on a breadboard.
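
For completeness, here is a rough sketch of the unfinished PCM16-to-MP3 conversion step mentioned above, assuming a small Node.js backend that shells out to ffmpeg before attaching the result to a SendGrid email. The function and file names are illustrative and do not exist in the repository.

```typescript
// Hypothetical backend step for the unfinished email feature: convert the raw
// PCM16 recording pulled off the SD card into an MP3 small enough to attach
// to an email. Requires ffmpeg on the PATH; all names here are illustrative.
import { spawn } from "node:child_process";

function convertPcmToMp3(inputPath: string, outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // -f s16le: raw signed 16-bit little-endian PCM (no header),
    // -ar 16000: 16 kHz sample rate, -ac 1: assumes a single (mono) microphone channel.
    const ffmpeg = spawn("ffmpeg", [
      "-f", "s16le",
      "-ar", "16000",
      "-ac", "1",
      "-i", inputPath,
      outputPath,
    ]);

    ffmpeg.on("error", reject);
    ffmpeg.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`)),
    );
  });
}

// Example: convertPcmToMp3("recording.pcm", "recording.mp3");
```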


If anyone has any questions, feel free to ask below or open an issue on GitHub. I’ll be happy to help!

Top comments (16)

Ritesh Hiremath

Nice work Rafael!!

Rafael Milewski

Thanks! 😎

Hugo Antonio

Wow, nice.

Duc Nguyen Thanh

good job bro

Samuel Jesse

Incredible!

U G Murthy

@milewski This is very creative. Fantastic work. Love it.

Rafael Milewski

Thanks! ✨

Felix

Wow, as a beginner, I think your project is awesome. How did you come up with such an idea?

Rafael Milewski

When I saw the sponsor's name, "AssemblyAI," it immediately made me think of embedded hardware... Assembly -> Low-Level -> Hardware. So, I decided to use some modules I had on hand and quickly brainstorm ways to make the most of them.

Felix

Wow, you did such amazing work! If I want to learn IoT, which language should I start with?

Rafael Milewski

I would say there are two ways:

You can go with C: your learning path will be easier, and you will benefit from tons of libraries for every module you can buy.

Or you can go with Rust. It’s definitely going to be a much harder journey, but in my opinion, it will be a much better deal for your future.

And in terms of platforms, there are three major ones: Arduino, STM32, and ESP32. Arduino might be the most popular, but its boards are expensive and have very low specs compared to ESP32, which is cheaper and, in most cases, offers much better hardware for the price.

I have a repository on GitHub where I documented my journey while learning this stuff. It might be helpful to take a look and see how much you can follow by reading the code, so you get an idea of how easy or hard it may be.

github.com/milewski/sensors-esp

Felix

Thanks a lot. What about using Python?

Rafael Milewski

It is possible to use Python; search for MicroPython. However, keep in mind that these devices have very limited memory. Using a language like Python, which is interpreted and includes a garbage collector, will consume most of the available memory, leaving very little for your application. The best approach is to use compiled languages with no runtime overhead, such as C or Rust.

To give you an idea of the challenges you might face (which are often unnoticed on conventional computers), I was unable to use the Get Transcription API because it returned a massive JSON response. This exceeded the memory available in my stack, and I only discovered the issue near the end of the project. As a result, I had to pivot my approach and use the Streaming API instead.

Felix

Okay, that's a great answer. Thanks, man!

Felix

Do you know of any other challenge community, like the DEV Community, where I can participate by building websites and showcasing them?

Davi de Feo

Very good! Congratulations!