Pratyay Banerjee


VisuSpeak — 𝙑𝙞𝙨𝙪𝙖𝙡𝙞𝙯𝙚 𝙩𝙤 𝙎𝙥𝙚𝙖𝙠 👀🗣️

This is a submission for Twilio Challenge v24.06.12

Additional Prize Categories:

  • Twilio Times Two 🚀: VisuSpeak leverages multiple (4+) Twilio APIs / services, making it highly scalable & future-proof.
  • Impactful Innovators 💪: Our project was carefully brainstormed and validated by successful startup founders and online creators who saw its potential for high user engagement.

NOTE: Thanks to #Twilio and #DEV for hosting this challenge. We're extremely delighted to share this project with you all. Twilio undoubtedly provides a huge suite of tools & APIs for building cool stuff for the better, and this time, instead of building something common (say, a WhatsApp AI chatbot, a reminder bot, or a basic note-taker / summarizer), we really wanted to build something great, something innovative that can light up a spark in our daily lives! We hope you all enjoy the project! 😃


Problem Statement 🚨

Computer-mediated communication platforms such as Google Meet, Zoom, and Microsoft Teams have become increasingly popular for facilitating verbal communication. These platforms provide features such as live captioning and noise cancellation to enhance understanding. However, we propose that visual augmentations that leverage the semantics of spoken language can further improve the conveyance of complex, nuanced, and unfamiliar information. People already use visual aids in daily conversations to provide additional context and clarification. Research shows that people learn more effectively from videos than from audio alone, and prefer visuals in podcasts and stories. Multimodal phenomena and the principle of inverse effectiveness provide evidence that the human sensory system responds more effectively to stimuli arriving through multiple modalities at once. Therefore, this project aims to develop and integrate visual augmentation tools into computer-mediated communication platforms to enhance the understanding and effectiveness of verbal communication.

What we built πŸ€”

VisuSpeak is a revolutionary video conferencing application that generates visual representations of spoken content in real time and augments these visuals directly into your video feed, enhancing the understanding and effectiveness of verbal communication.


Demo Video (v2) ▢️

Here you can access the v1 of the project video!

Twilio and AI

VisuSpeak leverages powerful, bleeding-edge technologies to ensure a seamless experience. The front-end is built on top of Next.js & shadcn. On top of that, we've integrated multiple Twilio APIs and services.


This ensures that the visuals you see are always in sync with the conversation. Apart from these, we're using a few additional AI services (including GPT) to generate the visuals themselves.
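
To give a feel for the conferencing layer, here is a minimal sketch of how a client could join a Twilio Video room and receive transcript / visual-cue updates over a data track. It uses the twilio-video JavaScript SDK; the room name, token handling, and the message payload shape are illustrative assumptions, not our exact production wiring.

```typescript
// Sketch only: join a Twilio Video room and listen for transcript / visual-cue
// messages arriving over a remote data track. The payload shape is hypothetical.
import { connect, RemoteParticipant, RemoteTrack } from 'twilio-video';

async function joinVisuSpeakRoom(token: string, roomName: string) {
  // Connect with local audio and video enabled.
  const room = await connect(token, { name: roomName, audio: true, video: true });

  const handleTrack = (track: RemoteTrack) => {
    if (track.kind !== 'data') return;
    track.on('message', (message) => {
      if (typeof message !== 'string') return;
      // Hypothetical payload pushed over the data track: { transcript, visualUrl }
      const { transcript, visualUrl } = JSON.parse(message);
      console.log('Caption:', transcript, '-> visual:', visualUrl);
    });
  };

  // Cover participants already in the room as well as tracks subscribed later.
  room.participants.forEach((participant: RemoteParticipant) =>
    participant.tracks.forEach((pub) => pub.track && handleTrack(pub.track))
  );
  room.on('trackSubscribed', handleTrack);

  return room;
}
```

In the app, a payload like this is what drives the overlay that gets augmented into the video feed.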

To enhance the output, we detect and analyze user emotions from the conversation transcription, ensuring that the visuals align with the intended emotional context. VisuSpeak also offers an exotic toggle feature that allows the system to capture screenshots at regular intervals and send them to the backend for improved visual output with GPT (if toggled on).
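
For that screenshot toggle, the flow is roughly: grab a frame from the video element every few seconds and POST it to the backend, where it is combined with the live transcript before the GPT call. A minimal browser-side sketch, assuming a hypothetical /api/visual-context endpoint and a placeholder interval:

```typescript
// Rough sketch of the screenshot toggle. The endpoint path and interval
// below are placeholders, not the exact values used in the app.
function startScreenshotPolling(video: HTMLVideoElement, intervalMs = 10_000) {
  const canvas = document.createElement('canvas');

  const timer = setInterval(() => {
    // Draw the current video frame onto an offscreen canvas.
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext('2d')?.drawImage(video, 0, 0);

    // Encode the frame as JPEG and send it to the backend, where it is
    // combined with the transcript before calling GPT.
    canvas.toBlob(
      (blob) => {
        if (!blob) return;
        const form = new FormData();
        form.append('frame', blob, 'frame.jpg');
        fetch('/api/visual-context', { method: 'POST', body: form }).catch(console.error);
      },
      'image/jpeg',
      0.7
    );
  }, intervalMs);

  // Returned so the caller can stop polling when the toggle is switched off.
  return () => clearInterval(timer);
}
```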

Privacy & Security πŸ”

VisuSpeak deals with a wide range of sensitive information. In the wrong hands, this data could seriously harm individuals.

A preview of Magic ✨

We took special care to ensure that our platform is end-to-end encrypted, protecting the privacy and sensitive information of all of our users and making it 100% GDPR compliant!
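
Without going into the full key-exchange details, here is a simplified sketch of the kind of client-side payload encryption involved, using the standard Web Crypto API with AES-GCM. The key handling is illustrative only; real end-to-end encryption additionally requires a secure key exchange between participants.

```typescript
// Illustrative only: AES-GCM encryption/decryption of a message payload in the
// browser. A real E2E setup also needs a secure key exchange between peers.
async function encryptPayload(key: CryptoKey, plaintext: string) {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh IV per message
  const ciphertext = await crypto.subtle.encrypt(
    { name: 'AES-GCM', iv },
    key,
    new TextEncoder().encode(plaintext)
  );
  return { iv, ciphertext };
}

async function decryptPayload(key: CryptoKey, iv: Uint8Array, ciphertext: ArrayBuffer) {
  const plain = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, ciphertext);
  return new TextDecoder().decode(plain);
}
```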

Background πŸ“œ

Have you ever been in a situation where background noise makes it impossible to hear a conversation? Or perhaps you're in a place where you can't turn up the volume. Maybe you're hard of hearing and struggle to follow along in meetings. Or, perhaps, you're trying to communicate in a language that isn’t your own, making it even harder to understand and be understood. Communication should be clear and accessible to everyone, no matter the circumstances.

These scenarios are more common than you think. And for those who are deaf or hard of hearing, this is an everyday challenge.

We believe that with creative thinking, there's a way to bridge this multilingual communication gap effortlessly. That's why we built VisuSpeak ✨


Design 🎨

We were heavily inspired by the revised version of the Double Diamond design process, which includes not only visual design but a full-fledged research cycle in which you discover and define your problem before tackling your solution and finally deploying it.

Double Diamond

  1. Discover: a deep dive into the problem we are trying to solve.
  2. Define: synthesizing the information from the discovery phase into a problem definition.
  3. Develop: think up solutions to the problem.
  4. Deliver: pick the best solution and build that.

Moreover, we utilized design tools like Figma, Photoshop & Illustrator to prototype our designs before writing any code. This let us gather iterative feedback and spend less time rewriting code.

Challenges we ran into 😀

We take challenges as an opportunity to explore and learn new things so that we can build cool stuff! But yeah, being in two different time zones (India and Germany) and coordinating while shipping our product was definitely a challenge. Sleepless nights and ruthless commitment were our solution to this problem.

Accomplishments that we're proud of ✨

We are proud of finishing the project on time, which seemed like a tough task since we started working on it quite late due to other commitments. We were also able to add most of the features we envisioned for the app during ideation. And as always, working overnight was pretty fun! :)

This project was a special achievement for us because the experience was very different from in-person hackathons. We found that some parts were the same, though: we went through heavy brainstorming and extensive research, all to feel the sweet, sweet success of hitting the final pin on the board.

What's next? πŸš€

We believe that our app has great potential, and we really want this project to have a positive impact on people's lives! We would love to make its architecture more robust and scalable so that it can support far more user interaction. The next feature update will allow users to opt in to receive a meeting-notes summary via email or WhatsApp through an optional toggle. Additionally, we intend to keep improving the image caching, add internationalization (i18n) to the website, and support sign-language conversation on the go!
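
For the planned opt-in summary delivery, Twilio's Messaging API is a natural fit. Here is a rough server-side sketch using the Twilio Node helper library; the sender is the Twilio WhatsApp sandbox number and the summary text is a placeholder.

```typescript
// Sketch of the planned opt-in summary delivery. The sender is the Twilio
// WhatsApp sandbox number and the summary text is a placeholder.
import twilio from 'twilio';

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

async function sendMeetingSummary(toNumber: string, summary: string) {
  // The 'whatsapp:' prefix routes the message through the WhatsApp channel;
  // dropping it would send a regular SMS instead.
  await client.messages.create({
    from: 'whatsapp:+14155238886',
    to: `whatsapp:${toNumber}`,
    body: `📝 Your VisuSpeak meeting notes:\n\n${summary}`,
  });
}
```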

App Tryout / GitHub Project Link πŸ”—

🟠 VisuSpeak πŸ‘‰ https://visuspeak.co [Deployed on Vercel ☁️]

🟠 You can find our project's GitHub repo here!

Meet our Team πŸ‘₯

We're team Neutrino, consisting of me (@neilblaze) and my buddy / teammate @subhamx.


Footnote πŸ”·

I would like to take a moment to express my appreciation to my colleague, Subham (@subhamx), for the successful collaboration we had. The projects we developed together are just the beginning, and we believe they have the potential to revolutionize the way we communicate by improving the comprehension and efficacy of verbal communication. We aim to address and alleviate the communication lags that are commonly experienced in daily interactions.

LICENSE βš–οΈ

This project is licensed under the MIT License.

Top comments (5)

Enzo Enrico

Congrats on the amazing project! Loved the context-based GIFs; they add a touch of fun to the meeting.

Pratyay Banerjee

Thanks, glad you liked it! ^_^

Anton

Loved it!

Pratyay Banerjee

Thanks! ^_^

Pratyay Banerjee

Also, special thanks to @anthonyjdella & @rishabk7 for the resources & guidance for the hackathon! Hope y'all will like it! 🙌🏻