LanguageWhisperer: Facilitate your language learning with Transformers!
We live in a world full of different objects, images, and languages, yet not everyone finds it easy to pick up a new language. So what do we do when we want to learn a foreign one? This is where our tool, the LanguageWhisperer, comes in.
HOW DID WE GET HERE?
On Friday, 26 May 2023, Lily Kerns, a Community Manager in the AWS Community Builders program, announced our first ever Community Builders Hackathon! It was a fantastic opportunity for all of us to learn, be creative, and create something remarkable. 🥳
Project Title: Language Whisperer
Language Whisperer, as the name implies, is a simple-to-use application that enables you to translate and learn a new language through images. Think of it as your go-to application whenever you encounter an unfamiliar object: it will help you gain knowledge and understanding about the object in question, regardless of your prior familiarity with it.
This application can be used in different scenarios such as education, entertainment, and tourism.
When visiting a foreign country and wanting to learn its language, it helps to build a working vocabulary for a specific place quickly. Searching for numerous words manually takes a long time; with the LanguageWhisperer, you can capture an image and translate multiple words at once, listen to the translations, and gain deeper insight into their meanings.
During the initial stages of the project, our team brainstormed various ideas on how to leverage Transformers, and we finally selected @dashapetr's concept, the Language Whisperer. To facilitate collaborative development, @dashapetr created a Google Colab notebook where team members could experiment and iterate on the initial codebase. As the project progressed, we transitioned to GitHub to streamline collaboration and code management, which allowed for smoother coordination and teamwork throughout the development process. The following tools were used:
- The Transformers Agent, backed by StarCoder, a large language model (LLM), offers a natural-language API on top of transformers and was used for the implementation. Detailed documentation for the agent can be found at https://huggingface.co/docs/transformers/main_classes/agent.
- Wiki Searcher (a custom tool) was implemented with BeautifulSoup, a Python library for extracting data from HTML and XML files. It parses and navigates the HTML structure of web pages, letting us pull out the relevant information for the Wiki Searcher.
- The gTTS (Google Text-to-Speech) library (a custom tool) was used to convert text into high-quality speech with natural-sounding voices. We chose it because the agent's built-in text-to-speech did not read text in other languages accurately enough for our purposes.
- Streamlit was used for the frontend.
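Wiring the StarCoder-backed agent up can be sketched roughly as below. This is a minimal sketch under assumptions, not the project's actual code: the endpoint URL mirrors the one used in the transformers agents documentation (transformers >= 4.29), and `build_task`/`run_whisperer` are hypothetical helpers invented for illustration.

```python
# Sketch of connecting the Transformers Agent to the StarCoder endpoint.
# Assumes transformers >= 4.29 (the release that introduced agents).

STARCODER_ENDPOINT = "https://api-inference.huggingface.co/models/bigcode/starcoder"

def build_task(language: str) -> str:
    """Compose a natural-language instruction for the agent (hypothetical helper)."""
    return (
        "Caption the image, then translate the caption into "
        f"{language} and return the translated text."
    )

def run_whisperer(image, language: str = "Spanish"):
    """Create the agent and run one captioning + translation task.

    Kept inside a function so nothing hits the network at import time;
    calling this requires network access to the inference endpoint.
    """
    from transformers import HfAgent

    agent = HfAgent(STARCODER_ENDPOINT)
    return agent.run(build_task(language), image=image)
```

The appeal of the agent approach is that the task is expressed in natural language, and the agent decides which underlying tools (captioning, translation) to chain together.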
Finding the appropriate voice for the task proved to be quite challenging. The built-in StarCoder text-to-speech tool read foreign phrases with a noticeable English accent, which was confusing, so we researched alternatives. One option was Amazon Polly, but integrating it with Streamlit was difficult because it requires authorization against an AWS account. The gtts library, by contrast, needs no keys or access, is easily installable via pip, and simply takes a language code as input, yielding natural-sounding voice output.
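The text-to-speech step described above can be sketched as follows. The language-name-to-code table and the helper names are illustrative assumptions, not the project's code; the gTTS call itself matches the library's documented interface (text plus a language code, saved as MP3).

```python
# Minimal sketch of the text-to-speech step with gTTS.
# The language table below is illustrative, not exhaustive.

LANG_CODES = {
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Chinese": "zh-CN",
}

def to_gtts_code(language: str) -> str:
    """Map a display name to the ISO code gTTS expects (hypothetical helper)."""
    return LANG_CODES[language]

def speak(text: str, language: str, out_path: str = "speech.mp3") -> str:
    """Synthesize `text` in the given language and save it as an MP3.

    Lazy import: gTTS calls Google's TTS service, so this needs network access.
    """
    from gtts import gTTS

    gTTS(text=text, lang=to_gtts_code(language)).save(out_path)
    return out_path
```

For example, `speak("un plato de comida con huevos", "Spanish")` writes a `speech.mp3` that a frontend can play back directly.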
One of the challenges we encountered was choosing the front-end stack for our machine learning application. We initially started building a Next.js React application with Python APIs and, to simplify the process, decided to use Next.js embedded APIs instead of deploying Lambda functions and an API Gateway. Unfortunately, this led to significant dependency issues: we found ourselves needing to containerize the Python library dependencies. Since we wanted to implement our idea quickly for a proof of concept, we changed our approach and built the user interface with Streamlit.
With Language Whisperer, all you have to do is:
- take a picture
- upload the image you want to transcribe
- select your preferred choice of language
- play and listen to the transcribed language
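The steps above map onto a small Streamlit flow, sketched below under assumptions: the supported-language list is illustrative, the model calls are stubbed out as placeholders, and the widget layout is a guess at the app's structure rather than its actual source.

```python
# Sketch of the four-step Streamlit flow. The captioning, translation,
# and text-to-speech calls are placeholders, not real implementations.

SUPPORTED_LANGUAGES = ["Spanish", "French", "German"]  # illustrative list

def is_supported(language: str) -> bool:
    """Guard helper so the UI only offers languages we can speak."""
    return language in SUPPORTED_LANGUAGES

def main() -> None:
    """Entry point; launch with `streamlit run app.py`."""
    import streamlit as st  # lazy import: only needed in the Streamlit runtime

    st.title("Language Whisperer")
    image = st.file_uploader("Upload the image", type=["png", "jpg", "jpeg"])
    language = st.selectbox("Preferred language", SUPPORTED_LANGUAGES)
    if image is not None and st.button("Translate"):
        caption = "..."     # placeholder: agent captions the uploaded image
        translated = "..."  # placeholder: agent translates the caption
        st.write(translated)
        st.audio("speech.mp3")  # placeholder: gTTS output for the translation
```

Streamlit reruns the whole script on every interaction, which is why session state (listed under lessons learnt below) matters for keeping results across reruns.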
For example, let's have a look at a sample image:
Step 1: Image Analysis - Language Whisperer receives this image and generates the following
Step 2: Image Caption: 'a plate of food with eggs, hams, and toast'
Step 3: Translate the caption into a language of your choice, e.g. Spanish
Step 4: Learn/Read the caption: un plato de comida con huevos, jamones y tostadas
Step 5: Search for a word meaning in Wiki: (comida) –> food, something edible <…>
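The parsing side of Step 5's Wiki Searcher can be sketched with BeautifulSoup. This is a toy sketch: the real tool fetched live dictionary pages, whereas here a canned HTML snippet stands in for a fetched page so the extraction logic is visible in isolation, and `first_definition` is a hypothetical helper name.

```python
# Sketch of the Wiki Searcher's BeautifulSoup parsing step.
# SAMPLE_HTML stands in for a fetched dictionary-style page.

from bs4 import BeautifulSoup

SAMPLE_HTML = """
<html><body>
  <h1>comida</h1>
  <ol><li>food, something edible</li><li>meal</li></ol>
</body></html>
"""

def first_definition(html: str) -> str:
    """Return the first list-item definition found in the page, or ''."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("li")  # first definition in the ordered list
    return item.get_text(strip=True) if item else ""

print(first_definition(SAMPLE_HTML))  # -> food, something edible
```

In the live tool, the same `find`/`get_text` navigation runs over the downloaded page instead of a canned string.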
- Github Repository: https://github.com/RonakReyhani/LanguageWhisperer
- Demo Video: https://youtu.be/zaYRAKcPHOk
Lessons Learnt (New Skills Developed)
- Use of Session States
- Use of Amazon Polly
- Use of Streamlit
@anja: As I didn’t have any experience with the Transformers library before the Hackathon, I was hoping I’d be able to contribute enough anyway. Luckily, the documentation is very beginner-friendly, even for people who aren’t experienced with Machine Learning yet. I was reminded that you should never be afraid to try new tech tools; often it’s not as difficult as you think. I will definitely dive deeper into Machine Learning in the future. It was also the first Hackathon I participated in, and it was awesome to work together with my brilliant teammates.
@ronakreyhani: I have always harboured a deep passion for machine learning (ML), which makes every new concept or topic in the field incredibly enticing. Recently, my curiosity led me to explore the realm of Generative AI, specifically the renowned Hugging Face LLM models. Although I had heard about them in passing, I had never had the opportunity to delve into their intricacies. This project presented a remarkable chance to step out of my comfort zone and venture into the unknown.
Throughout this journey, I gained extensive knowledge about various aspects of the Hugging Face ecosystem, including Hugging Face Hub, models, transformers, pipelines, and the widely acclaimed "agent" that our app heavily relies on. Beyond the technical advancements, what truly made this experience exceptional was the opportunity to collaborate with an extraordinary team spanning across the globe. Through virtual meetings and vibrant discussions, we pooled our ideas and arrived at a common understanding. Working with them was truly inspiring, and their unwavering support allowed me the freedom to implement my ideas using the tools I was most comfortable with.
As the hackathon reached its conclusion, I not only acquired a wealth of knowledge about LLM models and Hugging Face agents, but I also forged incredible friendships. The prospect of meeting these newfound friends in person fills me with anticipation and excitement. In retrospect, this sense of camaraderie and connection stands as the greatest achievement of this endeavour.
@dashapetr: As a Data Scientist, I had a bit of experience with Transformers, but I hadn’t used tools and agents. I found the hackathon idea very interesting because I saw the tools’ huge potential. Choosing the concept was quite challenging; I had several of them, but when the LanguageWhisperer came to my mind, I was so excited that I decided to put aside all the other ideas. The LanguageWhisperer is something I wish had existed when I was struggling to learn French and Chinese. I am grateful that my team decided to go with this idea, and I am extremely happy to get to know my fellow female builders better; it’s an enormous pleasure to build together!
@chinwee__o: One of the many things that stood out for me was exploring the different alternatives available to get the work done, one of which was trying out Amazon Polly for this project. While the Transformers agent had a text-to-speech tool that could have been used, the outcome was not the best, which underscored that alternatives exist to meet a specific need when others fall short. 1 month, 4 weeks, 29 days, 696 hours, 41760 minutes, 2505600 seconds: every single meeting, conversation, chat, line of code, and new learning with these ladies was worth the while.
We believe that the LanguageWhisperer can be extended and improved.
- Firstly, its functionality can be expanded to compare translations across multiple languages, generate illustrative usage examples for a given word, and provide a feature to "listen" to and correct the user's pronunciation.
- Secondly, LanguageWhisperer has the potential to be transformed into a mobile application, enabling users to access it conveniently from any location.