UmaVoice - The beginning

matluz ・ 3 min read

This is probably the first post I've ever made here that doesn't include the #help tag.

Today I'm starting to build the UmaVoice project. This is something I've been thinking about for a while, and I'm going to use the #gftwhackathon opportunity to develop it. The project will not be fully focused on the Web Monetization API, but I think it can be interesting enough to give its users a chance to start participating in this new kind of web payment.

The Project

UmaVoice is an English Pronunciation Learning Platform, where the goal is to help the user with the pronunciation of specific words.

Pronunciation is a big topic, and the results may not be 100% precise, since there are many English accents that vary from region to region. Still, I believe this type of technology can be a big help, mainly for countries where the native language's phonemes are very different from the English ones, or don't exist at all.

The goal

For now, I am aiming to build something like this:

Project main page (prototype)

How it Works

When using the platform, the user will choose to get a random sentence or add a specific one. The microphone button starts and stops recording, and the button with a hearing icon helps the user understand how the sentence sounds, using Text to Speech.
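As a rough sketch of the Text to Speech button, here is how the browser's Web Speech API (SpeechSynthesis) could read a sentence aloud. Names like `speakSentence` and `pickEnglishVoice` are illustrative, not from the actual UmaVoice code:

```javascript
// Pure helper: pick the first available English voice, if any.
function pickEnglishVoice(voices) {
  return voices.find((v) => v.lang && v.lang.startsWith('en')) || null;
}

// Browser-only: speak the sentence using Text to Speech.
function speakSentence(sentence) {
  if (!('speechSynthesis' in window)) return; // feature detection
  const utterance = new SpeechSynthesisUtterance(sentence);
  const voice = pickEnglishVoice(window.speechSynthesis.getVoices());
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);
}
```

Keeping the voice selection as a pure function makes it easy to test without a browser.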

After three consecutive correct results, a modal will appear congratulating the user, and a new sentence can take its place. At most three results are shown on screen at a time; when a new result arrives without three consecutive correct answers, the oldest one is replaced in queue order.
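The queue behavior above could be sketched with a couple of small pure functions. This is just an illustration of the rules, with made-up names, not the real implementation:

```javascript
const MAX_RESULTS = 3; // at most three results on screen

// Append a new result (true = correct); drop the oldest once full.
function addResult(results, isCorrect) {
  const next = [...results, isCorrect];
  if (next.length > MAX_RESULTS) next.shift();
  return next;
}

// Three consecutive correct results trigger the congratulation modal.
function shouldCongratulate(results) {
  return results.length === MAX_RESULTS && results.every(Boolean);
}
```

Since both functions take the queue as input and return plain values, they are trivial to unit-test.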

Input Validation

When a user finishes recording some audio, it is validated through a Speech to Text technology, which also returns a 'confidence' level for the transcription. If the Speech to Text output matches the desired sentence with a good confidence level, the attempt is considered correct.
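In code, that check might look something like the sketch below. The normalization rules and the 0.8 threshold are assumptions for illustration, not the project's actual values:

```javascript
// Strip punctuation and case so 'Hello world!' matches 'hello world'.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z' ]/g, '').trim();
}

// Correct only if the transcript matches the target sentence AND the
// Speech to Text engine was confident enough about what it heard.
function isPronunciationCorrect(transcript, confidence, target, threshold = 0.8) {
  return confidence >= threshold && normalize(transcript) === normalize(target);
}
```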

Different types of user accounts

There are two kinds of Speech to Text technology that will be used in this project:

Free Account

Users with a free account can access the platform content, but it is restricted to browsers that support the SpeechRecognition technology.
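That browser restriction comes down to a feature-detection check. SpeechRecognition is still vendor-prefixed in some browsers, so both names are probed; passing the global object in as a parameter is just a way to keep this sketch testable outside a browser:

```javascript
// Returns the SpeechRecognition constructor if the browser supports it,
// or null for the unsupported case (where the free tier is unavailable).
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}
```

In the app this would be called as `getSpeechRecognition(window)`.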

Premium Account

Users with a premium account will be able to access the content from a broader range of browsers, since the Speech to Text will be done in the backend with the DeepSpeech technology.

The Web Monetization API will be the only way to verify a premium account, so I'm going to add a section educating the user about this kind of payment, with instructions showing the steps needed to pay and sign in with a premium account.
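A hedged sketch of that check, based on the `document.monetization` object from the Web Monetization proposal, which exposes a `state` of 'pending', 'started', or 'stopped'. The helper takes the monetization object as a parameter so the logic can be tested outside a browser; the name is illustrative:

```javascript
// True only when a payment stream is actively running ('started');
// 'pending' means the stream hasn't begun, and undefined means the
// browser has no Web Monetization support at all.
function isMonetizationActive(monetization) {
  return !!monetization && monetization.state === 'started';
}
```

In the app this would be called as `isMonetizationActive(document.monetization)`.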

Final thoughts

For now, I'm going to start with a 'Mobile First' approach and think later about how it will look in a desktop environment. There are still some details I could add here, but I think this is a good start.

Here is the GitHub link, for anyone interested in following the development process:
https://github.com/matluz/umavoice

Discussion

☞ Desigan Chinniah ☜

Do have a look at the Common Voice project started by the folks at Mozilla. It will likely give you some inspiration on how they tackle community involvement on both the 'Speak' and 'Listen' engagement modes.

matluz (Author)

Yes! Common Voice is an interesting project and I believe some DeepSpeech models were trained using the data gathered in this project.

For now, I will stay with the Speech to Text approach only, since it can give instant feedback to the user. But I do find Common Voice interesting, and later on, I think an approach where users help each other for learning purposes would be a nice feature to add to the UmaVoice project.