I assembled a WhatsApp bot that takes a voice note, uses Google's Cloud Speech-to-Text to extract the text, and sends the transcript back to the user. I figured people are likely sending many more voice notes now that they're having longer conversations over text (I know I am). Personally, I'd rather not spend 5-10 minutes listening, and sometimes it just isn't convenient to play audio. This way I can get the voice note converted to text for quick and easy reading.
Feel free to try it: connect to my sandbox by sending a WhatsApp message from your device to +1 415 523 8886 with the code "join terrible-shop".
I built it with Flask running on AWS Lambda. When a message comes in, the audio file is fetched through its Twilio media URL, which serves it from an S3 bucket, and is sent to the Google Speech Recognition API. The transcript is returned to the Flask server, which then sends it back to the user.
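A minimal sketch of the webhook logic, assuming Twilio's standard incoming-message parameters (`NumMedia`, `MediaUrl0`, `MediaContentType0`) and a TwiML reply. The function names and the injected `transcribe` callable are hypothetical, and the post doesn't say whether the bot replies with TwiML or through Twilio's REST API. The Flask route is left out so the sketch runs with only the standard library.

```python
from typing import Callable, Mapping, Optional
from xml.sax.saxutils import escape


def first_audio_url(form: Mapping[str, str]) -> Optional[str]:
    """Return the URL of the first audio attachment in a Twilio webhook, if any."""
    # Twilio sends NumMedia plus MediaUrl{i} / MediaContentType{i} per attachment.
    num_media = int(form.get("NumMedia", "0"))
    for i in range(num_media):
        if form.get(f"MediaContentType{i}", "").startswith("audio/"):
            return form.get(f"MediaUrl{i}")
    return None


def twiml_reply(text: str) -> str:
    """Wrap text in a TwiML <Message> so Twilio sends it back to the sender."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Message>{escape(text)}</Message></Response>"
    )


def handle_incoming(form: Mapping[str, str], transcribe: Callable[[str], str]) -> str:
    """Hypothetical handler: transcribe the first voice note, or explain usage."""
    audio_url = first_audio_url(form)
    if audio_url is None:
        return twiml_reply("Send me a voice note and I'll reply with its text.")
    return twiml_reply(transcribe(audio_url))
```

Inside a Flask POST route this could be returned as `handle_incoming(request.form, transcribe), 200, {"Content-Type": "application/xml"}`. Injecting `transcribe` keeps the Twilio-specific parsing separate from the speech-recognition call.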
It was interesting learning how the Twilio API handles media, and the redirect to an S3 bucket was unexpected. I initially planned to pass the Twilio media URL straight to the Google API, but because of the redirect I had to download the file first and then send the audio content itself to the Google API.
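The download-then-upload flow can be sketched with only the standard library. This is a hedged sketch rather than the author's code: the post follows Google's Python client-library quickstart, while this uses the equivalent v1 REST `speech:recognize` request. It assumes the Twilio media URL is reachable without HTTP Basic auth, that voice notes are OGG/Opus at 16 kHz, and that you already have an OAuth access token (for example from `gcloud auth print-access-token`). All function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def download_media(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the media bytes; urllib's default opener follows the HTTP redirect."""
    # Assumes the media URL does not require HTTP Basic auth.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def build_recognize_body(audio: bytes, language_code: str = "en-US",
                         sample_rate_hertz: int = 16000) -> Dict[str, Any]:
    """Build the JSON body for a synchronous speech:recognize request."""
    return {
        "config": {
            # Assumption: WhatsApp voice notes are OGG/Opus recorded at 16 kHz.
            "encoding": "OGG_OPUS",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Inline audio has to be base64-encoded inside the JSON body.
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }


def extract_transcript(response: Dict[str, Any]) -> str:
    """Join the top alternative of each result into a single string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)


def recognize(audio: bytes, access_token: str) -> str:
    """POST the audio to Speech-to-Text and return the transcript (needs network)."""
    req = urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(build_recognize_body(audio)).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_transcript(json.load(resp))
```

Wired together, this is `recognize(download_media(audio_url), token)`. Synchronous recognition only accepts about a minute of audio, so longer voice notes would need the asynchronous `speech:longrunningrecognize` method instead.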
Zappa was used to deploy the code to AWS Lambda and expose a public endpoint via API Gateway.
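For reference, a Zappa deployment is driven by a `zappa_settings.json` file. The sketch below prints a minimal one; the stage name, project name, bucket, region and the `app.app` module path (an `app.py` exposing a Flask object named `app`) are placeholders, not the author's actual settings.

```python
import json


def zappa_settings(project: str, bucket: str, region: str = "us-east-1") -> dict:
    """Return a minimal Zappa settings dict with a single hypothetical 'dev' stage."""
    return {
        "dev": {
            "app_function": "app.app",  # module app.py, Flask object named app (assumed)
            "aws_region": region,
            "project_name": project,
            "s3_bucket": bucket,  # bucket Zappa uploads the deployment package to
        }
    }


if __name__ == "__main__":
    print(json.dumps(zappa_settings("whatsapp-transcriber", "my-zappa-deployments"), indent=4))
```

Save the output as `zappa_settings.json`; `zappa deploy dev` then creates the Lambda function and its API Gateway endpoint, and `zappa update dev` pushes later code changes.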
I followed https://dev.to/apcelent/deploying-flask-on-aws-lambda-4k42 for the Lambda deployment
and https://cloud.google.com/speech-to-text/docs/quickstart-client-libraries?authuser=1#client-libraries-install-python for the recognition code.
I had a lot of fun playing around with this project. Initially I was going to build fake news detection, but that turned out to be quite a difficult problem to solve 😂 I'm happy I could still participate and learn with a different project 😊 Thanks for checking out my project and stay safe ❤️