Automatic Speech Recognition with AWS Lambda and Leopard

#100daysofcode #challenge #aws #serverless

Day 23 shows how to run automatic speech recognition (ASR) with Picovoice Leopard Speech-to-Text on AWS Lambda.

Pre-requisites

Register for an AWS account. We will use it to access the AWS Lambda and API Gateway services via the console.
Sign up for Picovoice Console to get your AccessKey for free. We will need an AccessKey to use the Leopard Speech-to-Text engine.
Download Postman. We will use this tool for testing.

Setting up Lambda

In the AWS Console, search for Lambda and go to the Functions tab.
Press Create a Function. Then set the function name and set the runtime to Python 3.9.
Download the zip file from the GitHub repository, which contains the Lambda handler and the packaged [pvleopard](https://pypi.org/project/pvleopard/) module.

The Lambda handler does the following:

Gets and parses the data from the request.
Saves the audio data into a temporary file.
Transcribes the audio, then deletes the temporary file.
Returns the transcription.
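
The four steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the handler shipped in the zip file. The `transcribe` argument is a hypothetical stand-in for the Leopard call (for example, a wrapper around `process_file` on a `pvleopard` instance, whose return type depends on the installed version), and the `transcript` key in the response is likewise an assumption.

```python
import base64
import json
import os
import tempfile
from email.parser import BytesParser
from email.policy import default


def extract_file(event, field_name="audio_file"):
    """Pull one uploaded file out of an API Gateway proxy event (multipart/form-data)."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    body = event.get("body") or ""
    # With a binary media type configured, API Gateway base64-encodes the body.
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")
    # Prepend the Content-Type header so the stdlib parser knows the boundary.
    prefix = b"Content-Type: " + headers["content-type"].encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(prefix + raw)
    for part in message.iter_parts():
        if part.get_param("name", header="content-disposition") == field_name:
            return part.get_payload(decode=True)
    raise ValueError(f"no form field named {field_name!r}")


def handle(event, transcribe):
    """Run the four steps; `transcribe` is a hypothetical callable: file path -> text."""
    audio = extract_file(event)              # 1. get and parse the request data
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(audio)                     # 2. save the audio to a temporary file
        path = tmp.name
    try:
        transcript = transcribe(path)        # 3. transcribe ...
    finally:
        os.remove(path)                      #    ... then delete the temporary file
    return {                                 # 4. return the transcription
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"transcript": transcript}),  # key name is an assumption
    }
```

Passing the transcriber in as an argument keeps the parsing and clean-up logic testable without the Leopard library installed.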

Press on Upload from > .zip file and upload the zip file.
Once the function is uploaded, go to the Configuration > General configuration tab and press Edit.
Set the memory limit to 512 MB and timeout to 30 seconds and press Save.
Go to the Environment variables tab and press Edit.
Add the AccessKey that you obtained from Picovoice Console and press Save.
Press Copy ARN (the button in the top right corner) and save your function ARN. We will need it to set up API Gateway later. You can also come back later to copy your function ARN.
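
Inside the function, a value stored under Environment variables is read from the process environment. A minimal sketch of that lookup is shown here. The variable name `ACCESS_KEY` is hypothetical; use whatever name the handler in the zip file actually reads.

```python
import os


def load_access_key(env=None, name="ACCESS_KEY"):
    """Return the AccessKey from the environment, failing early if it is missing.

    `name` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"environment variable {name} is not set")
    return key
```

Failing early with a clear message makes a missing or empty key easy to spot in the CloudWatch logs.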

Setting up API Gateway

In the AWS Console, go to API Gateway, then to the REST API section. Press Build to create a new REST API.
Select REST as the protocol, select New API, and give your API a name. Press Create API to create your API.
Press Actions > Create Method and select POST as your method. Select Lambda Function as the integration type, tick Proxy Integration, and paste the Lambda function ARN you saved earlier.
Once the method is created, go to the Settings tab.
Scroll all the way down. Under Binary Media Types, press Add Binary Media Type, add multipart/form-data, and press Save Changes.
Go back to the Resources tab. Press Actions > Deploy API. Set [New Stage] as the deployment stage, give it a stage name, and press Deploy.

You will be redirected to Stages, where your API's invoke URL is shown. Now we can test our REST API.

Testing the API
Use Postman to send an audio file and get the transcription.

Copy and paste the invoke URL from API Gateway in the request URL section and set the request type to POST.
In the Body tab, set the data type to form-data. Set the key name to audio_file, set the type as File, and press on Select Files and select the audio file you want to transcribe (Leopard supports the following audio formats: FLAC, MP3, Ogg, Opus, Vorbis, WAV, and WebM).
Once the request goes through, the result will show below.
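
The same request can also be sent from a script instead of Postman. The sketch below builds the multipart/form-data body by hand with the standard library and posts it under the `audio_file` key. The helper names `build_multipart` and `post_audio` are made up for this example, and the invoke URL must be replaced with the one shown for your stage.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data):
    """Encode a single file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_audio(invoke_url, audio_path, timeout=30):
    """POST the audio file to the API and return the raw response text."""
    with open(audio_path, "rb") as f:
        body, content_type = build_multipart(
            "audio_file", os.path.basename(audio_path), f.read()
        )
    request = urllib.request.Request(
        invoke_url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `post_audio("<your invoke URL>", "sample.wav")` should return the same response body that Postman displays, assuming the API is deployed as described above.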

In AWS Console, go to CloudWatch > Log groups if you would like more details on a specific API request and Lambda invocation.

Limitations & Future Explorations
API Gateway has a 29-second timeout and 10MB payload size limit. Lambda has a 15-minute timeout and a 6MB payload size limit.

Later, you can try to overcome these limitations:

To transcribe audio files larger than 4.5MB, use S3 to get a pre-signed upload URL, upload your audio file, fetch it in Lambda, and process it.
To transcribe audio files longer than 2 minutes (which take longer to process), invoke Lambda asynchronously and poll for the response until the transcription finishes.
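
The 4.5MB figure is consistent with base64 overhead: a binary upload reaches Lambda base64-encoded, which turns every 3 bytes into 4 characters, so a 6MB payload limit leaves room for about 4.5MB of raw audio. The sketch below shows the arithmetic. It treats 6MB as 6 × 1024 × 1024 bytes, which is an assumption, and it ignores the small amount of space taken by the rest of the request.

```python
MIB = 1024 * 1024


def base64_size(raw_bytes):
    """Length of the base64 encoding of `raw_bytes` bytes (with padding)."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(payload_limit):
    """Largest raw size whose base64 encoding still fits in `payload_limit` bytes."""
    return (payload_limit // 4) * 3


limit = 6 * MIB  # Lambda payload limit, assumed here to mean 6 MiB
print(max_raw_bytes(limit) / MIB)  # 4.5
```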

Keep in mind that this is starter code: you can always modify it, re-zip it, and upload the zip file again to update your function.

Take a look at the Leopard GitHub repository or the Leopard docs page to learn more about Leopard.

This post was originally published on Picovoice's Medium page.
