Matt Coulter for CDK Patterns

Posted on Jul 11, 2020 • Edited on Jul 20, 2020

Open your content to the world by combining Amazon Translate and Polly - Natural language translations and voice synthesis

#serverless #aws #cdk #tutorial

This is a pattern that integrates the Amazon Polly service into an AWS Lambda Function so that you can synthesize text into speech using a serverless stack. It also integrates with Amazon Translate to allow you to choose the language for the spoken text.

Some Useful References:

Author	Link
Amazon Polly	Amazon Polly Site
Polly Pricing	Polly Pricing
Polly Permissions	Polly IAM Permissions
Amazon Translate	What Is Amazon Translate?
Translate Pricing	Translate Pricing
AWS Blogs	Giving your content a voice with the Newscaster speaking style from Amazon Polly
AWS Docs	Using Amazon Polly with Amazon Translate
Timothy Mugayi	Text-to-Speech: Build Apps That Talk With AWS Polly and Node.js
Philip Kiely	Text-To-Speech With AWS (Part 1)


What is Included In This Pattern?

After deployment you will have an API Gateway HTTP API configured so that all traffic points to a Lambda Function, which calls the Translate and Polly services.

API Gateway HTTP API

This is set up with basic settings, with all traffic routed to our Lambda Function.

Lambda Function

Takes the voice and the text you choose, translates the text into the language you choose, then sends it to the Polly service and returns an audio stream.

Testing The Pattern

After deployment, the deploy logs will show the URL for the API Gateway.

If you open that URL in Chrome, it will play an audio recording saying "To hear your own script, you need to include text in the message body of your restful request to the API Gateway".

For examples of the voice clips produced, check out the mp3 files in the recordings folder.

You can customise this message based on how you call the URL:

Changing the voice

You can pick from three voices: "Matthew" (the default), "Joanna" or "Lupe". This pattern uses the Newscaster speaking style, which AWS recently launched, so it currently supports only these three voices.

To change voices, just add a query param onto your URL like:

https://{api-url}/?voice=Lupe
https://{api-url}/?voice=Joanna
https://{api-url}/?voice=Matthew
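
Because only the three voices listed above work with this speaking style, it can help to validate the voice query param before calling Polly. The following is a minimal sketch of one way to do that, not the pattern's actual code: the names `NEWSCASTER_VOICES` and `resolveVoice` are hypothetical, and falling back to the default voice (rather than returning an error) is an assumption.

```typescript
// Hypothetical helper: the names and the fallback behaviour are assumptions,
// not taken from the pattern's source.
const NEWSCASTER_VOICES = ["Matthew", "Joanna", "Lupe"] as const;
type NewscasterVoice = (typeof NEWSCASTER_VOICES)[number];

// "Matthew" is the default voice according to the article.
const DEFAULT_VOICE: NewscasterVoice = "Matthew";

// Returns the requested voice if it is one of the three supported voices,
// otherwise falls back to the default. The comparison is case-sensitive.
function resolveVoice(requested?: string | null): NewscasterVoice {
  for (const candidate of NEWSCASTER_VOICES) {
    if (candidate === requested) {
      return candidate;
    }
  }
  return DEFAULT_VOICE;
}
```

With this sketch, resolveVoice("Lupe") yields "Lupe", while a missing or unsupported value yields "Matthew".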

Changing the language spoken

This Lambda Function is integrated with Amazon Translate, so you can have Polly speak a variety of languages.

To have Lupe speak Spanish, just add the translateTo query param:

https://{api-url}/?voice=Lupe&translateTo=es

If the text you are translating is in a language other than English, you can use the translateFrom parameter to specify its source language.

To understand what languages are possible, please refer to the documentation.
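
The voice, translateFrom and translateTo query params can be combined in one request. As a rough sketch, a client could assemble the URL like this; `buildPollyUrl` and `PollyQuery` are hypothetical names, and `https://example.com` in the usage note stands in for your own API Gateway URL.

```typescript
// Hypothetical client-side helper; only the three query param names
// (voice, translateFrom, translateTo) come from the article.
interface PollyQuery {
  voice?: string;
  translateFrom?: string;
  translateTo?: string;
}

function buildPollyUrl(apiUrl: string, query: PollyQuery): string {
  const pairs: string[] = [];
  // Only include params that were actually provided.
  const add = (key: string, value?: string): void => {
    if (value) {
      pairs.push(`${key}=${encodeURIComponent(value)}`);
    }
  };
  add("voice", query.voice);
  add("translateFrom", query.translateFrom);
  add("translateTo", query.translateTo);

  // Strip any trailing slashes so the result has exactly one before the query.
  const base = apiUrl.replace(/\/+$/, "");
  return pairs.length > 0 ? `${base}/?${pairs.join("&")}` : `${base}/`;
}
```

For example, buildPollyUrl("https://example.com", { voice: "Lupe", translateFrom: "fr", translateTo: "es" }) would ask Lupe to read French source text translated into Spanish.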

Changing the text

If you use a tool like Postman to send text in the body of a POST request to the URL, Polly will synthesize your text instead of the default message.
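
If you would rather script the request than use Postman, the sketch below prepares an equivalent POST. It is only an illustration: `buildSpeechRequest` is a hypothetical helper, the `text/plain` content type is an assumption about what the Lambda accepts, and it also assumes the voice query param can be combined with a POST body.

```typescript
// Hypothetical helper that describes the POST request without sending it.
interface SpeechRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildSpeechRequest(apiUrl: string, text: string, voice: string = "Matthew"): SpeechRequest {
  // Normalise trailing slashes, then append the voice query param.
  const base = apiUrl.replace(/\/+$/, "");
  return {
    url: `${base}/?voice=${encodeURIComponent(voice)}`,
    init: {
      method: "POST",
      // Assumption: the function reads the raw request body as plain text.
      headers: { "Content-Type": "text/plain" },
      body: text,
    },
  };
}
```

In a runtime with a global fetch (a browser or a recent Node.js), passing the result to fetch(req.url, req.init) should return the synthesized audio.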

Calling Translate / Polly

Integrating our Lambda Function with these services was relatively straightforward.

First, I needed to make sure the function had the IAM permissions to call both services:

// https://docs.aws.amazon.com/polly/latest/dg/api-permissions-reference.html
// https://docs.aws.amazon.com/translate/latest/dg/translate-api-permissions-ref.html
const pollyStatement = new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  resources: ['*'],
  actions: [
    "translate:TranslateText",
    "polly:SynthesizeSpeech"
  ],
});
pollyLambda.addToRolePolicy(pollyStatement);

Then we can use Translate and Polly from the AWS SDK.

For Translate:

// If the target language differs from the source language, use Translate to do the translation
if (translateTo !== translateFrom) {
  const translate = new Translate();

  const translateParams = {
    Text: text,
    SourceLanguageCode: translateFrom,
    TargetLanguageCode: translateTo
  };

  const rawTranslation = await translate.translateText(translateParams).promise();
  text = rawTranslation.TranslatedText;
}

For Polly:

// Use Polly to synthesize the text into speech
const polly = new Polly();

const params = {
  OutputFormat: 'mp3',
  Engine: 'neural',
  TextType: 'ssml',
  // Wrap the text in the "news" domain tag to get the Newscaster speaking style
  Text: `<speak><amazon:domain name="news">${text}</amazon:domain></speak>`,
  VoiceId: voice,
};

const synthesis = await polly.synthesizeSpeech(params).promise();
const audioStreamBuffer = Buffer.from(synthesis.AudioStream);

// Return the MP3 audio as a base64-encoded response body
return sendVoiceRes(200, audioStreamBuffer.toString('base64'));
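
Two details are not shown in the snippet above. First, the text is interpolated straight into SSML, so reserved XML characters such as & or < in the input could produce invalid SSML. Second, the body of sendVoiceRes is not shown. The sketch below illustrates one way to handle both. `escapeSsml` and `toNewsSsml` are hypothetical helpers that are not part of the original pattern, and this sendVoiceRes body is a guess at a typical API Gateway proxy response for binary audio, not the author's implementation.

```typescript
// Hypothetical helper: escapes the five XML reserved characters so that
// user-supplied text cannot break the SSML document. Ampersands go first
// so the entities added afterwards are not escaped twice.
function escapeSsml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wraps escaped text in the same "news" domain markup used in the article.
function toNewsSsml(text: string): string {
  return `<speak><amazon:domain name="news">${escapeSsml(text)}</amazon:domain></speak>`;
}

// Assumed shape of the response helper: a Lambda proxy response that marks
// the body as base64 so API Gateway can return it as binary audio.
function sendVoiceRes(status: number, base64Audio: string) {
  return {
    statusCode: status,
    headers: { "Content-Type": "audio/mpeg" },
    isBase64Encoded: true,
    body: base64Audio,
  };
}
```

Under these assumptions, the Polly params would use Text: toNewsSsml(text) in place of the inline template string.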
