Serhiy Kozlov

Posted on • Originally published at romexsoft.com

Amazon Polly – Text-to-Speech

Amazon Polly is one of the newer additions to the AWS cloud services portfolio. Originally launched in 2016, this Text-to-Speech (TTS) tool recently received a host of new features.

As the name suggests, Amazon Polly converts written text to speech, allowing users to build voice-enabled products, apps, and services. Polly uses deep learning to synthesize speech that closely resembles the voice of a real human.

What’s Cool about Amazon Polly? 

Neural Text-to-Speech (NTTS) functionality

Amazon Polly supports more than two dozen languages and can re-create a wide array of natural-sounding voice timbres. Its speech is clearly articulated, which helps you deliver high-quality voice output to your audiences.

Flexible set-up and customization 

You can switch between different voices depending on your needs. The service supports various SSML tags and lexicons, so you can control speech aspects such as volume, articulation, and speed.
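As a rough illustration of the SSML control mentioned above, here is a minimal Python sketch that wraps plain text in a `<prosody>` tag to set volume and speaking rate. The function name `build_prosody_ssml` is hypothetical and only builds the markup string; it does not call Polly.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text: str, volume: str = "medium", rate: str = "medium") -> str:
    """Wrap plain text in an SSML <prosody> tag that sets volume and rate.

    Hypothetical helper for illustration only. Typical SSML values are
    "soft"/"medium"/"loud" for volume and "slow"/"medium"/"fast" for rate.
    """
    # Escape &, < and > so the text cannot break the surrounding markup.
    safe_text = escape(text)
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    return (
        f"<speak><prosody volume={quoteattr(volume)} rate={quoteattr(rate)}>"
        f"{safe_text}</prosody></speak>"
    )
```

The resulting string would be sent to Polly as SSML input rather than as plain text.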

Robust API 

Amazon’s APIs are well documented, so integrations are quick to set up.

Comprehensive programming language support 

Amazon Polly is available through the AWS SDKs for popular programming languages and through the AWS Mobile SDK (iOS / Android). Polly also exposes an HTTP API.
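To give a feel for the SDK route, here is a minimal Python sketch using boto3's `synthesize_speech` call. It assumes boto3 is installed and that AWS credentials and a region are already configured. The helper names, the default voice "Joanna" and the output path are illustrative choices, not something taken from this article.

```python
def build_synthesis_request(text, voice_id="Joanna", engine="neural", output_format="mp3"):
    """Assemble keyword arguments for Polly's synthesize_speech call.

    Hypothetical helper: the defaults are illustrative assumptions.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize_to_file(text, path, **options):
    """Send the text to Polly and write the returned audio stream to a file."""
    # Imported lazily so the request builder above works without the SDK.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_synthesis_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

A call such as `synthesize_to_file("Hello from Polly", "hello.mp3")` would then produce a playable MP3, provided the account has access to the chosen voice and engine.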

Affordable Cost 

The service uses a pay-per-use pricing model and offers a free tier. You can get a better sense of costs on the Amazon Polly pricing page.

Niche Use Cases  

  • Audio content creation 
  • e-Learning 
  • Telephony



Two New Amazon Polly Features Worth Your Attention

Polly’s latest add-ons are Newscaster and Neural Text-To-Speech (NTTS).

  • Neural Text-to-Speech (NTTS) enables Amazon Polly to learn the differences between speech styles and imitate them. As of August 2019, NTTS offers 11 voices: 3 with British English accents and 8 with American English accents. In total, Amazon Polly supports 29 languages and lets you use different voices in several of them.
  • Amazon Polly Newscaster closely mimics the speech patterns of a news presenter, so that media publishers can deliver news and original reporting in audio form faster.
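The Newscaster style is selected through SSML, using the `amazon:domain` tag with the name `news`. The sketch below only builds that markup; the function name is hypothetical, and the style itself requires the neural engine and a voice that supports it.

```python
from xml.sax.saxutils import escape


def build_newscaster_ssml(text: str) -> str:
    """Wrap text in the SSML tag that requests the Newscaster speaking style.

    Hypothetical helper for illustration; it returns markup only and does
    not check whether the chosen voice supports the style.
    """
    # Escape &, < and > so article text cannot break the markup.
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'
```

The returned string would be passed to Polly with the text type set to SSML and the engine set to neural.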

This service is a popular tool among ‘big name’ news companies, as well as some popular language learning applications.  

For example:

  • The Globe and Mail, a popular Canadian news publisher, was among Newscaster's early adopters.
  • Gannett Co., a US media powerhouse that owns USA Today along with several regional papers, also uses Polly.
  • Duolingo, a popular language learning app, uses Amazon Polly for rendering texts in different languages.

The Romexsoft team has also used Amazon Polly in one of our recent projects for Trinity Audio. We asked Alex, our JavaScript Team Lead, to explain how we incorporated Amazon Polly.

How Amazon Polly Powers Trinity Player

"One of the products we have been working on is an audio player that a user can integrate into a web page to convert all the text into audio (text-to-speech). The player uses Amazon Polly, and its neural net in particular, to 'read' the texts out loud in a pleasant voice. Or you can adjust the setting and make the tone more dramatic with a 'breaking news' reading style.

Some of the cool Trinity Player features include the ability to translate texts into different languages, display advertising (a major part of our project), plus some additional perks. For instance, to effectively incorporate advertising we use speech marks. These let us estimate when a new sentence begins so that we can insert an audio ad without breaking up a sentence.

Since we are also using Amazon for a multitude of tasks (Translate, Polly, EC2, S3, and a bunch of other services), I always have to pay careful attention to my code quality. Otherwise a sloppy bug can eat up the entire testing budget in the blink of an eye :).

Trinity Player is a cool product, but it’s trickier than you might think. I mean, yeah, it looks like an audio player with a 70px UI, so how complicated can it be?

But you constantly need to solve a lot of challenging tasks from a technical standpoint. The product is not an SPA and does not use React, Angular or any other fancy framework. This forces you to think outside the box and work with everything at hand: DOM, CSS selectors, postMessage, audio, Node.js, DB (Redis, MySQL, MemSQL, Presto), CI/CD, testing, Docker, etc.

We also spend a great deal of time testing the app (unit, e2e).

You feel a great deal of responsibility for the product, your code, and the importance of testing because you are developing a product people want to use!"

Alex (Romexsoft JavaScript Team Lead)
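The speech-mark approach described in the quote can be sketched roughly as follows. Polly returns speech marks as one JSON object per line, each with a `time` offset in milliseconds and a `type` such as `sentence`. The functions below are hypothetical and are not Trinity Player's actual code. They assume the marks were requested with the sentence type and simply pick the first sentence boundary at or after a desired ad position.

```python
import json
from typing import List, Optional


def sentence_start_times(speech_marks: str) -> List[int]:
    """Return the start offsets (ms) of all sentence marks, in ascending order."""
    times = []
    for line in speech_marks.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        mark = json.loads(line)
        if mark.get("type") == "sentence":
            times.append(mark["time"])
    return sorted(times)


def next_ad_slot(speech_marks: str, earliest_ms: int) -> Optional[int]:
    """Pick the first sentence boundary at or after earliest_ms.

    Returns None when no sentence starts that late, meaning the ad
    would have to go at the end of the audio instead.
    """
    for time_ms in sentence_start_times(speech_marks):
        if time_ms >= earliest_ms:
            return time_ms
    return None
```

Snapping the ad position forward to a sentence start, rather than cutting at an arbitrary timestamp, is what keeps the ad from interrupting a sentence.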

How Can Romexsoft Help You?

As an AWS partner, we provide both professional services (cloud migration, cloud solutions, consulting, and cloud-native application development) and managed AWS services to reduce costs, improve security, and boost operational efficiency.

Get in touch with us today to schedule a free consulting session!

Originally published at Romexsoft Blog: https://www.romexsoft.com/blog/amazon-polly-text-to-speech/
