Barri

Posted on Feb 5, 2024

Text to speech with JavaScript

#javascript #programming #beginners #tutorial

Text-to-speech (TTS) is an assistive technology that has gained much popularity over the years. It is also called 'read aloud' technology because it converts written text into speech by reading it aloud.

TTS has been incorporated into many websites, apps and digital devices. It is a notable alternative to text that extends the reach of content to a wider audience, and it is used today not just as an alternative to text but also for educational purposes.

Though written text still reigns supreme, the popularity of TTS rests largely on the advantages it offers over static text:

It helps people with reading difficulties.
It is convenient.
It helps people with alternative learning styles.
It is accessible.

How TTS works

Most TTS tools are built in: the browser, app or software includes its own tools to convert text to speech. For example, Google Docs has an accessibility setting where readers can choose to 'Turn on screen reader support'.

Other TTS tools are software downloaded to a device and enabled on the browser or page where they are needed. This works primarily for pages or apps without built-in TTS.

TTS is applied in different forms. Some tools highlight words as they are read. Some offer controls like start, stop, pause and cancel, giving the reader more control over playback. There are also options to change the reader's voice.

For the sake of this article, we will look at the text-to-speech API on websites using JavaScript.

Why JavaScript?

JavaScript is a modern programming language that runs in every major browser, which is why it is often called the language of the web.
JavaScript, combined with HTML5, has access to the DOM and a broad set of browser APIs that make it easier to add functionality to a website, including text-to-speech, which uses the Web Speech API.

Web Speech API

The Web Speech API allows us to incorporate voice data or speech into web apps. It provides two distinct functionalities: SpeechSynthesis (text-to-speech) and SpeechRecognition.

SpeechSynthesis is the controller interface of the Web Speech API's text-to-speech service. It lets a web app read text aloud through the device's speech synthesizer.

Speech recognition works in the opposite direction: it takes the user's voice as input, for example to give voice commands to the application.
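
As a rough sketch of how the two halves of the API show up in a page, the snippet below checks for each one. The helper name `detectSpeechSupport` is hypothetical, and the `webkitSpeechRecognition` check assumes a browser that exposes recognition under a vendor prefix.

```javascript
// Hypothetical helper: reports which parts of the Web Speech API a scope exposes.
// `scope` is normally `window`; it is a parameter so the check can run anywhere.
function detectSpeechSupport(scope) {
  return {
    // Text-to-speech needs both the controller and the utterance constructor.
    synthesis: 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope,
    // Speech recognition is exposed with a vendor prefix in some browsers.
    recognition: 'SpeechRecognition' in scope || 'webkitSpeechRecognition' in scope,
  };
}

// In a page this would be detectSpeechSupport(window).
const support = detectSpeechSupport(globalThis);
console.log(support);
```

Passing the scope in, rather than reading `window` directly, keeps the check usable outside a browser, where both flags simply come back false.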

Getting Started with SpeechSynthesis

The SpeechSynthesis interface is a controller with properties and methods that regulate how text is converted to speech. To convert text to speech, we only need to create an instance of the SpeechSynthesisUtterance class, configure it with its properties, and pass it to the controller.

let speech = new SpeechSynthesisUtterance();

A SpeechSynthesisUtterance has six properties:

Language (lang): Gets and sets the language of the utterance.
Pitch (pitch): Sets the pitch at which the utterance will be spoken. It ranges from 0 to 2, with 0 being the lowest and 2 the highest. We can adjust it using a slider.
Rate (rate): Sets the rate at which the utterance will be spoken. It ranges from 0.1 to 10, with 0.1 being the slowest and 10 the fastest. We can adjust it using a slider.
Volume (volume): Sets the volume at which the utterance will be spoken. It ranges from 0 to 1, with 0 being the lowest and 1 the highest. We can adjust it using a slider.
Text (text): Gets and sets the text that will be synthesized.
Voice (voice): Gets and sets the voice that will be used to speak the utterance.
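
The ranges in the list above can be enforced before a value reaches the utterance. The following is a minimal sketch, not part of the API: `clampSetting` and `configureUtterance` are hypothetical helpers, and the fallback of 1 for pitch, rate and volume as well as the `'en-US'` language are assumptions chosen for the example.

```javascript
// Ranges taken from the property list above; the fallback values are assumptions.
const LIMITS = {
  pitch: { min: 0, max: 2, fallback: 1 },
  rate: { min: 0.1, max: 10, fallback: 1 },
  volume: { min: 0, max: 1, fallback: 1 },
};

// Hypothetical helper: coerce a raw value (e.g. a slider string) into range.
function clampSetting(name, raw) {
  const { min, max, fallback } = LIMITS[name];
  const value = Number(raw);
  if (!Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy the six properties onto an utterance-like object.
function configureUtterance(utterance, options = {}) {
  utterance.text = String(options.text ?? '');
  utterance.lang = options.lang ?? 'en-US'; // assumed default for this sketch
  utterance.pitch = clampSetting('pitch', options.pitch ?? 1);
  utterance.rate = clampSetting('rate', options.rate ?? 1);
  utterance.volume = clampSetting('volume', options.volume ?? 1);
  if (options.voice) utterance.voice = options.voice;
  return utterance;
}

// In a browser: configureUtterance(new SpeechSynthesisUtterance(), { ... })
const demo = configureUtterance({}, { text: 'Hello', rate: '25', pitch: 'high' });
console.log(demo.rate, demo.pitch); // prints: 10 1
```

Here the out-of-range rate is pulled back to the maximum and the non-numeric pitch falls back to the assumed default.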

The SpeechSynthesis controller provides methods such as:
.cancel(): This works like a stop. It removes all the utterances from the utterance queue.
.getVoices(): This returns the list of voices available on the current device.
.pause(): This pauses the utterance being spoken.
.resume(): This resumes a paused utterance.
.speak(): This adds an utterance to the queue so that it is read aloud.
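
To illustrate how these methods fit together, here is a small sketch of a wrapper around the controller. `createSpeechControls` is a hypothetical helper, not part of the API; it relies only on the methods listed above plus the controller's `paused` and `speaking` properties.

```javascript
// Hypothetical helper wrapping a SpeechSynthesis-like object.
// `synth` is normally window.speechSynthesis; `makeUtterance` builds the utterance.
function createSpeechControls(synth, makeUtterance) {
  return {
    speak(text) {
      synth.cancel(); // clear anything queued or still being spoken
      synth.speak(makeUtterance(text));
    },
    // One button for both actions: resume if paused, otherwise pause if speaking.
    togglePause() {
      if (synth.paused) {
        synth.resume();
        return 'resumed';
      }
      if (synth.speaking) {
        synth.pause();
        return 'paused';
      }
      return 'idle';
    },
    stop() {
      synth.cancel();
    },
  };
}

// In a browser:
// const controls = createSpeechControls(window.speechSynthesis,
//   (text) => new SpeechSynthesisUtterance(text));
// controls.speak('Hello world!');
```

Checking `paused` before `speaking` matters because a paused controller still reports that it is speaking.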

To simply convert text to speech:

<script>
let speaknow = new SpeechSynthesisUtterance('Hello world!');
window.speechSynthesis.speak(speaknow);

</script>

Not all browsers support the API, so we can check for support before speaking:

<html>
 <body>

   <button onclick="play()">Play</button>

</body>
</html>


<script>
  function play() {
    // Only speak when the browser exposes the speech synthesis controller
    if ('speechSynthesis' in window) {
      let working = new SpeechSynthesisUtterance("This is working");
      window.speechSynthesis.speak(working);
    } else {
      document.write("Browser not supported");
    }
  }
</script>

Next, we will create a simple demo with HTML, CSS and JS to show how Web Speech API can be implemented in browsers and websites.

<html>
<head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>Web API TTS</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">

</head>
<body>
  <div>
  <h3> Select Voices </h3>
  <select id = "voices" >
    <option> option 1 </option>
  </select>
  </div>

  <div id = "vpr">
    <h5> Volume </h5>
  <input type="range" min="0" max="1" value="1" step="0.1" id="volume" />
        <span id="vol-label">1</span>

    <h5> Rate </h5>
  <input type="range" min="0.1" max="10" value="1" step="0.1" id="rate" />
        <span id="rate-lab">1</span>

    <h5> Pitch </h5>
  <input type="range" min="0" max="2" value="1" step="0.1" id="pitch" />
        <span id="pitch-lab">1</span>
  </div>

  <textarea rows = "9" cols = "60" name = "description" id = "lines" placeholder = "Enter text here..."></textarea><br>

    <button class = "buttons" style = "background: green;" id = "speak"> Speak </button>
    <button class = "buttons" style = "background: orange" id = "pause"> Pause </button>
    <button class = "buttons" style = "background: lightgreen" id = 'resume'>Resume </button>
    <button class = "buttons" style = "background: red" id = 'cancel'> Cancel </button>

</body>
</html>

CSS

html, body{
    height: 100%
  }

select{
  padding: 3px;
  margin: 10px 0;
}

#vpr {
  display:inline-block;
  padding: 30px 10px;
}

.buttons{
  display: inline-block;
  padding: 0.6em 1.5em;
  margin: 0 0.3em 0.3em 0;
  border-radius: 5px;
  box-sizing: border-box;
  font-family: 'Roboto', sans-serif;
  font-weight: 400;
  font-size: 14px;
  color: black;
  text-align: center;

}

JavaScript

// First we initialize a new SpeechSynthesisUtterance object
let tts = new SpeechSynthesisUtterance();

// Setting the Speech Language
tts.lang = "en";

//Populating the select dropdown with the list of available voices on Web Speech API

let speechvoices = []; // global array of available voices

window.speechSynthesis.onvoiceschanged = () => {
  // To get the list of voices using getVoices() function
  speechvoices = window.speechSynthesis.getVoices();

  // We populate the select element and default to the first voice
  tts.voice = speechvoices[0];

  let select_voice = document.getElementById("voices");
  speechvoices.forEach((voice, i) => (select_voice.options[i] = new Option(voice.name, i)));
};

//SETTING THE CONTROLS - SPEAK, PAUSE, RESUME AND CANCEL

//SPEAK
//first we get the value of the textarea or document
document.getElementById("speak").addEventListener("click", () => {
  tts.text = document.getElementById("lines").value;

  //then we implement the speechsynthesis instance
  window.speechSynthesis.speak(tts);
});


//PAUSE
document.getElementById("pause").addEventListener("click", () => {
  // Pause the speechSynthesis instance
  window.speechSynthesis.pause();
});

//RESUME
document.getElementById("resume").addEventListener("click", () => {
  // Resume the paused speechSynthesis instance
  window.speechSynthesis.resume();

  });

//CANCEL
document.getElementById("cancel").addEventListener("click", () => {
 // Cancel the speechSynthesis instance
  window.speechSynthesis.cancel();
});

//TO SET THE VOLUME, PITCH, AND RATE

//VOLUME
// We get the volume value from the input
document.getElementById("volume").addEventListener("input", () => {
  // Slider values are strings, so we convert them to numbers first
  const vol = Number(document.getElementById("volume").value);

  // Set volume property of the SpeechSynthesisUtterance instance
  tts.volume = vol;

  // Updating the volume label
  document.getElementById("vol-label").textContent = vol;
});


//RATE
// We get the rate value from the input
document.getElementById("rate").addEventListener("input", () => {
  const rate = Number(document.getElementById("rate").value);

  // Set rate property of the SpeechSynthesisUtterance instance
  tts.rate = rate;

  // Updating the rate label
  document.getElementById("rate-lab").textContent = rate;
});


//PITCH
// We get the pitch value from the input
document.getElementById("pitch").addEventListener("input", () => {
  const pitch = Number(document.getElementById("pitch").value);

  // Set pitch property of the SpeechSynthesisUtterance instance
  tts.pitch = pitch;

  // Updating the pitch label
  document.getElementById("pitch-lab").textContent = pitch;
});
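
The three slider handlers follow the same pattern, so they could be folded into one function. The sketch below is one possible way to do that; `bindSlider` is a hypothetical helper, and it assumes it is given the input element, the label element and the utterance directly.

```javascript
// Hypothetical helper: wires one range input to one utterance property.
// `input` and `label` are DOM elements; `target` is the SpeechSynthesisUtterance.
function bindSlider(input, label, target, property) {
  const update = () => {
    const value = Number(input.value); // slider values arrive as strings
    target[property] = value;
    label.textContent = String(value);
  };
  input.addEventListener('input', update);
  update(); // sync the label with the slider's starting position
  return update;
}

// In the demo page this could stand in for the three handlers:
// bindSlider(document.getElementById('volume'), document.getElementById('vol-label'), tts, 'volume');
// bindSlider(document.getElementById('rate'), document.getElementById('rate-lab'), tts, 'rate');
// bindSlider(document.getElementById('pitch'), document.getElementById('pitch-lab'), tts, 'pitch');
```

Calling `update()` once at bind time also keeps each label in step with its slider's initial value.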

Although we have populated the drop-down with voices, selecting one will not change the speaking voice until we listen for the select element's change event.

// This changes the voice of the speaker or utterance to the selected voice
document.getElementById("voices").addEventListener("change", () => {
  tts.voice = speechvoices[document.getElementById("voices").value];
});
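
Instead of always defaulting to the first voice in the list, a page might prefer a voice that matches a given language. The following is a sketch under that assumption; `pickVoice` is a hypothetical helper that uses the `lang` and `default` properties each voice object carries, and the underscore handling assumes some platforms report tags like `en_GB`.

```javascript
// Hypothetical helper: choose a voice for a language tag such as 'en-GB'.
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const normalize = (tag) => tag.toLowerCase().replace('_', '-');
  const wanted = normalize(lang);
  const base = wanted.split('-')[0];
  return (
    voices.find((v) => normalize(v.lang) === wanted) ||             // exact match
    voices.find((v) => normalize(v.lang).split('-')[0] === base) || // same language, other region
    voices.find((v) => v.default) ||                                // the default voice
    voices[0] ||
    null
  );
}

// In the demo page: tts.voice = pickVoice(speechvoices, tts.lang);
```

The function returns `null` when the list is empty, which leaves the choice of voice to the browser.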

Browser Compatibility

The Web Speech API's SpeechSynthesis interface is supported by Chrome, Edge, Firefox, Opera and Safari. Internet Explorer does not support this API. The one exception is the onvoiceschanged event handler, which is not supported by Safari and Opera.
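
Where the voiceschanged event is unavailable, or where the voices are already loaded and the event never fires, the voice list can still be collected by polling getVoices(). The sketch below shows one way to combine both approaches; `loadVoices` is a hypothetical helper, and the interval and timeout values are arbitrary assumptions.

```javascript
// Hypothetical helper: resolve with the voice list however the browser delivers it.
// `synth` is normally window.speechSynthesis.
function loadVoices(synth, { intervalMs = 100, timeoutMs = 2000 } = {}) {
  return new Promise((resolve) => {
    const ready = synth.getVoices();
    if (ready.length > 0) {
      resolve(ready); // already loaded, so no event is needed
      return;
    }
    const started = Date.now();
    const timer = setInterval(() => {
      const voices = synth.getVoices();
      // Stop once voices appear, or give up with an empty list after the timeout.
      if (voices.length > 0 || Date.now() - started >= timeoutMs) {
        clearInterval(timer);
        resolve(voices);
      }
    }, intervalMs);
    // Where the event exists, resolve as soon as it fires.
    if (typeof synth.addEventListener === 'function') {
      synth.addEventListener('voiceschanged', () => {
        clearInterval(timer);
        resolve(synth.getVoices());
      }, { once: true });
    }
  });
}

// In the demo page:
// loadVoices(window.speechSynthesis).then((voices) => { speechvoices = voices; });
```

Whichever path finishes first settles the promise; later calls to `resolve` are ignored, so the event and the polling timer do not conflict.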

DEV Community