Lenmor Ld

Posted on Dec 14, 2019

🗣 Web Reader using Web Speech API

#showdev #texttospeech #webdev #javascript

Demo here: https://stupefied-curran-2254b8.netlify.com/

Have you ever been too TLDR (Too LAZY, Didn't Read) to get through an online article or some other webpage...
and wished that your browser would read it for you?

Well, you're in luck! I built a Web Page Reader. 😆
Just copy-paste a URL or some text into the input and it will read it for you!
Well, the readable parts at least 😅

💬 Web Speech API

I used Speech Synthesis from the browser's native Web Speech API.
It is an experimental tech, but there's a good chance you have it in your browser now!

Actually, we've all had this since Chrome 33, Firefox 49 and Edge 14. But check here in case you are using a Tamagotchi 🐰: caniuse Web Speech API.

The speech inputs

User inputs are the following HTML elements:

textarea for the URL/text to read
select input for the voice
range inputs for pitch and rate

The textarea contents are checked to see whether they are plain text or a URL.

The rate (how fast the speaking goes) ranges from 0.5 to 2.
The pitch (highness or lowness of voice) ranges from 0 to 2.
The voice select provides the voices available from the system.
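
Range inputs hand back their value as a string, so a small helper can convert and clamp it before it reaches the speech engine. This is only a sketch: `readSetting` and the `LIMITS` table are hypothetical names, the limits are the ranges listed above, and the fallback of 1 is an assumption based on the API's default pitch and rate.

```javascript
// Limits copied from the ranges above (choices of this app, not of the API)
const LIMITS = {
  rate: { min: 0.5, max: 2, fallback: 1 },
  pitch: { min: 0, max: 2, fallback: 1 },
};

// Hypothetical helper: turns an input's string value into a number within the limits
function readSetting(input, { min, max, fallback }) {
  const value = Number.parseFloat(input.value);
  if (Number.isNaN(value)) return fallback; // unreadable value, use the fallback
  return Math.min(max, Math.max(min, value));
}

// In the browser (rateSlider is assumed to be the rate <input type="range">):
// const rate = readSetting(rateSlider, LIMITS.rate);
```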

🎤 `SpeechSynthesisVoice`

The available voices differ from device to device, and are obtained via
speechSynthesisInstance.getVoices().

This returns all the SpeechSynthesisVoice objects, which we stuff into the select options.

The user selects one of these, or leaves the default.
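
Here is a rough sketch of how the select could be filled. One thing to watch for: some browsers return an empty list from getVoices() until the voices have loaded, and then fire a `voiceschanged` event. `whenVoicesReady` and `toVoiceOptions` are hypothetical helper names, not part of the Web Speech API, and the label format is just one possible choice.

```javascript
// Hypothetical helper: calls back once the voice list is non-empty
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices);
    return;
  }
  // The list may be filled in later; wait for the voiceschanged event if supported
  if (synth.onvoiceschanged !== undefined) {
    synth.onvoiceschanged = () => callback(synth.getVoices());
  }
}

// Hypothetical helper: maps SpeechSynthesisVoice objects to { value, label } pairs
function toVoiceOptions(voices) {
  return voices.map((voice, index) => ({
    value: String(index), // index into the voices array
    label: `${voice.name} (${voice.lang})${voice.default ? ' - default' : ''}`,
  }));
}

// In the browser (voiceSelect is assumed to be the <select> element):
// whenVoicesReady(window.speechSynthesis, (voices) => {
//   for (const { value, label } of toVoiceOptions(voices)) {
//     voiceSelect.add(new Option(label, value));
//   }
// });
```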

Now, what makes the browser actually talk is the SpeechSynthesisUtterance object.

🗣 `SpeechSynthesisUtterance`

A SpeechSynthesisUtterance object (utterance) is like an individual speech request, which we initialize with the string and attach all the speech elements like voice, rate and pitch.

Finally, trigger the utterance via speechSynthesis.speak().

A finishUtteranceCallback is also supplied to re-enable the play button and other controls when the text is finished.

This logic is encapsulated in speak(string, voice, pitch, rate, finishUtteranceCallback):

  speak(string, voice, pitch, rate, finishUtteranceCallback) {
    if (this.synth.speaking) {
      console.error('🗣 already speaking');
      return;
    }

    if (string) {
      const utterance = new SpeechSynthesisUtterance(string);
      utterance.onend = () => {
        console.log('utterance end');

        finishUtteranceCallback();
      };
      utterance.voice = voice;
      utterance.pitch = pitch;
      utterance.rate = rate;

      this.synth.speak(utterance);
    }
  }

All of this functionality is wrapped in a WebSpeechApi to keep it modular. 📦
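
For illustration, this is roughly what such a wrapper might look like. It is a guess at the shape, not the app's actual class: the constructor parameters (which let you pass in a stand-in synth for testing) and the boolean return value of speak() are assumptions added for this sketch.

```javascript
// A sketch of a wrapper class; only `synth` and `speak` come from the snippet above
class WebSpeechApi {
  constructor(
    synth = globalThis.speechSynthesis, // assumption: injectable for testing
    Utterance = globalThis.SpeechSynthesisUtterance,
  ) {
    this.synth = synth;
    this.Utterance = Utterance;
  }

  getVoices() {
    return this.synth.getVoices();
  }

  // Returns true if the utterance was queued, false otherwise (assumption)
  speak(string, voice, pitch, rate, finishUtteranceCallback) {
    if (this.synth.speaking || !string) return false;

    const utterance = new this.Utterance(string);
    utterance.onend = () => finishUtteranceCallback();
    utterance.voice = voice;
    utterance.pitch = pitch;
    utterance.rate = rate;

    this.synth.speak(utterance);
    return true;
  }
}

// In the browser:
// const speechApi = new WebSpeechApi();
// speechApi.speak('Hello there', speechApi.getVoices()[0], 1, 1, () => console.log('done'));
```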

For a detailed look at SpeechSynthesisUtterance, check this out: MDN Speech Utterance.

This MDN page has an awesome rundown and an example that I built my app off of. Please check it out too!

🌐 URL check

The user can enter either a URL or plain text in the textarea.
But how does the app detect whether it's a URL?
A simple try-catch does the trick.

// simple check if valid URL
try {
    new URL(urlOrText);
    isUrl = true;
} catch (error) {
    // not a URL, treat as string
    isUrl = false;
}

If it's plain text, it is passed directly to speak().
If it is indeed a URL, a GET request loads the page and the readable elements are scraped from it.
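
One caveat with the check above: new URL() accepts any scheme, so something like `mailto:someone@example.com` counts as a valid URL even though it can't be fetched as a page. A slightly stricter variant, sketched here with a hypothetical `isHttpUrl` helper, also looks at the protocol:

```javascript
// Hypothetical helper: true only for absolute http(s) URLs
function isHttpUrl(urlOrText) {
  try {
    const { protocol } = new URL(urlOrText.trim());
    return protocol === 'http:' || protocol === 'https:';
  } catch (error) {
    // new URL() throws a TypeError when the input is not an absolute URL
    return false;
  }
}

// isHttpUrl('https://dev.to/') is true, so the page gets scraped
// isHttpUrl('just some text') is false, so the text is read as-is
```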

🕷️ Web page scraping using `cheerio` and `axios`

cheerio implements a subset of jQuery that is super fast, easy and flexible for parsing HTML.
(Seriously it's as easy as cheerio.load('<p>some html</p>'))

axios is a Promise-based HTTP client for fetching stuff from APIs, and in this case, for getting the full HTTP GET response from a webpage.

Combined, this is how I'm getting all the "readable" elements of a page.

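// imports assume an ES module / bundler setup
import axios from 'axios';
import * as cheerio from 'cheerio';

// Resolves with the trimmed text of every "readable" element on the page
const getWebsiteTexts = siteUrl => axios
  .get(siteUrl)
  .then((result) => {
    const $ = cheerio.load(result.data);
    const contents = $('p, h1, h2, h3').contents(); // get all "readable" element contents

    return contents
      .toArray()
      .map(node => node.data && node.data.trim()) // element nodes have no `data`, so they map to undefined
      .filter(text => text); // drop undefined and empty strings
  })
  .catch((err) => {
    // err.message and siteUrl are available even when the error did not come from axios
    alert(`${err.message} on ${siteUrl}\nPlease try a different website`);
    urlOrTextInput.value = '';
    finishUtterance();
    return []; // settle with nothing to read instead of leaving the promise pending
  });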

Some URLs error out, so we catch the error, alert() the user, clear the textarea and reset the form inputs.

Why do some URLs not work?

⛔ CORS policy

The scraper can't read every website out there.
In fact, a lot of websites (try Medium articles) don't allow cross-origin requests.
So you'll get an error like this on some websites.

CORS policy: No 'Access-Control-Allow-Origin' means the server has not allowed other origins, so the browser only lets scripts from the same origin read the response.

Note that cURL and Postman may still work on these sites, because CORS is enforced by the browser. It just won't work from JavaScript in a webapp like this.

This is controlled by the server of the site we're trying to read, so there's not much we can do but move on to a different page. 😢
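
A script can't tell for sure that CORS was the reason a request failed, since the browser hides those details. With axios, though, a blocked request typically shows up as an error that has a `request` but no `response`. The sketch below uses that to pick a friendlier message; `describeFetchError` is a hypothetical helper and the wording of the messages is made up.

```javascript
// Hypothetical helper: turns an axios-style error into a user-facing message
function describeFetchError(err, siteUrl) {
  if (err.response) {
    // The server answered, just not with a 2xx status
    return `${siteUrl} responded with status ${err.response.status}`;
  }
  if (err.request) {
    // The request was made but no response was readable:
    // could be a CORS block, being offline, a DNS failure...
    return `No readable response from ${siteUrl} (possibly blocked by CORS)`;
  }
  // Something went wrong before the request was even sent
  return `Could not request ${siteUrl}: ${err.message}`;
}

// Possible usage in a .catch((err) => ...) handler:
// alert(`${describeFetchError(err, siteUrl)}\nPlease try a different website`);
```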

Here's a good rundown of CORS:

Understanding CORS by Martin Splitt (Nov 12 '19)

dev.to pages work though! Try it out 🎉.
Thank you dev.to for allowing us to scrape 🙏

▶️ play, pause, restart

Lastly, I added some basic playback controls.

Here's the play function, which starts or resumes based on the current paused status of speechSynthesis. While it's playing, every control except pause and stop is disabled.

playButton.addEventListener('click', () => {
  if (speechApi.synth.paused) {
    speechApi.synth.resume();
  } else {
    // start from beginning
    read();
  }

  playButton.disabled = true;
  pauseButton.disabled = false;
  stopButton.disabled = false;

  rateSlider.disabled = true;
  pitchSlider.disabled = true;
  voiceSelect.disabled = true;

  urlOrTextInput.disabled = true;
});
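
The pause and stop buttons could follow the same pattern. This is a sketch, not the app's actual code: `applyPlaybackState`, `makePlaybackHandlers` and the `controls` object are hypothetical, and keeping the settings locked while paused is an assumption. The 'playing' state matches the flags set in the play handler above.

```javascript
// Hypothetical helper: sets the `disabled` flags for 'playing', 'paused' or 'stopped'
function applyPlaybackState(state, controls) {
  const playing = state === 'playing';
  const stopped = state === 'stopped';

  controls.playButton.disabled = playing;
  controls.pauseButton.disabled = !playing;
  controls.stopButton.disabled = stopped;

  // Settings and the textarea are only editable when nothing is queued (assumption)
  for (const name of ['rateSlider', 'pitchSlider', 'voiceSelect', 'urlOrTextInput']) {
    controls[name].disabled = !stopped;
  }
}

// Hypothetical helper: builds click handlers around speechSynthesis pause() and cancel()
function makePlaybackHandlers(synth, controls) {
  return {
    pause() {
      synth.pause();
      applyPlaybackState('paused', controls);
    },
    stop() {
      synth.cancel(); // drops the current utterance and anything queued
      applyPlaybackState('stopped', controls);
    },
  };
}

// In the browser:
// const handlers = makePlaybackHandlers(speechApi.synth, controls);
// pauseButton.addEventListener('click', handlers.pause);
// stopButton.addEventListener('click', handlers.stop);
```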