José Thomaz

Posted on Jun 21, 2022

Introduction to Text to Speech and Speech Recognition using React-native

#reactnative #tutorial #computerscience #programming

As technology advances, new problems appear and new solutions are developed to address them. Text and voice technologies were created to help people with disabilities, and they are also used to control devices by voice and to build translators, speech synthesizers, virtual assistants and many other applications.

What are the types of voice technologies?

Speech recognition
Text-to-speech (speech synthesis)

Both technologies rely on Natural Language Processing (NLP), a sub-field of artificial intelligence concerned with processing and manipulating human language at different levels.

Speech recognition

Speech recognition is a sub-field of computer science that develops technologies for recognizing spoken language and converting it into text.

Speech recognition examples

Siri: the virtual assistant from iOS;
"Ok Google": when you search for something on Google just by using your voice;
Automatically generated closed-caption subtitles for people with hearing impairments;
Hands-free computing: control devices, apps and websites without using your hands.

Text-to-speech

Text-to-speech, also known as speech synthesis, is the reverse of speech recognition: it is the artificial production of human speech. Given a text input, the computer should output audio that speaks the content of that input.

Text-to-speech examples

Multimodal interfaces: Stephen Hawking used text-to-speech for many years of his life to communicate.
App accessibility: describing screen elements through text-to-speech for visually impaired people.
Translators

Implementing Text-to-Speech in React-native

So, let's begin! For this implementation I will be using Expo, so first of all check whether it is installed on your machine by running the following command:

expo --version

If it returns an error, you can install Expo by following this guide.

After that, let's create a React-native project. I will be using a TypeScript template:

expo init mobile --template expo-template-blank-typescript

Setting up React-navigation

First let's install @react-navigation library and its dependencies:

yarn add @react-navigation/native

react-native-screens and react-native-safe-area-context are required for @react-navigation to work:

expo install react-native-screens react-native-safe-area-context

Now we just need to install the bottom-tabs navigator:

yarn add @react-navigation/bottom-tabs

To keep the project organized, let's create a src folder with screens and components subfolders. The structure should look like this:

src
|-----screens
|-----components
App.tsx

Our navigator will be inside the components folder, so create the file TabNav.tsx.

import { createBottomTabNavigator } from '@react-navigation/bottom-tabs';
import { Text } from 'react-native';

const Tab = createBottomTabNavigator();

export default function TabNav() {
  return (
    <Tab.Navigator
      screenOptions={
        {
          headerShown: false,
          tabBarIconStyle: {
            display: 'none',
          }
        }
      }
    >
      <Tab.Screen name="TTS" component={() => <Text>TTS SCREEN</Text>} />
      <Tab.Screen name="ASR" component={() => <Text>ASR SCREEN</Text>} />
    </Tab.Navigator>
  );
}

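The app will have only two screens, one for Text-to-Speech and the other for Speech Recognition. For now, let's render only a Text component with the screen name in each tab. Keep in mind that React Navigation needs a NavigationContainer at the root of the app, so in App.tsx render TabNav inside the NavigationContainer component exported by @react-navigation/native.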

Text-to-Speech screen

With the react-navigation configuration done, we can start implementing text-to-speech. As we are using Expo, we can install Expo's official library for TTS (Text-to-Speech).

yarn add expo-speech

This is a very simple library. To use it, let's create a file named text-to-speech.screen.tsx inside the screens folder, where we will put the code for TTS.

import React, { useState } from "react";
import { Button, TextInput, View, StyleSheet } from "react-native";
import * as Speech from 'expo-speech';

export default function TextToSpeechScreen(): JSX.Element {
  const [iptValue, setIptValue] = useState<string>('');

  // Speak the current input; change `language` to match the text you type.
  function speak(): void {
    const thingToSay = iptValue;
    Speech.speak(thingToSay, {
      language: 'pt-BR',
    });
  }

  return (
    <View style={styles.container}>
      <TextInput style={styles.input} placeholder="Type something..." value={iptValue} onChangeText={(text) => setIptValue(text)} />
      <Button title="Listen" onPress={speak} />
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: 'center',
    backgroundColor: '#F5FCFF',
    padding: 8,
  },
  input: {
    height: 40,
    borderColor: 'gray',
    borderWidth: 1,
    margin: 10,
    padding: 10,
  }
});

It's a very simple screen: just a TextInput and a Button, centered and with some basic styles. You type something in the input, and when you press the button the speak function is called, which passes the text to Speech.speak.
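One detail worth knowing: expo-speech exposes a Speech.maxSpeechInputLength constant with the longest text that Speech.speak accepts on the current platform. If your users may type long texts, one possible approach is to split the input into smaller chunks and speak them one after another. Below is a minimal sketch of that idea. The splitForSpeech helper is a hypothetical name of my own, not part of expo-speech, and the usage shown in the comment assumes that consecutive Speech.speak calls are queued.

```typescript
// Hypothetical helper (not part of expo-speech): split `text` into chunks
// of at most `maxLength` characters, breaking at whitespace when possible.
function splitForSpeech(text: string, maxLength: number): string[] {
  if (maxLength <= 0) {
    throw new RangeError("maxLength must be positive");
  }
  const chunks: string[] = [];
  let current = "";
  for (const word of text.trim().split(/\s+/)) {
    if (word === "") continue; // happens only for empty input
    let rest = word;
    // A single word longer than the limit has to be hard-split.
    while (rest.length > maxLength) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxLength));
      rest = rest.slice(maxLength);
    }
    const candidate = current ? current + " " + rest : rest;
    if (candidate.length <= maxLength) {
      current = candidate; // the word still fits in the current chunk
    } else {
      chunks.push(current); // close the chunk and start a new one
      current = rest;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Assumed usage inside the screen (not executed here):
//   for (const chunk of splitForSpeech(iptValue, Speech.maxSpeechInputLength)) {
//     Speech.speak(chunk, { language: 'pt-BR' });
//   }
```

Breaking at whitespace keeps words intact, so the synthesized speech does not stop in the middle of a word unless a single word is longer than the limit.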

Speech Recognition screen

For speech recognition in React-native, we can use a library called @react-native-voice/voice, so let's add it to our project.

yarn add @react-native-voice/voice

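This library was initially designed for react-native-cli projects, so to use it with Expo we need to make some adaptations. Because it contains native code, it cannot run inside the Expo Go app, so you will need to build the app yourself (for example, with a development build). Open your app.json file and add the following content inside the "expo" property: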

  "plugins": [
    [
      "@react-native-voice/voice",
      {
        "microphonePermission": "Allow $(PRODUCT_NAME) to access your microphone",
        "speechRecognitionPermission": "Allow $(PRODUCT_NAME) to securely recognize user speech"
      }
    ]
  ]

Now, let's create a file named speech-recognition.screen.tsx in the screens folder. This file will hold the code for the screen that implements Speech Recognition. Add the following code to it.

import React, { useState, useEffect } from "react";
import { Button, StyleSheet, Text, View } from "react-native";
import Voice, {
  SpeechResultsEvent,
  SpeechErrorEvent,
} from "@react-native-voice/voice";

export default function SpeechRecognitionScreen() {
  const [results, setResults] = useState<string[]>([]);
  const [isListening, setIsListening] = useState(false);

  useEffect(() => {
    function onSpeechResults(e: SpeechResultsEvent) {
      setResults(e.value ?? []);
    }
    function onSpeechError(e: SpeechErrorEvent) {
      console.error(e);
    }
    Voice.onSpeechError = onSpeechError;
    Voice.onSpeechResults = onSpeechResults;
    return function cleanup() {
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);

  async function toggleListening() {
    try {
      if (isListening) {
        await Voice.stop();
        setIsListening(false);
      } else {
        setResults([]);
        await Voice.start("en-US");
        setIsListening(true);
      }
    } catch (e) {
      console.error(e);
    }
  }

  return (
    <View style={styles.container}>
      <Text>Press the button and start speaking.</Text>
      <Button
        title={isListening ? "Stop Recognizing" : "Start Recognizing"}
        onPress={toggleListening}
      />
      <Text>Results:</Text>
      {results.map((result, index) => {
        return <Text key={`result-${index}`}>{result}</Text>;
      })}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: "center",
    alignItems: "center",
    backgroundColor: "#F5FCFF",
  },
});

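The code above is fairly simple. We import what we need from the @react-native-voice/voice package and, inside useEffect, register handlers for results and errors, removing them when the screen unmounts. The toggleListening function starts recognition when the user presses the button and stops it when the button is pressed again. The recognized transcripts are rendered below the button.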
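The results array holds the candidate transcripts for what the user said. If you want to go one step further and react to voice commands (the hands-free use case mentioned at the beginning), a simple option is to look for known phrases inside those transcripts. The sketch below is only an illustration: matchCommand, CommandMap and the command names are hypothetical and not part of @react-native-voice/voice.

```typescript
// Hypothetical mapping from a spoken phrase to an action identifier.
type CommandMap = Record<string, string>;

// Lowercase, drop common punctuation and collapse repeated whitespace.
function normalizeTranscript(s: string): string {
  return s
    .toLowerCase()
    .replace(/[.,!?;:"']/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the action of the first command phrase found in any transcript,
// checking transcripts in the order they were received.
function matchCommand(
  transcripts: string[],
  commands: CommandMap
): string | undefined {
  for (const transcript of transcripts) {
    const heard = normalizeTranscript(transcript);
    for (const phrase of Object.keys(commands)) {
      const wanted = normalizeTranscript(phrase);
      // Skip empty phrases, which would otherwise match everything.
      if (wanted && heard.indexOf(wanted) !== -1) {
        return commands[phrase];
      }
    }
  }
  return undefined;
}

// Example commands (made up for this sketch).
const exampleCommands: CommandMap = {
  "open settings": "OPEN_SETTINGS",
  "go back": "GO_BACK",
};

// Assumed usage inside onSpeechResults (not executed here):
//   const action = matchCommand(e.value ?? [], exampleCommands);
//   if (action) { /* navigate, toggle something, etc. */ }
```

Plain substring matching is enough for a demo like this one. A real app would probably need something more tolerant of recognition mistakes.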

Before we test, let's just update our tab navigation file to render the screens we have just created.

First, add the following imports at the top of the file:

import SpeechRecognitionScreen from '../screens/speech-recognition.screen';
import TextToSpeechScreen from '../screens/text-to-speech.screen';

And lastly, replace the two placeholder Tab.Screen lines with the following:

<Tab.Screen name="TTS" component={TextToSpeechScreen} />
<Tab.Screen name="ASR" component={SpeechRecognitionScreen} />

Conclusion

Our app is done! If you did everything correctly, it should show two tabs, TTS and ASR, each rendering the screen we have just built.

Also, feel free to check the code on GitHub, and to contribute new features if you want: repository link.

So, that's it! Text-to-Speech and Speech Recognition are very useful technologies that can be combined and integrated into many types of systems, including mobile apps. I know the app isn't the most beautiful, but that is because this article focuses on presenting TTS and ASR.