José Thomaz

Introduction to Text-to-Speech and Speech Recognition using React Native

As technology advances, new problems arise and new solutions are developed. Text and voice technologies were created to help people with disabilities, to control devices by voice, and to build translators, speech synthesizers, virtual assistants and many other applications.

 

What are the types of voice technologies?

  • Speech recognition
  • Text-to-speech (speech synthesis)

These technologies rely on Natural Language Processing (NLP), a sub-field of artificial intelligence concerned with processing and manipulating human language at different levels.

 

Speech recognition

Speech recognition is a sub-field of computer science that develops technologies for recognizing spoken human language and converting it into text.

Speech recognition examples

  • Siri: Apple's virtual assistant on iOS;
  • "Ok Google": when you search for something on Google just by using your voice;
  • Automatically generated closed-caption subtitles for people with hearing impairments;
  • Hands-free computing: control devices, apps and websites without using your hands.

 

Text-to-speech

Text-to-speech, also known as speech synthesis, is the reverse of speech recognition: it is the artificial production of human speech. This means that, given a text input, the computer should output audio that speaks the content of the input.

Text-to-speech examples

  • Multimodal interfaces: Stephen Hawking used text-to-speech for many years of his life to communicate.
  • App accessibility: describing screen elements through text-to-speech for visually impaired people.
  • Translators

 

Implementing Text-to-Speech in React Native

Let's begin! For this implementation I will be using Expo, so first check that it is installed on your machine by running the following command:

expo --version

If it returns an error, you can install Expo by following this guide.

After that, let's create a React Native project. I will be using a TypeScript template:

expo init mobile --template expo-template-blank-typescript

Setting up React Navigation

First, let's install the @react-navigation library and its dependencies:

yarn add @react-navigation/native

react-native-screens and react-native-safe-area-context are required for @react-navigation to work:

expo install react-native-screens react-native-safe-area-context

Now we just need to install the bottom-tabs navigator:

yarn add @react-navigation/bottom-tabs

To keep the project organized, let's create a src folder with screens and components subfolders. The structure should look like this:

src
|-----screens
|-----components
App.tsx

Our navigator will be inside the components folder, so create the file TabNav.tsx.

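import { createBottomTabNavigator } from '@react-navigation/bottom-tabs';
import { Text } from 'react-native';

const Tab = createBottomTabNavigator();

// Temporary placeholder screens. They are defined outside the navigator so
// they are not re-created on every render (React Navigation warns about
// inline functions passed to the `component` prop).
const TtsPlaceholder = () => <Text>TTS SCREEN</Text>;
const AsrPlaceholder = () => <Text>ASR SCREEN</Text>;

export default function TabNav() {
  return (
    <Tab.Navigator
      screenOptions={{
        headerShown: false,
        // Hide the tab icons so only the labels are shown.
        tabBarIconStyle: { display: 'none' },
      }}
    >
      <Tab.Screen name="TTS" component={TtsPlaceholder} />
      <Tab.Screen name="ASR" component={AsrPlaceholder} />
    </Tab.Navigator>
  );
}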

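The app will have only two screens, one for text-to-speech and the other for speech recognition. For now, each screen renders only a Text component with the screen name. Keep in mind that React Navigation expects the navigator to be rendered inside a NavigationContainer (exported by @react-navigation/native), typically in App.tsx.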

Text-to-Speech screen

With React Navigation configured, we can start implementing text-to-speech. Since we are using Expo, we can install expo-speech, Expo's official library for TTS (text-to-speech):

yarn add expo-speech

This is a very simple library. To use it, let's create a file named text-to-speech.screen.tsx in the screens folder, where we will put the code for TTS.

import React, { useState } from "react";
import { Button, TextInput, View, StyleSheet } from "react-native";
import * as Speech from 'expo-speech';

export default function TextToSpeechScreen(): JSX.Element {
  const [iptValue, setIptValue] = useState<string>('');

  function speak(): void {
    // Read the current input aloud. The language is a language tag;
    // change 'pt-BR' (e.g. to 'en-US') to match the text you type.
    Speech.speak(iptValue, {
      language: 'pt-BR',
    });
  }

  return (
    <View style={styles.container}>
      <TextInput style={styles.input} placeholder="Type something..." value={iptValue} onChangeText={(text) => setIptValue(text)} />
      <Button title="Listen" onPress={speak} />
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: 'center',
    backgroundColor: '#F5FCFF',
    padding: 8,
  },
  input: {
    height: 40,
    borderColor: 'gray',
    borderWidth: 1,
    margin: 10,
    padding: 10,
  }
});


It's a very simple screen: just a TextInput and a Button, centered, with some basic styles. You type something in the input, and when you press the button the speak function is called, which passes the text to Speech.speak to be read aloud in the configured language (pt-BR here).
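
As a small extension of the speak function, the sketch below trims the input, skips empty text, and stops any speech still in progress before speaking again. To keep it runnable outside the app, the speech engine is passed in as a parameter. The SpeechEngine interface and the speakText helper are hypothetical names introduced here for illustration; they are modeled on the speak and stop functions that expo-speech exposes, and the Speech namespace is only assumed to fit this shape.

```typescript
// Minimal shape of the speech engine this helper needs. In the app, the
// `Speech` namespace imported from 'expo-speech' is assumed to satisfy it.
interface SpeechEngine {
  speak(text: string, options?: { language?: string }): void;
  stop(): unknown;
}

// Hypothetical helper: speaks `rawText` unless it is empty or whitespace only.
// Returns true when something was sent to the engine, false otherwise.
function speakText(
  engine: SpeechEngine,
  rawText: string,
  language: string = 'pt-BR', // same default as the screen in this article
): boolean {
  const text = rawText.trim();
  if (text.length === 0) {
    return false; // nothing to say
  }
  engine.stop(); // interrupt any utterance that is still playing
  engine.speak(text, { language });
  return true;
}
```

In the screen, the button handler could then call speakText(Speech, iptValue) instead of calling Speech.speak directly.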

Speech Recognition screen

To use speech recognition with React Native, there is a library called @react-native-voice/voice, so let's add it to our project:

yarn add @react-native-voice/voice

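This library was initially designed for react-native-cli projects, so to use it with Expo we need to make some adaptations. Note that, because the library contains native code, it will not run in the Expo Go app; you need a development build. Open your app.json file and add the following content inside the "expo" property: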

  "plugins": [
    [
      "@react-native-voice/voice",
      {
        "microphonePermission": "Allow $(PRODUCT_NAME) to access your microphone",
        "speechRecognitionPermission": "Allow $(PRODUCT_NAME) to securely recognize user speech"
      }
    ]
  ]

Now, let's create a file named speech-recognition.screen.tsx in the screens folder. This file will hold the code for the screen that implements speech recognition. Add the following code to it:

import React, { useState, useEffect } from "react";
import { Button, StyleSheet, Text, View } from "react-native";
import Voice, {
  SpeechResultsEvent,
  SpeechErrorEvent,
} from "@react-native-voice/voice";

export default function SpeechRecognitionScreen() {
  const [results, setResults] = useState<string[]>([]);
  const [isListening, setIsListening] = useState(false);

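  useEffect(() => {
    // Store the recognized phrases whenever results arrive.
    function onSpeechResults(e: SpeechResultsEvent) {
      setResults(e.value ?? []);
    }
    function onSpeechError(e: SpeechErrorEvent) {
      console.error(e);
    }
    // Register the handlers on the Voice module.
    Voice.onSpeechError = onSpeechError;
    Voice.onSpeechResults = onSpeechResults;
    // On unmount, release the recognizer and detach the handlers.
    return function cleanup() {
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);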

  async function toggleListening() {
    try {
      if (isListening) {
        await Voice.stop();
        setIsListening(false);
      } else {
        setResults([]);
        await Voice.start("en-US");
        setIsListening(true);
      }
    } catch (e) {
      console.error(e);
    }
  }

  return (
    <View style={styles.container}>
      <Text>Press the button and start speaking.</Text>
      <Button
        title={isListening ? "Stop Recognizing" : "Start Recognizing"}
        onPress={toggleListening}
      />
      <Text>Results:</Text>
      {results.map((result, index) => {
        return <Text key={`result-${index}`}>{result}</Text>;
      })}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: "center",
    alignItems: "center",
    backgroundColor: "#F5FCFF",
  },
});

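The code above is fairly simple. We import what we need from the @react-native-voice/voice package, and a useEffect hook registers the handlers for results and errors and cleans them up when the screen unmounts. The toggleListening function then starts or stops listening when the user presses the button, and every recognized phrase is rendered below it.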
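
One thing to watch in this screen: isListening only changes when the button is pressed, so if the recognizer stops on its own (for example, after a pause in speech) the button may keep showing "Stop Recognizing". A possible way to handle this is to drive the state from recognizer events. The sketch below is a pure reducer; ListeningState, ListeningEvent, initialState and listeningReducer are hypothetical names, and it assumes you dispatch the events from handlers such as Voice.onSpeechStart, Voice.onSpeechEnd, Voice.onSpeechResults and Voice.onSpeechError.

```typescript
// State kept by the speech recognition screen.
interface ListeningState {
  isListening: boolean;
  results: string[];
}

// Events the screen would dispatch from the recognizer callbacks.
type ListeningEvent =
  | { type: 'started' }
  | { type: 'ended' }
  | { type: 'failed' }
  | { type: 'results'; value?: string[] };

const initialState: ListeningState = { isListening: false, results: [] };

function listeningReducer(
  state: ListeningState,
  event: ListeningEvent,
): ListeningState {
  switch (event.type) {
    case 'started':
      // A new session clears the previous results.
      return { isListening: true, results: [] };
    case 'ended':
    case 'failed':
      // Recognition stopped, either normally or because of an error.
      return { ...state, isListening: false };
    case 'results':
      // `value` may be undefined, as in SpeechResultsEvent.
      return { ...state, results: event.value ?? [] };
  }
}
```

In a React component this could be wired up with useReducer(listeningReducer, initialState), replacing the two separate useState calls.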

Before we test, let's just update our tab navigation file to render the screens we have just created.

First, add the following imports at the top of TabNav.tsx:

import SpeechRecognitionScreen from '../screens/speech-recognition.screen';
import TextToSpeechScreen from '../screens/text-to-speech.screen';

And lastly, replace the two placeholder Tab.Screen lines with the following:

<Tab.Screen name="TTS" component={TextToSpeechScreen} />
<Tab.Screen name="ASR" component={SpeechRecognitionScreen} />

 

Conclusion

Our app is done. If you did everything correctly, it should look something like this:

App screens collage

Feel free to check out the code on GitHub, and to contribute new features if you want: repository link.

So, that's it! Text-to-speech and speech recognition are very useful technologies that can be combined and integrated into many types of systems, including mobile apps. I know the app isn't the most beautiful, but that is because this article focuses on presenting TTS and ASR (automatic speech recognition).

Discussion (3)

Charlie • Edited on

What is speech recognition accuracy? I mean can your solution boast to have say 95% recognition accuracy like voiso.com/ ? Not long ago I stumbled upon an article that Voiso had the highest recognition accuracy in the industry.

José Thomaz Author

What do you mean by accuracy?

diskretter

very interesting, I hope the more refined the better