How a Programmer Developed a Text Reader App for His 80-Year-Old Grandpa

#machinelearning #tutorial

"John, have you seen my glasses?"

Our old friend John, a programmer at Huawei, has a grandpa who, despite his old age, is an avid reader. "Leaning back, struggling to make out what was written in the newspaper through his glasses, but unable to take his eyes off the text: this was how my grandpa used to read," John explained.

Reading this way was harmful to his grandpa's vision, and it occurred to John that the ears could take over the role of "reading" from the eyes. He soon developed a text-reading app based on this idea, which recognizes the text in a picture and then reads it aloud. Thanks to this app, John's grandpa can now "read" from the comfort of his rocking chair, without having to strain his eyes.

How to Implement

1. The user takes a picture of a text passage. The app automatically locates the text within the picture and corrects the skew, so that the text appears as if it had been photographed head-on.
2. The app recognizes and extracts the text from the picture.
3. The app converts the recognized text into audio output by using text-to-speech technology.

These functions are easy to implement with three services in HUAWEI ML Kit: document skew correction, text recognition, and text to speech (TTS).
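
The three steps above can be pictured as a small pipeline. The following is only a sketch in plain Java: the class name ReaderPipeline and its three function-typed fields are hypothetical stand-ins for the skew correction, text recognition, and TTS services, not part of ML Kit.

```java
import java.util.function.Consumer;
import java.util.function.Function;

/** Sketch of the photo-to-speech flow; every name here is hypothetical. */
final class ReaderPipeline {
    private final Function<byte[], byte[]> skewCorrector;  // stands in for document skew correction
    private final Function<byte[], String> textRecognizer; // stands in for text recognition
    private final Consumer<String> speaker;                // stands in for TTS

    ReaderPipeline(Function<byte[], byte[]> skewCorrector,
                   Function<byte[], String> textRecognizer,
                   Consumer<String> speaker) {
        this.skewCorrector = skewCorrector;
        this.textRecognizer = textRecognizer;
        this.speaker = speaker;
    }

    /** Runs the three stages in order and returns the text handed to the speaker. */
    String read(byte[] photo) {
        byte[] corrected = skewCorrector.apply(photo);
        String text = textRecognizer.apply(corrected).trim();
        if (!text.isEmpty()) {
            // Only speak when some text was actually recognized.
            speaker.accept(text);
        }
        return text;
    }
}
```

Keeping each stage behind its own function makes it possible to swap in a stub while testing the app logic without a device.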

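Preparations

Configure the Huawei Maven repository address and the AppGallery Connect plugin in the project-level build.gradle file.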

buildscript {
    repositories {
        google()
        jcenter()
        maven {url 'https://developer.huawei.com/repo/'}
    }
    dependencies {
        classpath "com.android.tools.build:gradle:4.1.1"
        classpath 'com.huawei.agconnect:agcp:1.4.2.300'
        // NOTE: Do not place your app dependencies here; they belong
        // in the individual module build.gradle files.
    }
}
allprojects {
    repositories {
        google()
        jcenter()
        maven {url 'https://developer.huawei.com/repo/'}
    }
}

Add the build dependencies for the HMS Core SDK.

dependencies {

    // Import the base SDK.
    implementation 'com.huawei.hms:ml-computer-voice-tts:2.1.0.300'
    // Import the bee voice package.
    implementation 'com.huawei.hms:ml-computer-voice-tts-model-bee:2.1.0.300'
    // Import the eagle voice package.
    implementation 'com.huawei.hms:ml-computer-voice-tts-model-eagle:2.1.0.300'
    // Import a PDF file analyzer.
    implementation 'com.itextpdf:itextg:5.5.10'
}

Tap PREVIOUS or NEXT to turn to the previous or next page. Tap speak to start reading; tap it again to pause reading.

Development Process

Create a TTS engine by using the custom configuration class MLTtsConfig. Here, on-device TTS is used as an example.

private void initTts() {
    // Set authentication information for your app to download the model package from the server of Huawei.
    MLApplication.getInstance().setApiKey(AGConnectServicesConfig.
            fromContext(getApplicationContext()).getString("client/api_key"));
    // Create a TTS engine by using MLTtsConfig.
    mlTtsConfigs = new MLTtsConfig()
            // Set the language of the text to be converted to speech to English.
            .setLanguage(MLTtsConstants.TTS_EN_US)
            // Set the speaker with the English male voice (eagle).
            .setPerson(MLTtsConstants.TTS_SPEAKER_OFFLINE_EN_US_MALE_EAGLE)
            // Set the speech speed whose range is (0, 5.0]. 1.0 indicates a normal speed.
            .setSpeed(.8f)
            // Set the volume whose range is (0, 2). 1.0 indicates a normal volume.
            .setVolume(1.0f)
            // Set the TTS mode to on-device.
            .setSynthesizeMode(MLTtsConstants.TTS_OFFLINE_MODE);
    mlTtsEngine = new MLTtsEngine(mlTtsConfigs);
    // Update the configuration when the engine is running.
    mlTtsEngine.updateConfig(mlTtsConfigs);
    // Pass the TTS callback function to the TTS engine to perform TTS.
    mlTtsEngine.setTtsCallback(callback);
    // Create an on-device TTS model manager.
    manager = MLLocalModelManager.getInstance();
    isPlay = false;
}
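
If the speed or volume comes from a user setting (for example, a slider), it may be worth checking the value against the ranges quoted in the comments above before passing it to setSpeed or setVolume. The helper below is a minimal sketch; the class name TtsSettings and its methods are hypothetical, and the bounds are simply the ones stated in this article.

```java
/** Hypothetical helper that checks TTS settings against the ranges quoted above. */
final class TtsSettings {
    private TtsSettings() {
    }

    /** Speed must lie in (0, 5.0]; 1.0 is the normal speed. */
    static boolean isValidSpeed(float speed) {
        return speed > 0f && speed <= 5.0f;
    }

    /** Volume must lie in (0, 2); 1.0 is the normal volume. */
    static boolean isValidVolume(float volume) {
        return volume > 0f && volume < 2.0f;
    }
}
```

A slower speed such as the 0.8 used above passes the check and may suit older listeners better than the default.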

Create a TTS callback function for processing the TTS result.

MLTtsCallback callback = new MLTtsCallback() {
    @Override
    public void onError(String taskId, MLTtsError err) {
        // Processing logic for TTS failure.
    }
    @Override
    public void onWarn(String taskId, MLTtsWarn warn) {
        // Handle warnings that do not affect service logic.
    }
    @Override
    // Return the mapping between the currently played segment and text. start: start position of the audio segment in the input text; end (excluded): end position of the audio segment in the input text.
    public void onRangeStart(String taskId, int start, int end) {
        // Process the mapping between the currently played segment and text.
    }
    @Override
    // taskId: ID of a TTS task corresponding to the audio.
    // audioFragment: audio data.
    // offset: offset of the audio segment to be transmitted in the queue. One TTS task corresponds to a TTS queue.
    // range: text area where the audio segment to be transmitted is located; range.first (included): start position; range.second (excluded): end position.
    public void onAudioAvailable(String taskId, MLTtsAudioFragment audioFragment, int offset,
                                 Pair<Integer, Integer> range, Bundle bundle) {
        // Audio stream callback API, which is used to return the synthesized audio data to the app.
    }
    @Override
    public void onEvent(String taskId, int eventId, Bundle bundle) {
        // Callback method of a TTS event. eventId indicates the event name.
        boolean isInterrupted;
        switch (eventId) {
            case MLTtsConstants.EVENT_PLAY_START:
                // Called when playback starts.
                break;
            case MLTtsConstants.EVENT_PLAY_STOP:
                // Called when playback stops.
                isInterrupted = bundle.getBoolean(MLTtsConstants.EVENT_PLAY_STOP_INTERRUPTED);
                break;
            case MLTtsConstants.EVENT_PLAY_RESUME:
                // Called when playback resumes.
                break;
            case MLTtsConstants.EVENT_PLAY_PAUSE:
                // Called when playback pauses.
                break;
            // Pay attention to the following callback events when you focus on only the synthesized audio data but do not use the internal player for playback.
            case MLTtsConstants.EVENT_SYNTHESIS_START:
                // Called when TTS starts.
                break;
            case MLTtsConstants.EVENT_SYNTHESIS_END:
                // Called when TTS ends.
                break;
            case MLTtsConstants.EVENT_SYNTHESIS_COMPLETE:
                // TTS is complete. All synthesized audio streams are passed to the app.
                isInterrupted = bundle.getBoolean(MLTtsConstants.EVENT_SYNTHESIS_INTERRUPTED);
                break;
            default:
                break;
        }
    }
};
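
One possible use of onRangeStart is to highlight the words currently being read. The sketch below shows how the start and end values could be turned into the matching piece of the input text. The class name SpokenRange is hypothetical, and the clamping is a defensive assumption rather than documented behavior.

```java
/** Hypothetical helper that maps an onRangeStart range back to the input text. */
final class SpokenRange {
    private SpokenRange() {
    }

    /**
     * Returns the part of the text covered by [start, end).
     * Out-of-range indices are clamped so the call never throws.
     */
    static String segment(String text, int start, int end) {
        int s = Math.max(0, Math.min(start, text.length()));
        int e = Math.max(s, Math.min(end, text.length()));
        return text.substring(s, e);
    }
}
```

Inside onRangeStart, the returned segment could then be highlighted in the text view that shows the page.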

Extract text from a PDF file.

private String loadText(String path) {
    String result = "";
    try {
        PdfReader reader = new PdfReader(path);
        result = result.concat(PdfTextExtractor.getTextFromPage(reader,
                mCurrentPage.getIndex() + 1).trim() + System.lineSeparator());
        reader.close();
    } catch (IOException e) {
        showToast(e.getMessage());
    }
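    // Obtain the position of the header (end of the first line).
    int header = result.indexOf(System.lineSeparator());
    // Obtain the position of the footer (the last line separator).
    int footer = result.lastIndexOf(System.lineSeparator());
    // Strip the header and footer only when the page is long enough;
    // otherwise substring() would throw StringIndexOutOfBoundsException.
    if (header >= 0 && footer - 5 > header) {
        // Do not display the text in the header and footer.
        return result.substring(header, footer - 5);
    } else {
        return result;
    }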
}
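
The fixed offset used above depends on the layout of one particular PDF. As an alternative sketch, the header and footer can be dropped line by line, assuming each of them occupies exactly one line of the extracted page text. The class name PageText is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper; assumes the header and the footer are one line each. */
final class PageText {
    private PageText() {
    }

    /** Drops the first and last line of a page, or returns short pages unchanged. */
    static String stripHeaderAndFooter(String page) {
        String trimmed = page.trim();
        String[] lines = trimmed.split("\\r?\\n");
        if (lines.length < 3) {
            // Too short to contain a header, a body, and a footer.
            return trimmed;
        }
        // Keep everything between the first and the last line.
        return String.join("\n", Arrays.copyOfRange(lines, 1, lines.length - 1));
    }
}
```

Pages that hold fewer than three lines are returned unchanged, so a short page never triggers an exception.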

Perform TTS in on-device mode.

// Create an MLTtsLocalModel instance to set the speaker so that the language model corresponding to the speaker can be downloaded through the model manager.
MLTtsLocalModel model = new MLTtsLocalModel.Factory(MLTtsConstants.TTS_SPEAKER_OFFLINE_EN_US_MALE_EAGLE).create();
manager.isModelExist(model).addOnSuccessListener(new OnSuccessListener<Boolean>() {
    @Override
    public void onSuccess(Boolean aBoolean) {
        // If the model has been downloaded, call the TTS API of the on-device engine. Otherwise, call the download API.
        if (aBoolean) {
            String source = loadText(mPdfPath);
            // Call the speak API to perform TTS. source indicates the text to be synthesized.
            mlTtsEngine.speak(source, MLTtsEngine.QUEUE_APPEND);
            if (isPlay){
                // Pause playback.
                mlTtsEngine.pause();
                tv_speak.setText("speak");
            }else {
                // Resume playback.
                mlTtsEngine.resume();
                tv_speak.setText("pause");
            }
            isPlay = !isPlay;
        } else {
            // Call the API for downloading the on-device TTS model.
            downloadModel(MLTtsConstants.TTS_SPEAKER_OFFLINE_EN_US_MALE_EAGLE);
            showToast("The offline model has not been downloaded!");
        }
    }
}).addOnFailureListener(new OnFailureListener() {
    @Override
    public void onFailure(Exception e) {
        showToast(e.getMessage());
    }
});
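
A full PDF page can be long, and a TTS engine may limit how much text a single speak call accepts. The sketch below splits text into chunks of at most maxLen characters, preferring to cut after a sentence end. The class name TextChunker is hypothetical, and the right value for maxLen is an assumption to check against the ML Kit documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits long text into chunks for a TTS engine. */
final class TextChunker {
    private TextChunker() {
    }

    /** Splits text into trimmed chunks, each at most maxLen characters long. */
    static List<String> split(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String rest = text.trim();
        while (rest.length() > maxLen) {
            int cut = findCut(rest, maxLen);
            chunks.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim();
        }
        if (!rest.isEmpty()) {
            chunks.add(rest);
        }
        return chunks;
    }

    /** Prefers a cut after '.', '!' or '?', then at whitespace, else at maxLen. */
    private static int findCut(String s, int maxLen) {
        for (int i = maxLen - 1; i > 0; i--) {
            char c = s.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                return i + 1;
            }
        }
        for (int i = maxLen - 1; i > 0; i--) {
            if (Character.isWhitespace(s.charAt(i))) {
                return i;
            }
        }
        return maxLen;
    }
}
```

Each chunk could then be passed to mlTtsEngine.speak(chunk, MLTtsEngine.QUEUE_APPEND) so that the chunks play one after another. Note that this simple rule also cuts at the dot in a decimal number such as 3.14.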

Release resources when the current UI is destroyed.

@Override
protected void onDestroy() {
    super.onDestroy();
    try {
        if (mParcelFileDescriptor != null) {
            mParcelFileDescriptor.close();
        }
        if (mCurrentPage != null) {
            mCurrentPage.close();
        }
        if (mPdfRenderer != null) {
            mPdfRenderer.close();
        }
        if (mlTtsEngine != null){
            mlTtsEngine.shutdown();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Other Applicable Scenarios

TTS can be used across a broad range of scenarios. For example, you could integrate it into an education app to read bedtime stories to children, or into a navigation app to read instructions aloud.

To learn more, visit the following links:
Documentation on the HUAWEI Developers website
https://developer.huawei.com/consumer/en/hms/huawei-MapKit

HUAWEI Developers official website

Development Guide

Reddit to join developer discussions

GitHub or Gitee to download the demo and sample code

Stack Overflow to solve integration problems
