Rahul Sharma

Posted on Mar 17, 2022 • Edited on Nov 23, 2023

How to Get Started with Javascript Audiobook

#javascript #ocr #react #angular

Are you looking for a way to convert images to text?
Would you like to take a picture of some text and have it converted to text for you?
And then have a JavaScript application read that text aloud?

Today, I am going to fulfill that long-awaited wish by taking a picture of some text and converting it to text. In addition, I will also convert the text to speech.

I'm going to create a simple application that will convert an image URL to text and then convert the text to speech.

Before we begin, I want to explain a few things.

OCR (Optical Character Recognition)

OCR is a technology that recognizes the text in an image. It's commonly used in applications such as document scanning and handwriting recognition.

JavaScript does not have a built-in OCR library, but we can use tesseract.js to do the OCR for us. Check out the tesseract.js library for more information.

SpeechSynthesis

SpeechSynthesis is a technology that can convert text to speech.

"The SpeechSynthesis interface of the Web Speech API is the controller interface for the speech service; this can be used to retrieve information about the synthesis voices available on the device, start and pause speech, and other commands besides." (Source: MDN)
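To make the "retrieve information about the synthesis voices" part concrete, here is a minimal sketch. The helper name pickVoice is my own and not part of the Web Speech API. It assumes each voice object exposes name and lang, as SpeechSynthesisVoice does. The browser-only part is guarded so the snippet also loads outside a browser.

```javascript
// Hypothetical helper (not part of the Web Speech API): return the first voice
// whose language tag starts with the given prefix, e.g. "en" matches "en-GB".
const pickVoice = (voices, langPrefix) =>
  voices.find((voice) =>
    voice.lang.toLowerCase().startsWith(langPrefix.toLowerCase())
  ) || null;

// Browser-only usage, skipped when no window or speech service exists.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const synth = window.speechSynthesis;
  const logVoice = () => {
    const voice = pickVoice(synth.getVoices(), 'en');
    console.log(voice ? `${voice.name} (${voice.lang})` : 'No English voice found');
  };
  // Some browsers fill the voice list asynchronously, so listen for updates too.
  if (typeof synth.addEventListener === 'function') {
    synth.addEventListener('voiceschanged', logVoice);
  }
  logVoice();
}
```

The list returned by getVoices can be empty on the first call in some browsers, which is why the sketch also reacts to the voiceschanged event.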

I'm very excited to show you how to use tesseract.js to convert an image to text. I will show you how to do this in the following steps.

Part 1: Convert an image to text

I'll show two examples of converting an image to text: the first from an image URL and the second from an image file.

Step 1: Create a simple HTML page with the following code.

index.html

<html>
  <body>
    Progress: <span id="progress">0</span>
    <div class="container">
      <input
        id="url"
        value="https://tesseract.projectnaptha.com/img/eng_bw.png"
      />
      <button onclick="onCovert()">Convert URL Image</button>
    </div>
    <div class="container">
      <img id="output" src="" width="100" height="100" />
      <input
        name="photo"
        type="file"
        accept="image/*"
        onchange="onImageChange(this.files[0])"
      />
    </div>
    <div class="container">
      <p id="text"></p>
      <button onclick="read()">Read</button>
    </div>
    <script src="script.js"></script>
  </body>
</html>

Step 2: Add Tesseract.js to the HTML page. The easiest way to include Tesseract.js in your HTML5 page is to use a CDN. Add the following to the <head> of your webpage so that it loads before script.js.

<script src="https://unpkg.com/tesseract.js@v2.1.0/dist/tesseract.min.js"></script>

Step 3: Initialize And Run Tesseract OCR

script.js

const textEle = document.getElementById('text');
const imgEle = document.getElementById('output');
const progressEle = document.getElementById('progress');

const logger = ({ progress }) =>
  (progressEle.innerHTML = `${(progress * 100).toFixed(2)}%`);

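// Convert an image to text with the one-shot Tesseract.recognize helper
const startConversion = async (url) => {
  try {
    const result = await Tesseract.recognize(url, 'eng', { logger });
    const {
      data: { text },
    } = result;
    return text;
  } catch (e) {
    console.error(e);
    return ''; // fall back to an empty string so callers never render "undefined"
  }
};

const onCovert = async () => {
  const urlEle = document.getElementById('url');
  const text = await startConversion(urlEle.value);
  // textContent avoids interpreting recognized characters such as "<" as HTML
  textEle.textContent = text;
};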

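// Convert an image to text with a dedicated worker (better for repeated use)
const imageToText = async (url) => {
  // Create a fresh worker per call: a terminated worker cannot be reused
  const worker = Tesseract.createWorker({
    logger,
  });
  try {
    await worker.load();
    await worker.loadLanguage('eng');
    await worker.initialize('eng');
    const {
      data: { text },
    } = await worker.recognize(url);
    textEle.textContent = text;
  } catch (error) {
    console.error(error); // surface failures instead of silently swallowing them
  } finally {
    await worker.terminate(); // always release the worker
  }
};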

const onImageChange = (file) => {
  if (file) {
    const reader = new FileReader();
    // Register the handler before starting the read
    reader.onload = () => {
      const url = reader.result;
      imgEle.src = url;
      imageToText(url);
    };
    reader.readAsDataURL(file);
  }
};
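The logger above only shows a percentage, and that percentage restarts for each stage Tesseract.js goes through. The sketch below shows the stage name as well. The names formatProgress and makeLogger are hypothetical, and it assumes each logger message carries a status string and a progress number between 0 and 1.

```javascript
// Hypothetical formatter for messages passed to a Tesseract.js logger.
// Assumes a message shaped like { status: 'recognizing text', progress: 0.5 }.
const formatProgress = ({ status = 'working', progress = 0 } = {}) =>
  `${status}: ${(progress * 100).toFixed(0)}%`;

// Build a logger that writes the formatted message into a given element.
// Any object with a textContent property works, which keeps this testable.
const makeLogger = (element) => (message) => {
  element.textContent = formatProgress(message);
};
```

With the page above, this could be wired in as Tesseract.recognize(url, 'eng', { logger: makeLogger(progressEle) }).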

Tesseract.js API response

blocks: [{…}]
box: null
confidence: 90
hocr: "<div class='ocr_page' id='page_1' title='image \"\"; bbox 0 0 1486 668; ppageno 0'>\n <div class='ocr_carea' id='block_1_1' title=\"bbox 28 34 1454 640\">\n  <p class='ocr_par' id='par_1_1' lang='eng' title=\"bbox 28 34 1454 640\">\n"
lines: (8) [{…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}]
oem: "DEFAULT"
osd: null
paragraphs: [{…}]
psm: "SINGLE_BLOCK"
symbols: (295) [{…}, {…}, {…}, {…}, {…}, {…}, …]
text: "Mild Splendour of the various-vested Night!\nMother of wildly-working visions! haill\nI watch thy gliding, while with watery light\nThy weak eye glimmers through a fleecy veil;\nAnd when thou lovest thy pale orb to shroud\nBehind the gather’d blackness lost on high;\nAnd when thou dartest from the wind-rent cloud\nThy placid lightning o’er the awaken’d sky.\n"
tsv: "4\t1\t1\t1\t7\t0\t28\t487\t1400\t61\t-1\t\n5\t1\t1\t1\t7\t1\t28\t487\t116\t50\t87\tAnd\n5\t1\t1\t1\t7\t2\t170\t488\t150\t51\t87\twhen\n5\t1\t1\t1\t7\t3\t345\t490\t123\t51\t92\tthou\n5\t1\t1\t1\t7\t4\t497\t492\t188\t51\t91\tdartest\n5\t1\t1\t1\t7\t5\t711\t493\t128\t51\t91\tfrom\n5\t1\t1\t1\t7\t6\t866\t494\t87\t52\t92\tthe\n5\t1\t1\t1\t7\t7\t978\t495\t272\t52\t92\twind-rent\n5\t1\t1\t1\t7\t8\t1275\t494\t153\t54\t92\tcloud\n4\t1\t1\t1\t8\t0\t96\t563\t1228\t77\t-1\t\n5\t1\t1\t1\t8\t1\t96\t563\t112\t69\t92\tThy\n5\t1\t1\t1\t8\t2\t231\t564\t172\t70\t91\tplacid\n5\t1\t1\t1\t8\t3\t427\t566\t248\t73\t92\tlightning\n5\t1\t1\t1\t8\t4\t700\t568\t100\t53\t89\to’er\n5\t1\t1\t1\t8\t5\t824\t569\t87\t69\t92\tthe\n5\t1\t1\t1\t8\t6\t935\t569\t260\t54\t82\tawaken’d\n5\t1\t1\t1\t8\t7\t1218\t569\t106\t71\t92\tsky.\n"
unlv: null
version: "4.1.1-56-gbe45"
words: (58) [{…}, {…}, {…}]
[[Prototype]]: Object

Let's understand the structure of the data.

text: All of the recognized text as a string.
lines: An array of every recognized line of text.
words: An array of every recognized word.
symbols: An array of each of the characters recognized.
paragraphs: An array of every recognized paragraph.

We now have the text as a string, which we can use for reading aloud.
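The other fields are useful too. As a rough sketch, the words array can be used to flag words the engine was unsure about. The helper name lowConfidenceWords and the threshold of 80 are my own choices. It assumes each word entry has text and confidence (0 to 100) fields. The sample data is made up for illustration and is not real OCR output.

```javascript
// Hypothetical helper: list the words recognized below a confidence threshold.
// Assumes each entry in data.words has `text` and `confidence` (0-100) fields.
const lowConfidenceWords = (data, threshold = 80) =>
  (data.words || [])
    .filter((word) => word.confidence < threshold)
    .map((word) => word.text);

// Made-up sample shaped like the response above (illustrative values only).
const sample = {
  words: [
    { text: 'Mild', confidence: 92 },
    { text: 'Splendour', confidence: 88 },
    { text: 'haill', confidence: 61 },
  ],
};

console.log(lowConfidenceWords(sample)); // [ 'haill' ]
```

Words flagged this way could be highlighted for manual review before the text is read aloud.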

Part 2: Convert text to speech

For text to speech, we will use the browser's built-in Web Speech API.

speak: This method adds an utterance to the utterance queue. It will be spoken after all utterances queued before it have been spoken. The method takes a SpeechSynthesisUtterance object as an argument. This object has a property called text, which holds the text that we want to convert to speech.

NOTE: SpeechSynthesisUtterance has several other properties that shape the speech. Check the SpeechSynthesisUtterance documentation for more information.

const read = () => {
  const msg = new SpeechSynthesisUtterance();
  msg.text = textEle.innerText;
  window.speechSynthesis.speak(msg);
};
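As a sketch of those other properties, the helper below sets rate, pitch, volume and lang on an utterance. The names clamp, applySpeechOptions and readWithOptions are hypothetical. The ranges used (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1) are the ones documented for SpeechSynthesisUtterance, and the 'en-US' default is just an assumption for this example.

```javascript
// Keep a number inside the inclusive range [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper: copy speech settings onto an utterance-like object.
// Documented ranges: rate 0.1-10, pitch 0-2, volume 0-1.
const applySpeechOptions = (
  utterance,
  { rate = 1, pitch = 1, volume = 1, lang = 'en-US' } = {}
) => {
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);
  utterance.lang = lang;
  return utterance;
};

// Browser-only: speak the given text with the chosen settings.
// Nothing runs until this function is called, e.g. from a button handler.
const readWithOptions = (text, options) => {
  const msg = applySpeechOptions(new SpeechSynthesisUtterance(text), options);
  window.speechSynthesis.speak(msg);
};
```

For example, readWithOptions(textEle.innerText, { rate: 0.9 }) would read the recognized text slightly slower than the default.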

cancel: Removes all utterances from the utterance queue.

getVoices: Returns a list of SpeechSynthesisVoice objects representing all the available voices on the current device.

pause: Puts the SpeechSynthesis object into a paused state.

resume: Puts the SpeechSynthesis object into a non-paused state: resumes it if it was already paused.
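These methods can be combined into a single pause/resume button. The sketch below is one way to do it. The name togglePause is hypothetical, and the function only assumes the object passed in exposes the SpeechSynthesis members paused, speaking, pause() and resume().

```javascript
// Hypothetical helper: resume if paused, pause if speaking, otherwise do nothing.
// `paused` is checked first because `speaking` can stay true while paused.
const togglePause = (synth) => {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle';
};
```

In the browser this would be called as togglePause(window.speechSynthesis), with a separate stop button calling window.speechSynthesis.cancel().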

Browser Compatibility

The SpeechSynthesis API is available in all modern browsers: Firefox, Chrome, Edge, and Safari.
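Even so, it can be worth checking for the API before using it, for example in older browsers or embedded web views. This is a minimal feature-detection sketch. The names supportsSpeech and safeRead are hypothetical, and the global object is passed in as a parameter so the check can be exercised outside a browser.

```javascript
// Hypothetical helper: true when the given global object offers speech synthesis.
const supportsSpeech = (globalObject) =>
  typeof globalObject === 'object' &&
  globalObject !== null &&
  'speechSynthesis' in globalObject &&
  'SpeechSynthesisUtterance' in globalObject;

// Speak the text if possible and report whether speech was attempted.
const safeRead = (
  text,
  globalObject = typeof window !== 'undefined' ? window : undefined
) => {
  if (!supportsSpeech(globalObject)) return false;
  const msg = new globalObject.SpeechSynthesisUtterance(text);
  globalObject.speechSynthesis.speak(msg);
  return true;
};
```

When safeRead returns false, the page could hide the Read button or show a short message instead.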

Got any questions or anything to add? Please leave a comment.

Thank you for reading 😊
