
Rahul Sharma


How to Get Started with a JavaScript Audiobook

The Complete Guide to Convert Image to Text and Text to Speech with JavaScript | Rahul Sharma (DevsMitra)

  • Are you looking for a way to convert images to text?
  • Do you want to take a picture of some text and have it converted to text for you?
  • Do you want that same text to be read aloud by a JavaScript application?


Today, I am going to fulfill your long-awaited wish by taking a picture of some text and converting it to text. In addition, I will convert that text to speech for you.

I'm going to create a simple application that converts an image URL to text and then converts the text to speech.

Before we begin, I want to explain a few things.

OCR (Optical Character Recognition)

It is a technology that recognizes the text in an image. It's commonly used in applications such as document scanning and handwriting recognition.

JavaScript does not have a built-in OCR library, but we can use tesseract.js to do the OCR for us. Check out the tesseract.js library for more information.


SpeechSynthesis is a technology that can convert text to speech.

The SpeechSynthesis interface of the Web Speech API is the controller interface for the speech service; this can be used to retrieve information about the synthesis voices available on the device, start and pause speech, and other commands besides. (Source: MDN)

I'm very excited to show you how to use tesseract.js to convert an image to text. I will show you how to do this in the following steps.

Part 1: Convert an image to text

I'll add two examples of converting an image to text: the first from an image URL and the second from an image file.

  • Step 1: Create a simple HTML page with the following code.


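    Progress: <span id="progress">0</span>
    <div class="container">
      <input id="url" type="text" placeholder="Image URL" />
      <button onclick="onCovert()">Convert URL Image</button>
    </div>
    <div class="container">
      <input type="file" accept="image/*" onchange="onImageChange(this.files[0])" />
      <img id="output" src="" width="100" height="100" />
    </div>
    <div class="container">
      <p id="text"></p>
      <button onclick="read()">Read</button>
    </div>
    <script src="script.js"></script>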
  • Step 2: Add Tesseract.js to the HTML page. The easiest way to include Tesseract.js in your HTML5 page is to use a CDN, so add the following to the <head> of your webpage.
<script src=""></script>
  • Step 3: Initialize And Run Tesseract OCR


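const textEle = document.getElementById('text');
const imgEle = document.getElementById('output');
const progressEle = document.getElementById('progress');

// Show recognition progress as a percentage
const logger = ({ progress }) =>
  (progressEle.innerHTML = `${(progress * 100).toFixed(2)}%`);

// Convert an image to text using the main Tesseract.recognize API
const startConversion = async (url) => {
  try {
    const result = await Tesseract.recognize(url, 'eng', { logger });
    const {
      data: { text },
    } = result;
    return text;
  } catch (e) {
    console.error(e);
    return '';
  }
};

// Click handler for the "Convert URL Image" button
const onCovert = async () => {
  const urlEle = document.getElementById('url');
  const text = await startConversion(urlEle.value);
  // textContent avoids interpreting the OCR output as HTML
  textEle.textContent = text;
};

// Convert an image to text using a worker (the better way)
const imageToText = async (url) => {
  // A terminated worker cannot be reused, so create one per conversion.
  // Note: createWorker() is synchronous in Tesseract.js v2; later versions return a Promise.
  const worker = Tesseract.createWorker({ logger });
  try {
    await worker.load();
    await worker.loadLanguage('eng');
    await worker.initialize('eng');
    const {
      data: { text },
    } = await worker.recognize(url);
    textEle.textContent = text;
  } catch (error) {
    console.error(error);
  } finally {
    await worker.terminate();
  }
};

// Change handler for the file input: preview the image, then run OCR on it
const onImageChange = (file) => {
  if (file) {
    let reader = new FileReader();
    reader.onload = function () {
      let url = reader.result;
      imgEle.src = url;
      imageToText(url);
    };
    reader.readAsDataURL(file);
  }
};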

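The logger above only shows a percentage, so the counter can appear to restart as Tesseract.js moves from one phase to the next. A minimal sketch of a more descriptive label follows, assuming each logger message carries a `status` string alongside `progress` (a number from 0 to 1); `formatProgress` is a made-up helper name, not part of the library.

```javascript
// Hypothetical helper: format a Tesseract.js logger message for display.
// Assumption: messages look like { status: 'recognizing text', progress: 0.5 }.
const formatProgress = ({ status = 'working', progress = 0 }) =>
  `${status}: ${(progress * 100).toFixed(2)}%`;

console.log(formatProgress({ status: 'recognizing text', progress: 0.5 }));
// prints "recognizing text: 50.00%"
```

Inside the page, the logger could then set `progressEle.textContent = formatProgress(message)` instead of writing the bare percentage.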

Tesseract.js API response

blocks: [{}]
box: null
confidence: 90
hocr: "<div class='ocr_page' id='page_1' title='image \"\"; bbox 0 0 1486 668; ppageno 0'>\n <div class='ocr_carea' id='block_1_1' title=\"bbox 28 34 1454 640\">\n  <p class='ocr_par' id='par_1_1' lang='eng' title=\"bbox 28 34 1454 640\">\n"
lines: (8) [{}, {}, {}, {}, {}, {}, {}, {}]
oem: "DEFAULT"
osd: null
paragraphs: [{}]
symbols: (295) [{}, {}, {}, {}, {}, {}, ]
text: "Mild Splendour of the various-vested Night!\nMother of wildly-working visions! haill\nI watch thy gliding, while with watery light\nThy weak eye glimmers through a fleecy veil;\nAnd when thou lovest thy pale orb to shroud\nBehind the gather’d blackness lost on high;\nAnd when thou dartest from the wind-rent cloud\nThy placid lightning o’er the awaken’d sky.\n"
tsv: "4\t1\t1\t1\t7\t0\t28\t487\t1400\t61\t-1\t\n5\t1\t1\t1\t7\t1\t28\t487\t116\t50\t87\tAnd\n5\t1\t1\t1\t7\t2\t170\t488\t150\t51\t87\twhen\n5\t1\t1\t1\t7\t3\t345\t490\t123\t51\t92\tthou\n5\t1\t1\t1\t7\t4\t497\t492\t188\t51\t91\tdartest\n5\t1\t1\t1\t7\t5\t711\t493\t128\t51\t91\tfrom\n5\t1\t1\t1\t7\t6\t866\t494\t87\t52\t92\tthe\n5\t1\t1\t1\t7\t7\t978\t495\t272\t52\t92\twind-rent\n5\t1\t1\t1\t7\t8\t1275\t494\t153\t54\t92\tcloud\n4\t1\t1\t1\t8\t0\t96\t563\t1228\t77\t-1\t\n5\t1\t1\t1\t8\t1\t96\t563\t112\t69\t92\tThy\n5\t1\t1\t1\t8\t2\t231\t564\t172\t70\t91\tplacid\n5\t1\t1\t1\t8\t3\t427\t566\t248\t73\t92\tlightning\n5\t1\t1\t1\t8\t4\t700\t568\t100\t53\t89\to’er\n5\t1\t1\t1\t8\t5\t824\t569\t87\t69\t92\tthe\n5\t1\t1\t1\t8\t6\t935\t569\t260\t54\t82\tawaken’d\n5\t1\t1\t1\t8\t7\t1218\t569\t106\t71\t92\tsky.\n"
unlv: null
version: "4.1.1-56-gbe45"
words: (58) [{}, {}, {}]
[[Prototype]]: Object

Let's understand the structure of the data.

  • text: All of the recognized text as a string.
  • lines: An array of every recognized line of text.
  • words: An array of every recognized word.
  • symbols: An array of each of the characters recognized.
  • paragraphs: An array of every recognized paragraph.

We now have the text as a string, which we can use for reading.
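As a rough sketch of how these arrays might be used, the snippet below lists the words the engine was least sure about, which can be worth checking before the text is read aloud. It assumes each entry in `words` has `text` and `confidence` (0 to 100) fields, as the per-word confidence column in the `tsv` output suggests. `sampleData` and `lowConfidenceWords` are illustrative names, and the thresholds are arbitrary.

```javascript
// Hypothetical, trimmed-down stand-in for the `data` object shown above.
// Assumption: each word exposes `text` and `confidence` (0-100).
const sampleData = {
  text: 'And when thou dartest\n',
  words: [
    { text: 'And', confidence: 87 },
    { text: 'when', confidence: 87 },
    { text: 'thou', confidence: 92 },
    { text: 'dartest', confidence: 91 },
  ],
};

// Return the text of every word whose confidence is below the threshold.
const lowConfidenceWords = (data, threshold = 85) =>
  data.words
    .filter((word) => word.confidence < threshold)
    .map((word) => word.text);

console.log(lowConfidenceWords(sampleData, 90)); // prints [ 'And', 'when' ]
```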

Part 2: Convert text to speech

For text to speech, we will use the browser's built-in text-to-speech API.

speak: This method adds an utterance to the utterance queue. It is spoken after all the utterances queued before it have been spoken. The method takes a SpeechSynthesisUtterance object as an argument. That object has a property called text, which holds the text we want to convert to speech.

NOTE: SpeechSynthesisUtterance accepts several properties that shape the speech. Check the SpeechSynthesisUtterance documentation for more information.
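To illustrate a few of those properties, here is a minimal sketch that prepares `lang`, `rate`, `pitch`, and `volume` values and keeps them inside the ranges the Web Speech API documents (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1). `clamp`, `utteranceSettings`, and `speakWith` are hypothetical helper names introduced only for this example.

```javascript
// Clamp a number into the inclusive range [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper: build a plain settings object whose keys match
// SpeechSynthesisUtterance properties, with values kept in range.
const utteranceSettings = ({ lang = 'en-US', rate = 1, pitch = 1, volume = 1 } = {}) => ({
  lang,
  rate: clamp(rate, 0.1, 10), // speaking speed
  pitch: clamp(pitch, 0, 2),
  volume: clamp(volume, 0, 1),
});

// Browser-only: copy the settings onto an utterance and queue it.
const speakWith = (text, settings) => {
  const msg = new SpeechSynthesisUtterance(text);
  Object.assign(msg, settings);
  window.speechSynthesis.speak(msg);
};
```

For example, `speakWith(textEle.innerText, utteranceSettings({ rate: 0.9 }))` would read the recognized text slightly slower than the default.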

const read = () => {
  const msg = new SpeechSynthesisUtterance();
  msg.text = textEle.innerText;
  // Add the utterance to the queue so the browser speaks it
  window.speechSynthesis.speak(msg);
};
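Because speak works with a queue, a page of OCR text does not have to go in as one long utterance. The sketch below is one possible approach, not something the original demo does: it flattens the line breaks that OCR leaves in the text, splits the result after sentence-ending punctuation, and queues one utterance per sentence. `toChunks` and `readChunks` are hypothetical names.

```javascript
// Hypothetical helper: turn OCR output into short, speakable chunks.
// Whitespace (including line breaks) is collapsed to single spaces, then the
// text is split after runs of sentence-ending punctuation (., ! or ?).
const toChunks = (ocrText) => {
  const flat = ocrText.replace(/\s+/g, ' ').trim();
  const parts = flat.match(/[^.!?]+[.!?]*/g) || [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
};

// Browser-only: queue one utterance per chunk; the queue plays them in order.
const readChunks = (ocrText) => {
  toChunks(ocrText).forEach((chunk) => {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));
  });
};
```

Shorter utterances can make it easier to stop or skip part of a long reading, at the cost of a slightly more involved `read` function.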

cancel: Removes all utterances from the utterance queue.

getVoices: Returns a list of SpeechSynthesisVoice objects representing all the available voices on the current device.

pause: Puts the SpeechSynthesis object into a paused state.

resume: Puts the SpeechSynthesis object into a non-paused state: resumes it if it was already paused.
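As an example of working with getVoices, the following sketch picks a voice for a given language tag. It assumes the array items expose `name`, `lang`, and `default`, as SpeechSynthesisVoice objects do. `pickVoice` is a made-up helper, and the fallback order (exact match, same base language, device default) is just one reasonable choice.

```javascript
// Hypothetical helper: choose a voice for a language tag such as 'en-US'.
// `voices` should look like the array returned by speechSynthesis.getVoices().
const pickVoice = (voices, lang) =>
  voices.find((voice) => voice.lang === lang) ||
  voices.find((voice) => voice.lang.startsWith(lang.split('-')[0])) ||
  voices.find((voice) => voice.default) ||
  null;

// Browser-only usage (getVoices() may return an empty list until voices load):
// const msg = new SpeechSynthesisUtterance('Hello');
// msg.voice = pickVoice(window.speechSynthesis.getVoices(), 'en-US');
// window.speechSynthesis.speak(msg);
```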

Live Demo

Browser Compatibility

The SpeechSynthesis API is available in all modern browsers: Firefox, Chrome, Edge, and Safari.
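For older or unusual browsers, a quick feature check before calling the API can avoid a runtime error. This is a minimal sketch; `supportsSpeech` is a hypothetical helper name, and passing the global scope as a parameter is only there to make the check easy to try outside a browser.

```javascript
// Returns true when the given global object exposes the speech synthesis API.
const supportsSpeech = (scope = globalThis) =>
  'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope;

// Browser-only usage: disable reading when the API is missing.
// if (!supportsSpeech()) {
//   console.warn('Text to speech is not supported in this browser.');
// }
```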

Got any questions or additions? Please leave a comment.

Thank you for reading 😊

More content at
Follow me on Github, Twitter, LinkedIn, Medium, and Stackblitz.
