Hello 👋🏻.
Welcome to my first post here. Over the past couple of years I have read many posts on this website, and I find it very useful to share information with others and to hear different opinions on many tech subjects.
My name is Alaa. I am a web developer and a 'Webmaster' graduate of the Faculty of Economics and Management of Nabeul, and a 2nd-year computer science engineering student specializing in web technologies at the Private School of Engineering and Technologies (Esprit).
What is OCR? OCR (optical character recognition) is a technique for extracting characters from an image: the algorithm is taught what the shape of each character looks like in terms of pixels.
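The idea of recognizing a character by its pixel shape can be illustrated with a toy sketch. This is not how tesseract.js works internally (real engines are far more sophisticated); the 3x3 templates and the function names below are made up purely for illustration.

```javascript
// Toy illustration only: hypothetical 3x3 "shapes" for three characters.
// "1" is a dark pixel, "0" is a light pixel.
const TEMPLATES = {
  I: ["010", "010", "010"],
  L: ["100", "100", "111"],
  T: ["111", "010", "010"],
};

// Count how many pixels two equally sized glyphs have in common.
function matchingPixels(a, b) {
  let same = 0;
  for (let row = 0; row < a.length; row++) {
    for (let col = 0; col < a[row].length; col++) {
      if (a[row][col] === b[row][col]) same++;
    }
  }
  return same;
}

// Return the template character whose shape is closest to the glyph.
function recognizeGlyph(glyph) {
  let best = null;
  let bestScore = -1;
  for (const [char, template] of Object.entries(TEMPLATES)) {
    const score = matchingPixels(glyph, template);
    if (score > bestScore) {
      best = char;
      bestScore = score;
    }
  }
  return best;
}

// A slightly damaged "L" (one pixel missing) is still closest to "L".
console.log(recognizeGlyph(["100", "100", "110"]));
```

Even with one wrong pixel, the closest template wins, which is the intuition behind "teaching" an algorithm what characters look like.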
We are going to use the tesseract.js OCR package to extract the words from an image, together with a language data file (containing the character shapes) that it uses for character recognition.
For tesseract.js to run properly, the .html file we are going to create must be served by a web server rather than opened directly from your local disk.
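If you do not already have a web server, any static file server will do. As one possible option, here is a minimal sketch of one in Node.js. The function names `contentTypeFor` and `createStaticServer` and the port are my own choices, not part of tesseract.js, and the sketch assumes file names without URL-encoded characters.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Content types for the kinds of files used in this tutorial.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".wasm": "application/wasm",
  ".png": "image/png",
  ".gz": "application/gzip",
};

// Pick a Content-Type header from the file extension.
function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}

// Build (but do not start) a server that serves files from rootDir.
function createStaticServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    const urlPath = req.url.split("?")[0];
    const filePath = path.join(root, urlPath === "/" ? "index.html" : urlPath);
    // Refuse requests that try to escape the root directory.
    if (!filePath.startsWith(root + path.sep)) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
      res.end(data);
    });
  });
}

// Usage (run from the project root, then open http://localhost:8080/):
// createStaticServer(".").listen(8080);
```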
1. Create an HTML file named index.html:
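<!-- Load the tesseract.js library -->
<script src="js/tesseract.min.js"></script>
<script>
  console.log("Processing");
  Tesseract.recognize(
    "OCR.png", // image to read
    "eng", // language of the text in the image
    {
      workerPath: "js/worker.min.js",
      langPath: "langs-folder/",
      corePath: "js/tesseract-core.wasm.js",
    }
  ).then(function (result) {
    // The recognized text is available in result.data.text
    console.log(result.data.text);
    // alert(result.data.text);
  }).catch(function (error) {
    // Report failures, such as a file that could not be loaded
    console.error("OCR failed:", error);
  });
</script>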
2. Create a directory named js in your root folder and put the JavaScript files in it:
Download the files: https://github.com/geekalaa/OCRJS/tree/main/js
3. Create a directory named 'langs-folder' and download the data files into it: https://github.com/geekalaa/OCRJS/tree/main/langs-folder
The global language data directory: https://github.com/tesseract-ocr/langdata
4. We are going to use this image for the test: https://github.com/geekalaa/OCRJS/blob/main/OCR.png
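For step 3, it helps to know which file names the library will look for inside 'langs-folder'. The sketch below assumes the tesseract.js behavior of requesting `<lang>.traineddata.gz` under `langPath` for each language, with several languages joined by `+` (for example `eng+fra`). The helper name `trainedDataFiles` is hypothetical, so please check the exact behavior against the tesseract.js version you downloaded.

```javascript
// List the language data files expected for a language string.
// Assumption: one "<lang>.traineddata" file per language, gzipped by default.
function trainedDataFiles(langPath, langs, gzip = true) {
  const base = langPath.replace(/\/+$/, ""); // drop trailing slashes
  return langs
    .split("+")
    .filter(Boolean) // ignore empty parts such as in "eng+"
    .map((lang) => `${base}/${lang}.traineddata${gzip ? ".gz" : ""}`);
}

console.log(trainedDataFiles("langs-folder/", "eng"));
console.log(trainedDataFiles("langs-folder/", "eng+fra"));
```

If the browser console reports a missing language file, comparing the requested URL with this list is a quick way to spot a wrong folder or file name.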
Execution:
I used the same script, with more advanced features, in my online tool. Try it: Character Count
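As a rough idea of what can be done with the recognized text, here is a small sketch that counts characters, words, and lines. It is only an illustration and not the actual code of the Character Count tool; the function name `summarizeText` is made up.

```javascript
// Compute simple statistics for a piece of recognized text.
function summarizeText(text) {
  const trimmed = text.trim();
  const words = trimmed === "" ? [] : trimmed.split(/\s+/);
  return {
    characters: trimmed.length,
    charactersNoSpaces: trimmed.replace(/\s/g, "").length,
    words: words.length,
    lines: trimmed === "" ? 0 : trimmed.split(/\r?\n/).length,
  };
}

// Usage inside the .then() callback of Tesseract.recognize:
// console.log(summarizeText(result.data.text));
console.log(summarizeText("Hello World\nfrom OCR\n"));
```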
Top comments (3)
Why do developers use this?
It does not support many languages.
Because I think it's the easiest way to extract text from an image without using much RAM or processing power.
Good point there. I just added the link to the global language data: github.com/tesseract-ocr/langdata