DEV Community

Arman Tarkhanian

2023-07-28: OCR

Again, nothing much on the game dev front. I did set up Parsec for my colleague, though, so he can access the computer whenever he wants and develop from there. I also bought some assets for the game to use.

Let's dive right into what I've been up to recently for work.

My latest venture has been exploring the world of Optical Character Recognition (OCR), specifically open-source libraries. In the short term, my boss just wants an easier way to keep his notes organized, preferably with layout analysis and even font-style detection (bold/italics). In the long term, we need it to read documents and handwriting from forms, whether legal, medical, or anything else.

I've been comparing four of them: Tesseract, OCRopus, EasyOCR, and Keras-OCR. Each of these has its own strengths and weaknesses, which I'll discuss in the order I explored them.

First up was Tesseract. This OCR engine, backed by Google, is robust and accurate and supports over 100 languages. However, it has its quirks. Its API is a bit lower-level than the other tools', which can make complex tasks a challenge. People say it struggles with text that hasn't been heavily pre-processed, but I haven't had a chance to try it myself yet. I also don't like that it requires a separate engine installation. Also, Google is a terrible company. Despite these limitations, Tesseract's wide language support and reported accuracy make it a strong contender in the OCR space.
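
For reference, here's a rough sketch of what calling Tesseract from Python could look like. It assumes the `pytesseract` wrapper and Pillow are installed and that the `tesseract` binary is on the PATH, none of which I've set up yet. The helper names `tesseract_available` and `ocr_with_tesseract` are made up for illustration, and the grayscale/autocontrast step is just one plausible kind of pre-processing, not a tuned pipeline.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if the separately installed `tesseract` binary is on PATH."""
    return shutil.which("tesseract") is not None


def ocr_with_tesseract(image_path: str, lang: str = "eng") -> str:
    """OCR one image with Tesseract via pytesseract (assumed to be installed)."""
    if not tesseract_available():
        raise RuntimeError("Tesseract engine not found; it must be installed separately.")

    # Imported lazily so the availability check above works without these packages.
    from PIL import Image, ImageOps
    import pytesseract

    image = Image.open(image_path)
    # Light pre-processing: convert to grayscale, then stretch the contrast.
    prepared = ImageOps.autocontrast(ImageOps.grayscale(image))
    return pytesseract.image_to_string(prepared, lang=lang)
```

The explicit PATH check is there because the Python package alone isn't enough; the engine itself is a separate install.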

Next, I looked into OCRopus. Developed by the German Research Center for Artificial Intelligence, OCRopus uses neural networks for recognition and is particularly good at dealing with structured documents like forms or spreadsheets. However, like Tesseract, OCRopus doesn't have explicit support for recognizing font styles, and its setup process is more complex than the other tools'. I didn't even want to touch this one, to be honest.

Then, I looked at EasyOCR. This Python library uses deep learning for OCR, making it quite accurate according to its creators and some articles I read. It's straightforward to use, comes with pre-trained models, and supports more than 80 languages. This is probably what we're going to go with since it's the easiest to set up and use out of the box.
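
To give an idea of the "out of the box" part, here's a minimal sketch assuming the `easyocr` package is installed. `easyocr.Reader` and `readtext` are the library's own calls; the helper names `keep_confident` and `ocr_with_easyocr` and the 0.5 confidence cutoff are my own placeholders, not anything we've settled on.

```python
from typing import List, Sequence, Tuple

# readtext() returns a list of (bounding_box, text, confidence) entries.
Detection = Tuple[Sequence, str, float]


def keep_confident(detections: List[Detection], min_conf: float = 0.5) -> List[str]:
    """Return the text of detections whose confidence is at least min_conf."""
    return [text for _box, text, conf in detections if conf >= min_conf]


def ocr_with_easyocr(image_path: str, languages=("en",)) -> List[str]:
    """Run EasyOCR on one image and keep only the confident text snippets."""
    import easyocr  # lazy import; the first run downloads the pre-trained models

    reader = easyocr.Reader(list(languages), gpu=False)  # CPU-only for simplicity
    return keep_confident(reader.readtext(image_path))
```

Filtering by confidence is optional, but it's a cheap way to drop obvious junk before the text goes anywhere else.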

Finally, I checked out Keras-OCR. This high-level OCR tool is built on top of Keras and TensorFlow. It provides a simple, flexible API and includes a pre-trained model. It's particularly good for tasks that involve recognizing text in natural images. My boss was actually familiar with this one from before; it might have been one of the first ones out there. However, it's probably not worth pursuing at the moment.

In addition to these, I also came across PaddleOCR through a bunch of Reddit posts (almost seemed like a marketing campaign). It's fairly new but has a sizable number of stars on GitHub, indicating its popularity. It also has layout analysis and handwriting recognition, which are definitely things we need. However, it's Chinese software created by Baidu, so there are potential security concerns.

After researching these tools, I found that while the four I compared in detail all offer solid OCR, none of them has robust support for layout analysis. This is a challenging problem in the field of OCR and might require a more complex solution, possibly involving additional image processing or machine learning.
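
As a very rough illustration of the kind of extra processing I mean, here's a naive sketch that takes word boxes from any OCR engine and groups them into lines in reading order. The `WordBox` shape `(x, y, width, height, text)` and the `group_into_lines` helper are hypothetical, and this only recovers line order; it's nowhere near real layout analysis (columns, tables, headings).

```python
from typing import List, Tuple

# Hypothetical, simplified word box: (x, y, width, height, text).
WordBox = Tuple[float, float, float, float, str]


def group_into_lines(boxes: List[WordBox], tolerance: float = 0.5) -> List[str]:
    """Group word boxes into text lines, top to bottom, left to right.

    A box joins the current line when its vertical center is within
    tolerance * (its own height) of the center of that line's first box.
    """
    lines: List[List[WordBox]] = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        center = box[1] + box[3] / 2
        if lines:
            first = lines[-1][0]
            first_center = first[1] + first[3] / 2
            if abs(center - first_center) <= tolerance * box[3]:
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, order the words by their left edge.
    return [" ".join(b[4] for b in sorted(line, key=lambda b: b[0])) for line in lines]
```

Feeding it four boxes for "Hello", "world", "Next", and "line" spread over two rows gives back `["Hello world", "Next line"]`, even if the boxes arrive out of order.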

There are also commercial options like Google's Vision API, Microsoft Azure's Computer Vision, and AWS Textract, which offer OCR services at a cost per call.

HuggingFace is a popular platform that my boss recommended I look at for actual downloadable models that we can tweak and train ourselves, but I couldn't find anything too great.

Otherwise, we also edited the Typeform a little bit to make it more aligned with the overall theme that our CEO wants, which I guess is a sort of royal purple. Worked out some of the logic and question wording too.

That's pretty much it. Cheers!
