DEV Community

codesharedot

Computer Vision and OCR with Python

Giving a computer eyes is no easy task. Yes, you can grab a webcam feed, but that doesn't mean the computer can make sense of what it's looking at.

Recent developments have pushed the field forward. With deep learning, computers can now recognize many kinds of objects in many different positions.

So if you don't know anything about deep learning or neural networks, how do you get started in the field of computer vision?

Of course, you cannot start with the most complicated concepts and work your way backwards. You have to start with the basics.

At the most basic level, you can do pattern recognition. To reduce complexity, I recommend starting out by learning Python as opposed to C++.

Character recognition (OCR) is one of the most basic tasks in computer vision.

OCR with Tesseract

We can recognize basic characters (a, b, c) from an image. This is called "Optical Character Recognition". Tesseract is a free, open-source OCR engine. On Debian or Ubuntu you can install it with:

sudo apt-get install tesseract-ocr

In the terminal you can run the commands below. Tesseract appends .txt to the output name by itself, so pass output rather than output.txt:

tesseract example.png output
cat output.txt

where example.png is this image:

font image
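
If you would rather trigger the same command from a script, a minimal sketch with Python's standard subprocess module could look like this. The function names tesseract_command and run_tesseract are made up for illustration, and the sketch assumes the tesseract binary is on your PATH and that the English language data is installed.

```python
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Return the argument list for a Tesseract run that prints to stdout."""
    # Using "stdout" as the output base makes Tesseract print the text
    # instead of writing <output base>.txt; -l selects the language.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract(image_path, lang="eng"):
    """Run the tesseract binary and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling print(run_tesseract("example.png")) should then show the same text that cat output.txt printed above.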

OCR with Python

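You can use Python to interact with Tesseract. Install the Pillow and pytesseract modules. pytesseract is only a wrapper around the tesseract command, so the engine itself must be installed as well.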

pip install Pillow
pip install pytesseract
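
Before running any OCR code, it can help to confirm that all three pieces are in place. The following is a small sketch using only the standard library; check_ocr_setup is a hypothetical helper name, not something provided by pytesseract.

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report which parts of the OCR toolchain can be found."""
    return {
        # Path to the tesseract executable, or None if it is not on PATH
        "tesseract_binary": shutil.which("tesseract"),
        # True when the Python package can be imported
        "pillow": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
    }
```

If tesseract_binary comes back as None, the Python modules will import fine but OCR calls will fail, because the engine itself is missing.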

Then you can run this code, which extracts the text from the image and prints it in the terminal:

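#!/usr/bin/env python3
from PIL import Image
import pytesseract

def ocr_core(filename):
    """Return the text Tesseract recognizes in the given image file."""
    # Open the image with Pillow and hand it to the Tesseract wrapper
    text = pytesseract.image_to_string(Image.open(filename))
    return text

# Print the text found in example.png
print(ocr_core('example.png'))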

This is a very basic example of computer vision. There's a lot more you can do using all kinds of techniques. However, I think that any introduction to a field should be as simple as possible.

Top comments (1)

zchtodd

Cool! We use tesseract where I work to read data off of some invoices, and it works pretty well.