Computer Vision and OCR with Python

#python #computerscience

Giving a computer eyes is no easy task. Yes, you can grab a webcam feed, but that doesn't mean the computer can make sense of what it's looking at.

Recent developments have pushed the field forward. With deep learning, computers can now recognize many kinds of objects in many different positions.

So if you don't know anything about deep learning or neural networks, how do you get started in the field of computer vision?

Of course, you cannot start with the most complicated concepts and work your way backwards. You have to start with the basics.

At the most basic level, you can do pattern recognition. To reduce complexity, I recommend starting out by learning Python as opposed to C++.

Character recognition (OCR) is one of the most basic computer vision tasks.

OCR with Tesseract

We can recognize basic characters (a, b, c) in an image. This is called "Optical Character Recognition". Tesseract is a free, open-source OCR engine. On Debian or Ubuntu you can install it (as root or with sudo) with:

apt-get install tesseract-ocr

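In the terminal you can then run Tesseract on an image. The second argument is the output base name; Tesseract adds the .txt extension itself:

tesseract example.png output
cat output.txt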

where example.png is an image containing the text to recognize.
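If you would rather not type the command by hand, the same call can be driven from a script. The following is only a minimal sketch using Python's standard library: the function names build_tesseract_command and run_tesseract_cli are made up for this example, and it assumes the tesseract binary is on your PATH and that "output" is an acceptable base name for the result file.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argument list for: tesseract <image> <output_base>."""
    # Tesseract appends ".txt" to the output base on its own.
    return ["tesseract", str(image_path), str(output_base)]


def run_tesseract_cli(image_path, output_base="output"):
    """Run the tesseract command and return the recognized text."""
    # Fail early with a readable message if the binary is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Read back the file that Tesseract wrote.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling print(run_tesseract_cli("example.png")) should then show the same text as the cat command above.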

OCR with Python

You can use Python to interact with Tesseract. Install the modules Pillow and pytesseract:

pip install Pillow
pip install pytesseract

Then you can run this code, which extracts the text from the image and prints it to the terminal:

#!/usr/bin/python3
from PIL import Image
import pytesseract

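def ocr_core(filename):
    """Return the text that Tesseract recognizes in the image file."""
    # Pillow loads the image; pytesseract hands it to the Tesseract engine.
    text = pytesseract.image_to_string(Image.open(filename))
    return text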

print(ocr_core('example.png'))
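In practice you may want a slightly more defensive version. The sketch below is one possible variation, not the only way to do it: the names ocr_file and tidy_text are invented for this example, the default language "eng" is an assumption, and the OCR packages are imported inside the function so that the file check works even before they are installed. The lang argument is passed straight to pytesseract.image_to_string.

```python
from pathlib import Path


def tidy_text(text):
    """Drop blank lines and surrounding whitespace from OCR output."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_file(filename, lang="eng"):
    """Run OCR on an image file and return the cleaned-up text."""
    path = Path(filename)
    # Check the path first so a typo gives a clear error message.
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")
    # Imported here so the check above does not depend on these packages.
    from PIL import Image
    import pytesseract

    # The with-block closes the image file once OCR is done.
    with Image.open(path) as image:
        return tidy_text(pytesseract.image_to_string(image, lang=lang))
```

You would call it the same way as before, for example print(ocr_file('example.png')).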

This is a very basic example of computer vision. There's a lot more you can do using all kinds of techniques. However, I think that any introduction to a field should be as simple as possible.