Today's blog entry digs into something I've been curious about: Optical Character Recognition, also known as OCR. While OCR is a specialized topic, my main focus here is a simple application: converting a screenshot of text, which is essentially an image, back into text.
To achieve this, we'll be leveraging the power of the Tesseract OCR engine. Originally proprietary software developed by Hewlett-Packard, it was later released as open source. Google sponsored its development for many years, and it is now maintained as a community open-source project.
For those using Windows, like me, you'll need to install the Tesseract OCR engine using the Tesseract Installer for Windows. The installer can be found on this page. Remember to add your installation location to the PATH environment variable.
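A quick way to confirm that the install and the PATH change took effect is to ask the engine for its version from Python. This is a minimal sketch using only the standard library. The helper name `tesseract_version` is my own, and it assumes the executable is called `tesseract`.

```python
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `<executable> --version`, or None if it cannot be run."""
    try:
        completed = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Raised when the executable is not on PATH (or cannot be launched).
        return None
    # Some builds print the version to stderr instead of stdout.
    output = completed.stdout or completed.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


version = tesseract_version()
print(version if version is not None else "tesseract was not found on PATH")
```

If this prints the not-found message right after you changed PATH, try a new terminal first, since terminals that were already open keep the old value.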
Now let's get down to the code. We start by importing two essential libraries:
import pytesseract
from PIL import Image
pytesseract is a Python wrapper for the Tesseract OCR engine. It is the magic that lets us convert images, such as screenshots or scanned documents, into text from Python.
PIL (provided here by the Pillow package) is a Python library that allows you to open, modify, and save many different image file formats. For this script, we're using the Image module to open an image file.
To handle the image we want to use, we add:
img = Image.open('./images/convert-to-text.png')
That one line of code opens the image file and identifies its format. We just pass it the path to the image. Pillow reads the actual pixel data lazily, only when something needs it.
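To see what Pillow knows about a file once it's opened, you can look at a few attributes on the image object. The sketch below builds a small PNG in memory as a stand-in for a real screenshot (that part is my assumption, so the example runs without any file on disk) and then opens it the same way.

```python
from io import BytesIO

from PIL import Image

# Build a small white PNG in memory as a stand-in for a screenshot file.
buffer = BytesIO()
Image.new("RGB", (200, 50), color="white").save(buffer, format="PNG")
buffer.seek(0)

# Image.open accepts a path or a file-like object.
with Image.open(buffer) as img:
    image_format = img.format  # file format detected from the header
    image_size = img.size      # (width, height) in pixels
    image_mode = img.mode      # pixel layout, e.g. "RGB" or "L"

print(image_format, image_size, image_mode)  # PNG (200, 50) RGB
```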
Next, we point pytesseract at the Tesseract executable (the one we installed) with the following line:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
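This line specifies the path to the 'tesseract.exe' file, which is the magic that actually does the OCR. If Tesseract is already on your PATH, pytesseract can usually find it without this line, but setting it explicitly does no harm.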
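If you'd rather not hard-code the path, one option is to look for the executable on PATH first and only fall back to the install location. Here is a rough sketch. `resolve_tesseract_cmd`, `configure_pytesseract`, and `DEFAULT_WINDOWS_PATH` are names I made up, and the fallback is simply the path used in this post.

```python
import os
import shutil

# Assumption: the install location used in this post; adjust for your machine.
DEFAULT_WINDOWS_PATH = "C:/Program Files/Tesseract-OCR/tesseract.exe"


def resolve_tesseract_cmd(fallback=DEFAULT_WINDOWS_PATH):
    """Return a usable path to the tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path is not None:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved executable and return the path used."""
    # Imported here so the lookup above has no third-party dependency.
    import pytesseract

    cmd = resolve_tesseract_cmd()
    if cmd is None:
        raise FileNotFoundError("Could not find the tesseract executable")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once before any OCR call would then replace the hard-coded assignment.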
Finally, we extract the text from the image and print the result:
result = pytesseract.image_to_string(img)
print(result)
We pass the image to the image_to_string function which then uses the Tesseract-OCR engine to extract text from the image.
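image_to_string also accepts a `lang` argument and a `config` string that is passed through to Tesseract, which is handy when the defaults misread a screenshot. Below is a small sketch of how that might look. The helper names `build_config` and `extract_text` are mine, it assumes the English language data is installed, and whether grayscale conversion or a particular page segmentation mode helps depends on your image.

```python
def build_config(psm=6):
    """Build the Tesseract config string for a page segmentation mode (0-13)."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def extract_text(image_path, lang="eng", psm=6):
    """Open an image, convert it to grayscale, and return the recognized text."""
    # Imported here so build_config can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        gray = img.convert("L")  # "L" is Pillow's 8-bit grayscale mode
        return pytesseract.image_to_string(
            gray, lang=lang, config=build_config(psm)
        )
```

With the image from this post, usage would be `print(extract_text('./images/convert-to-text.png'))`. Mode 6 tells Tesseract to assume a single uniform block of text, which tends to suit a cropped screenshot of a paragraph.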
And there you have it! This is perhaps the simplest example I can think of to demonstrate the capabilities of OCR. It merely opens an image, utilizes the pytesseract library to read the text content of the image, and then prints out the extracted text.
Here is the complete code:
import pytesseract
from PIL import Image
img = Image.open('./images/convert-to-text.png')
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
result = pytesseract.image_to_string(img)
print(result)