DEV Community

pings10

How do I get tesseract python to read these numbers?

I have cropped the input image to just the area where the price tag is, but even with that crop Tesseract still has trouble reading the numbers. This is the code I've tried. In the last photo, I'm trying to read the part that says "ISBN:" followed by numbers and letters. I won't always be able to crop the price tag perfectly, so I know I may need a way to isolate it; the tag is mostly white. Even with the cropped image, though, I'm having a hard time getting Tesseract to read the price in the square on the right.

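from PIL import Image
import pytesseract as tt
import numpy as np
import cv2

# Uncomment if the tesseract binary is not on PATH
# (the setting lives on the inner pytesseract module):
# tt.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

filename = 'tbar.jpg'

# Alternative loader via PIL:
# img1 = np.array(Image.open(filename))

# Load as grayscale (cv2.imread returns None if the file cannot be read)
img1 = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)

# Blur attempt (GaussianBlur needs an odd kernel size, so (2, 2) raises an error):
# img1 = cv2.GaussianBlur(img1, (5, 5), 0)

# --psm 10 treats the image as a single character; the whitelist restricts output to digits
text = tt.image_to_string(
    img1, lang='eng',
    config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

# Also tried with default settings:
# text = tt.image_to_string(img1)

print(text)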

Image description
Image description
Image description
