Giving a computer eyes is no easy task. Yes, you can grab a webcam feed, but that doesn't mean the computer understands what it's looking at.
Recent developments have pushed the field forward. With deep learning, computers can now recognize common objects even when they appear in many different positions.
So if you don't know anything about deep learning or neural networks, how do you get started in the field of computer vision?
Of course, you can't start with the most complicated concepts and work your way backwards. You have to start with the basics.
At the most basic level, you can do pattern recognition. To reduce complexity, I recommend starting out by learning Python as opposed to C++.
Character recognition (OCR) is one of the most basic tasks in computer vision.
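To make "pattern recognition" a little more concrete, here is a toy sketch that classifies a tiny black-and-white glyph by counting how many pixels differ from a few hand-made templates. The 3x3 glyphs, labels and function names are made up for illustration; real OCR engines such as Tesseract work very differently (recent versions use a neural network), but the core idea of matching an input against known patterns is the same.

```python
# Toy pattern recognition: classify a tiny black-and-white glyph by
# comparing it with hand-made templates (1 = ink, 0 = background).
# The templates and names here are hypothetical, for illustration only.
TEMPLATES = {
    "I": ["010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010"],
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template label whose pixels differ least from `glyph`."""
    return min(TEMPLATES, key=lambda label: hamming(glyph, TEMPLATES[label]))


# A slightly "noisy" L: the bottom-right pixel is missing.
noisy_l = ["100",
           "100",
           "110"]
print(recognize(noisy_l))  # L
```

The noisy glyph differs from the "L" template by one pixel and from the other two by five, so the closest match wins even though the input is imperfect.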
OCR with Tesseract
We can recognize basic characters (a, b, c) in an image. This is called Optical Character Recognition. Tesseract is a free, open-source OCR engine; on Debian or Ubuntu you can install it with:
sudo apt-get install tesseract-ocr
Then run it from the terminal. Tesseract takes an input image and an output base name, and appends the .txt extension itself:
tesseract example.png output
cat output.txt
where example.png is this image:
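If you want to call the command-line tool from a script without installing extra Python packages, a minimal sketch using the standard subprocess module might look like the one below. It assumes the tesseract binary is on your PATH and relies on the CLI accepting "stdout" as the output base, which prints the text instead of writing a file. The helper names are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path, outputbase="stdout"):
    """Build the argument list for the tesseract CLI.

    With outputbase="stdout" tesseract prints the recognized text instead
    of writing it to '<outputbase>.txt'.
    """
    return ["tesseract", image_path, outputbase]


def run_tesseract(image_path):
    """Run the tesseract binary on one image and return its text output."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True,  # capture stdout and stderr instead of echoing them
        text=True,            # decode the output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling run_tesseract("example.png") should return the same text that the terminal example writes to output.txt, without leaving a file behind.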
OCR with Python
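You can also drive Tesseract from Python. Install the Pillow and pytesseract modules (pytesseract is a wrapper that calls the tesseract program, so the engine installed above is still required):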
pip install Pillow
pip install pytesseract
Then run this script, which extracts the text from the image and prints it to the terminal:
#!/usr/bin/python3
from PIL import Image
import pytesseract

def ocr_core(filename):
    """Run Tesseract OCR on an image file and return the recognized text."""
    text = pytesseract.image_to_string(Image.open(filename))
    return text

print(ocr_core('example.png'))
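Once the script prints some text, it helps to check how close that text is to what the image actually says. One common measure is the character error rate (CER): the edit distance between the OCR output and a hand-typed reference, divided by the length of the reference. The sketch below is plain Python with no extra packages; the function names and sample strings are made up for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# A "0" read instead of "O" plus a missing "!" is 2 errors over 12 characters.
print(character_error_rate("Hello WORLD!", "Hello W0RLD"))
```

A CER of 0 means a perfect transcription; lower is better, which makes it a simple way to compare settings or image quality on the same test images.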
This is a very basic example of computer vision. There is much more you can do with more advanced techniques, including the deep learning methods mentioned above. However, I think any introduction to a field should be as simple as possible.