Mrinal Walia

How to Extract Email & Phone Number from a Business Card Using Python, OpenCV, and TesseractOCR

In this blog post, you will learn how to extract the email address and phone number from a business card and save the output in a JSON file.

Side Note: The DataCamp courses Web Scraping in Python, Introduction to Matplotlib in Python, and Exploratory Data Analysis in Python helped me a lot when I was starting my journey into web scraping. If you already have good experience in Python, you can take up a course on Image Processing and Computer Vision instead.

Building the email and phone number extractor with OpenCV and TesseractOCR takes five easy steps:

  • Step 1: Detect the edges of the document we want to scan.

  • Step 2: Using these edges, find the contour (outline) of the document being scanned.

  • Step 3: Apply a perspective transform to obtain a top-down view of the document.

  • Step 4: Use pytesseract to extract the text from the scanned image.

  • Step 5: Apply regular expressions to pick out only the email and phone number from the extracted text, then save the output.

Here is the link to the full article: https://datascienceplus.com/how-to-extract-email-phone-number-from-a-business-card-using-python-opencv-and-tesseractocr/

You can download the source code for this blog post here: My GitHub repository.
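
If you just want to see what Step 5 amounts to, the sketch below filters OCR text with two regular expressions and writes the result to a JSON file. The patterns, the function names (`extract_contacts`, `save_contacts`), the output keys, and the default file name `contacts.json` are assumptions for illustration, not necessarily what the repository uses. The phone pattern in particular is a loose guess at common formats and will need adjusting for your own cards.

```python
import json
import re

# Simplified email pattern: good enough for typical business cards, not a full validator
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed phone shape: optional "+", then at least nine characters that start and
# end with a digit and contain only digits, spaces, dots, dashes, or parentheses
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contacts(text):
    """Return every email address and phone-like string found in the OCR text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def save_contacts(contacts, path="contacts.json"):
    """Write the extracted contacts to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(contacts, fh, indent=2)


if __name__ == "__main__":
    # Made-up text standing in for the pytesseract output
    sample = (
        "Jane Doe\nSales Manager\njane.doe@example.com\n"
        "Tel: +1 (555) 010-2030\nAcme Corp, 12 Main St"
    )
    contacts = extract_contacts(sample)
    save_contacts(contacts)
    print(contacts)
```

Keeping the extraction separate from the file writing makes it easy to try the patterns on a few strings before wiring them to real OCR output, which tends to contain stray characters.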

Follow me on LinkedIn: Mrinal Walia
Follow me on GitHub: Mrinal Walia
