DEV Community


Demystifying Azure Form Recognizer

Introduction

Every industry has transformed significantly over the past decade, and each is still looking for more efficient and optimized ways of working. We live in a generation where data plays a major role in many domains, and everyone is aware of its importance.
Organizations receive or collect data in various ways, but they need to keep all of that information in one place so they can access it quickly and efficiently. Most sectors need to extract data from the sources they receive, and doing this by hand is a repetitive task that takes effort every time.
Therefore, organizations need a solution that reduces human error in extraction and makes extracting data from input sources more efficient.


Quick Fix:

To automate data extraction from structured and semi-structured documents and solve the problems above, we can use Azure Form Recognizer.

Form Recognizer:

Form Recognizer is an Azure Cognitive Service that uses machine learning to identify and extract the required data from documents.


Form Recognizer uses deep learning models, and it also lets us train custom models on our own sample documents to fetch the details we require.

Get Started:

Prerequisites:

To get started with Form Recognizer, we will need the following:
  • Python 3.7 or later.
  • An Azure subscription and a Cognitive Services or Form Recognizer resource.
  • The Azure Form Recognizer client library for Python.

Create a Form Recognizer Resource:

  • Log in to the Azure portal, search for Form Recognizer, and create a new resource.


  • Select the subscription and resource group (or create a new resource group).
  • Select the region closest to you.
  • Provide a name for the Form Recognizer resource.
  • Select a pricing tier.
  • Review and create the resource.


Get Keys and Endpoints:

  • Open the Form Recognizer resource you created.
  • Find the keys and endpoint on the right side of the page.


Prepare the input files:

  • Collect the sample documents you want to analyze; this demo uses a PDF file, input_files/test_file.pdf.


Python SDK:

  • Create a virtual environment and install the Azure Form Recognizer client library:
pip install azure-ai-formrecognizer
  • Save the key and endpoint in a config file so they can be read from Python.
  • Import the required modules, load the config, and create the client.
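import json
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Load the key and endpoint saved in config.json
with open("config.json") as config_file:
    config = json.load(config_file)
api_key = config["APIKEY"]
endpoint = config["APIENDPOINT"]

# Create an authenticated client for the Form Recognizer resource
credential = AzureKeyCredential(api_key)
document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=credential)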
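If you would rather not hard-code the config path, the credential loading above can be wrapped in a small helper. This is a sketch under assumptions, not part of the SDK: `load_form_recognizer_config` is a hypothetical helper, and the environment variable names `FORM_RECOGNIZER_KEY` and `FORM_RECOGNIZER_ENDPOINT` are made-up fallbacks. Only the `APIKEY` and `APIENDPOINT` keys come from the config file used in this post.

```python
import json
import os
from pathlib import Path


def load_form_recognizer_config(path="config.json"):
    """Return (api_key, endpoint) for the Form Recognizer resource.

    Hypothetical helper: reads the APIKEY/APIENDPOINT keys from a JSON file,
    and falls back to the assumed environment variables FORM_RECOGNIZER_KEY
    and FORM_RECOGNIZER_ENDPOINT when the file does not exist.
    """
    config_path = Path(path)
    if config_path.exists():
        with config_path.open() as config_file:
            config = json.load(config_file)
        api_key = config.get("APIKEY")
        endpoint = config.get("APIENDPOINT")
    else:
        api_key = os.environ.get("FORM_RECOGNIZER_KEY")
        endpoint = os.environ.get("FORM_RECOGNIZER_ENDPOINT")
    # Fail early with a clear message instead of an authentication error later
    if not api_key or not endpoint:
        raise ValueError("Form Recognizer key and endpoint must both be set")
    return api_key, endpoint
```

The returned values can then be passed to `AzureKeyCredential` and `DocumentAnalysisClient` exactly as in the snippet above, for example `api_key, endpoint = load_form_recognizer_config()`.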
  • Once the input file is ready, read it in binary mode and pass its contents to the API through the client, along with the model to use (here, prebuilt-layout).
with open("input_files/test_file.pdf", "rb") as fd:
    document = fd.read()

poller = document_analysis_client.begin_analyze_document("prebuilt-layout", document)
result = poller.result()
  • The result is returned as an AnalyzeResult object from the azure.ai.formrecognizer package.
  • We can iterate over the result to get the data we need, for example the tables:
for table_idx, table in enumerate(result.tables):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )
    for region in table.bounding_regions:
        print(
            "Table # {} location on page: {} is {}".format(
                table_idx,
                region.page_number,
                region.polygon
            )
        )
    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.content,
            )
        )
  • This prints the content of every cell in each table.
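Printing cells is handy for exploring, but usually you want each table as rows you can save or load elsewhere. A minimal sketch, assuming a `table` object shaped like the ones iterated above (`row_count`, `column_count`, and `cells` with `row_index`, `column_index`, and `content`). `table_to_rows` and `table_to_csv` are hypothetical helpers, and merged cells (row or column spans) are simply placed in their top-left position.

```python
import csv
import io


def table_to_rows(table):
    """Build a row-major grid of cell text from a Form Recognizer table.

    Cells that span several rows/columns only fill their top-left slot;
    the remaining slots stay as empty strings.
    """
    rows = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        rows[cell.row_index][cell.column_index] = cell.content
    return rows


def table_to_csv(table):
    """Serialize a table to CSV text with the standard csv module."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(table_to_rows(table))
    return buffer.getvalue()
```

For example, `for table in result.tables: print(table_to_csv(table))` prints each extracted table as CSV.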


Other Functions:

Besides tables, the result exposes page-level data such as page dimensions, lines, and words:

for page in result.pages:
    # Page dimensions; page.unit is the unit of width/height (e.g. "inch" or "pixel")
    print(
        "Page {} is {} x {} {}".format(
            page.page_number, page.width, page.height, page.unit
        )
    )
    # Every text line detected on the page
    for line in page.lines:
        print("...Line: '{}'".format(line.content))

    # Every word with its position (polygon) and confidence score
    for word in page.words:
        print(
            "...Word '{}' at {} has confidence {}".format(
                word.content, word.polygon, word.confidence
            )
        )
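Because each word carries a confidence score, one practical use is flagging words that may need human review. A sketch, assuming `result` is the `AnalyzeResult` returned by the analysis call earlier; `low_confidence_words` and its 0.8 default threshold are hypothetical choices, not SDK features.

```python
def low_confidence_words(result, threshold=0.8):
    """Return (page_number, word_text, confidence) for words below threshold."""
    flagged = []
    for page in result.pages:
        # Treat a missing word list as empty (e.g. a page with no text)
        for word in page.words or []:
            if word.confidence < threshold:
                flagged.append((page.page_number, word.content, word.confidence))
    return flagged
```

A suitable threshold depends on your documents, so it is worth checking a few results by hand before relying on a fixed value.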

Key Value Pair Extraction:

  • Similar to table extraction, Form Recognizer can extract key-value pair data from a document.


Image from Microsoft Azure Cognitive Services Demos
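In the Python SDK, key-value pairs come from the general document model (`prebuilt-document`) in `result.key_value_pairs`; each pair has a `key` and an optional `value`, both exposing a `content` attribute. The helper below is a hypothetical sketch that collects them into a dictionary, skipping keys whose value was not detected; if a key appears more than once, the last value seen wins.

```python
def key_value_pairs_to_dict(result):
    """Map each detected key's text to its value's text.

    Pairs with no detected value are skipped; duplicate keys keep the
    last occurrence.
    """
    pairs = {}
    for pair in result.key_value_pairs or []:
        if pair.key is None or pair.value is None:
            continue
        pairs[pair.key.content] = pair.value.content
    return pairs


# Usage with the client and document bytes from earlier
# (needs network access and a Form Recognizer resource):
# poller = document_analysis_client.begin_analyze_document("prebuilt-document", document)
# fields = key_value_pairs_to_dict(poller.result())
```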

Available Models:

How to select a model:

| Document type | Data to extract | Best model |
| --- | --- | --- |
| A generic document like a contract or letter | You want to extract primarily text lines, words, locations, and detected languages. | Read OCR model |
| A document that includes structural information, like a report or study | In addition to text, you need to extract structural information like tables, selection marks, paragraphs, titles, headings, and subheadings. | Layout analysis model |
| A structured or semi-structured document that includes content formatted as fields and values, like a credit application or survey form | You want to extract fields and values, including ones not covered by the scenario-specific prebuilt models, without having to train a custom model. | General document model |
| U.S. W-2 form | You want to extract key information such as salary, wages, and taxes withheld from U.S. W-2 tax forms. | W-2 model |
| Invoice | You want to extract key information such as customer name, billing address, and amount due from invoices. | Invoice model |
| Receipt | You want to extract key information such as merchant name, transaction date, and transaction total from a sales or single-page hotel receipt. | Receipt model |
| Identity document (ID), like a passport or driver's license | You want to extract key information such as first name, last name, and date of birth from U.S. driver's licenses or international passports. | Identity document (ID) model |
| Business card | You want to extract key information such as first name, last name, company name, email address, and phone number from business cards. | Business card model |
| Mixed-type document(s) | You want to extract key-value pairs, selection marks, tables, signature fields, and selected regions not extracted by prebuilt or general document models. | Custom model |
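In code, the choice from the table comes down to the model ID passed to `begin_analyze_document` (this post uses `prebuilt-layout`). The sketch below maps the table's document types to prebuilt model IDs as I understand the v3 API names them; please verify the IDs against the current documentation before relying on them. `choose_model_id` and the short dictionary keys are hypothetical, and a custom model is addressed by whatever ID you gave it when training.

```python
# Hypothetical mapping from the "Document type" column to prebuilt model IDs.
PREBUILT_MODEL_IDS = {
    "generic": "prebuilt-read",          # text lines, words, languages
    "structured": "prebuilt-layout",     # tables, selection marks, paragraphs
    "form": "prebuilt-document",         # general fields and values
    "w2": "prebuilt-tax.us.w2",
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
}


def choose_model_id(document_type, custom_model_id=None):
    """Return the model ID to analyze the given document type with.

    Document types without a prebuilt model (e.g. mixed-type documents)
    need the ID of a custom model you trained yourself.
    """
    if document_type in PREBUILT_MODEL_IDS:
        return PREBUILT_MODEL_IDS[document_type]
    if custom_model_id is None:
        raise ValueError(f"No prebuilt model for {document_type!r}; train a custom model")
    return custom_model_id
```

For example, `document_analysis_client.begin_analyze_document(choose_model_id("invoice"), document)` would analyze an invoice with the prebuilt invoice model.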

File/Document formats:

Form Recognizer accepts PDF, image (such as JPEG and PNG), and TIFF files.

Limitations:

  • Form Recognizer doesn’t have a prebuilt model for generic form extraction from non-English documents; to get form data from a document that is not in English, we need to train a custom model.
  • The total size of the training data set must be less than 500 pages.
  • A single input file (PDF or TIFF) can be up to 500 MB and 2,000 pages.
  • Page dimensions can be up to 10,000 x 10,000 pixels for images and 17 x 17 inches for PDFs.
  • Table extraction may fail if the table contains only one column.
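The file limits above can be checked before uploading, which avoids a failed request. A minimal sketch: `check_input_file` is hypothetical, the extension list is an assumption covering PDF, TIFF, and common image formats, and the page count must be supplied by the caller (for example from a PDF library), since counting pages is outside this example. Limits can also differ by pricing tier, so treat the numbers as the ones stated in this post.

```python
import os

MAX_FILE_BYTES = 500 * 1024 * 1024  # 500 MB
MAX_PAGES = 2000
# Assumed set of accepted extensions (PDF, TIFF, and common image types).
SUPPORTED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}


def check_input_file(path, size_bytes, page_count):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"file has {page_count} pages; limit is {MAX_PAGES}")
    return problems
```

A typical call would be `check_input_file(path, os.path.getsize(path), page_count)` right before `begin_analyze_document`.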

Programming Languages:

Form Recognizer provides SDKs and client libraries for the following programming languages:

  • Python
  • Java
  • C#
  • JavaScript

Conclusion:

Bringing AI/ML into the data extraction process helps organizations streamline it, reduce errors, and work more efficiently.
For larger organizations, faster and more accurate extraction lets teams focus on the next stages of their pipelines.
Besides Azure Form Recognizer, other highly effective tools are available on the market, such as Instabase and AWS Textract.


Disclaimer:

This is a personal blog. The views and opinions expressed here are only those of the author and do not represent those of any organization or any individual with whom the author may be associated, professionally or personally.
