Ishwar398

Posted on Nov 6, 2023

Using Azure Doc intelligence for OCR

#azure #ocr #ai

Whenever we need to read text from a PDF file, an image, a Word document, etc., we use Optical Character Recognition (OCR). With OCR, we can read handwritten or typed documents in any of the supported formats.
Azure Document Intelligence is an Azure AI service that uses machine-learning models to perform OCR. It supports many use cases beyond OCR, but this post sticks to OCR.

Creating the Azure Document Intelligence service

Search for Azure Document Intelligence in the Azure portal, then click Create.
Fill in the details: Subscription, Resource Group, Region, Name, and Pricing Tier.

Performing OCR

We can perform OCR on a document in two ways:

Using the file URL (40 MB size limit on the F0 tier)
Using the actual file (4 MB size limit on the F0 tier)
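To make the two limits concrete, here is a minimal Python sketch that checks a document's size against them before choosing an approach. The limits are the F0 figures quoted above; treating 1 MB as 1024 * 1024 bytes is an assumption, and `fits_f0_limit` is a hypothetical helper name.

```python
# F0 (free tier) size limits as quoted in this post.
# Assumption: 1 MB is taken to mean 1024 * 1024 bytes.
MB = 1024 * 1024
F0_LIMITS = {
    "url": 40 * MB,   # document referenced by its URL
    "file": 4 * MB,   # document sent in the request body
}


def fits_f0_limit(size_bytes: int, mode: str) -> bool:
    """Return True if a document of size_bytes is within the F0 limit for mode."""
    if mode not in F0_LIMITS:
        raise ValueError(f"mode must be 'url' or 'file', got {mode!r}")
    return size_bytes <= F0_LIMITS[mode]


ten_mb = 10 * MB
print(fits_f0_limit(ten_mb, "url"))   # True
print(fits_f0_limit(ten_mb, "file"))  # False
```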

Using the File URL

When the file is already hosted or kept in storage, we can pass its URL and let the service fetch it. The only requirement is that the URL must be publicly accessible; if it isn't, Document Intelligence cannot read the file.
In this case, the POST request body contains the URL of the file:

{
   "urlSource": "URL_OF_THE_FILE"
}

Along with this, the Content-Type header should be:

Content-Type: application/json
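Hand-typed JSON is easy to get wrong (JSON requires double quotes), so in code it is safer to let a serializer produce the body. A minimal Python sketch using only the standard library; `url_source_body` is a hypothetical helper and the example URL is a placeholder.

```python
import json

# Content-Type header for the URL-based request.
URL_SOURCE_HEADERS = {"Content-Type": "application/json"}


def url_source_body(file_url: str) -> bytes:
    """Serialize the {"urlSource": ...} body; json.dumps always emits double quotes."""
    return json.dumps({"urlSource": file_url}).encode("utf-8")


body = url_source_body("https://example.com/sample.pdf")  # placeholder URL
print(body.decode("utf-8"))  # {"urlSource": "https://example.com/sample.pdf"}
```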

Using the actual file

Alternatively, we can send the file itself to the Document Intelligence service.
Here, the POST request body contains the file's raw bytes, and the Content-Type header is:

Content-Type: application/octet-stream
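The two ways differ only in the Content-Type header and the body. The Python sketch below builds both for a given source. The rule that anything starting with `http://` or `https://` is a URL and everything else is a local path is an assumption of this hypothetical `build_payload` helper, and the demo file is a throwaway placeholder rather than a real PDF.

```python
import json
import tempfile
from pathlib import Path


def build_payload(source: str) -> tuple:
    """Return (headers, body) for either way of submitting a document."""
    if source.startswith(("http://", "https://")):
        # Way 1: the service downloads the file from a public URL.
        headers = {"Content-Type": "application/json"}
        body = json.dumps({"urlSource": source}).encode("utf-8")
    else:
        # Way 2: the file's raw bytes are the whole request body.
        headers = {"Content-Type": "application/octet-stream"}
        body = Path(source).read_bytes()
    return headers, body


# Demo with a throwaway local file standing in for a real PDF.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4 placeholder")
    file_headers, file_bytes = build_payload(str(sample))

url_headers, url_bytes = build_payload("https://example.com/sample.pdf")
```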

Document Intelligence in action for doing OCR

Once everything is set up, we can call the Document Intelligence service through its REST API. There are other options as well, such as the client SDKs, but this post focuses on the REST API.
Getting the OCR results from Document Intelligence is a two-step process.
The first step is to submit the file using either method, i.e. by sending the file directly or by providing the file URL.
This returns a result ID.

The second step is to use the Result ID to get the results.
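The two-step flow can be sketched independently of any HTTP library by passing the two steps in as functions. This Python sketch assumes the result JSON carries a `status` field that moves from `notStarted`/`running` to `succeeded` or `failed`; `run_two_step`, the polling interval, and the attempt limit are illustrative choices, not part of the service.

```python
import time
from typing import Callable


def run_two_step(
    submit: Callable[[], str],
    fetch: Callable[[str], dict],
    poll_seconds: float = 1.0,
    max_attempts: int = 30,
) -> dict:
    """Step 1: submit the document and get a result ID.
    Step 2: fetch the result with that ID until the analysis has finished."""
    result_id = submit()
    for _ in range(max_attempts):
        result = fetch(result_id)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(poll_seconds)  # still notStarted/running: wait and retry
    raise TimeoutError(f"analysis {result_id} did not finish after {max_attempts} attempts")


# Demo with fake callables standing in for the real HTTP calls.
responses = iter([
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"content": "Sample text"}},
])
outcome = run_two_step(lambda: "fake-result-id", lambda rid: next(responses), poll_seconds=0)
```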

First, let's try the file URL approach.
I'll be using Postman to call the API endpoints.

Step 1:
We need to send the document for analysis. The API endpoint is:

{endpoint}/formrecognizer/documentModels/{modelID}:analyze?api-version=2023-07-31

Endpoint: the endpoint shown in the Azure portal for your Document Intelligence resource
modelID: prebuilt-document (the model used for OCR in this post)
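Because the endpoint copied from the portal typically ends with a slash, joining it naively can produce a double slash. A small Python sketch that builds the Step 1 URL; `analyze_url` and the resource name `my-resource` are hypothetical.

```python
API_VERSION = "2023-07-31"


def analyze_url(endpoint: str, model_id: str = "prebuilt-document") -> str:
    """Build the analyze URL from the endpoint shown in the Azure portal."""
    # Strip any trailing slash so the path does not start with "//".
    base = endpoint.rstrip("/")
    return f"{base}/formrecognizer/documentModels/{model_id}:analyze?api-version={API_VERSION}"


print(analyze_url("https://my-resource.cognitiveservices.azure.com/"))
```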

Step 2: Setting up the headers
Apart from the default headers Postman sends, we need to add two headers:

Ocp-Apim-Subscription-Key: the key from the Keys and Endpoint page of the resource in the Azure portal
Content-Type: application/json
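Outside Postman, the same two headers can be assembled in code. This Python sketch reads the key from an environment variable so it never ends up in source control; the variable name `DOCUMENT_INTELLIGENCE_KEY` and the helper `request_headers` are assumptions.

```python
import os
from typing import Optional


def request_headers(content_type: str, key: Optional[str] = None) -> dict:
    """Return the two extra headers for the analyze request."""
    if key is None:
        # Hypothetical variable name; set it to the key from the Azure portal.
        key = os.environ["DOCUMENT_INTELLIGENCE_KEY"]
    return {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": content_type,
    }


# Explicit key for illustration only; in practice rely on the environment variable.
headers = request_headers("application/json", key="<your-key>")
```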

Body:
For this example, I'm using a dummy PDF file hosted by W3C:

{ 
   "urlSource": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
}

Send this as a POST request. If everything is correct, you'll get a 202 Accepted response. In the response headers, you'll find an apim-request-id.
Copy this request ID; it is used as the result ID in the next step.
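Step 1 can also be sent from code. The Python sketch below uses only `urllib` from the standard library. Besides `apim-request-id`, the 202 response also carries an `Operation-Location` header whose value is the full URL of the result; the sketch takes the result ID from the last path segment of that URL when present and otherwise falls back to `apim-request-id`, as this post does. `submit`, `result_id_from_headers`, and the example values are hypothetical.

```python
import urllib.request
from urllib.parse import urlsplit


def result_id_from_headers(headers) -> str:
    """Extract the result ID from the 202 Accepted response headers."""
    operation_location = headers.get("Operation-Location")
    if operation_location:
        # .../analyzeResults/{resultId}?api-version=... -> take the last path segment.
        return urlsplit(operation_location).path.rsplit("/", 1)[-1]
    # Fall back to the header this post uses.
    return headers["apim-request-id"]


def submit(url: str, headers: dict, body: bytes) -> str:
    """POST the document (Step 1) and return the result ID."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        if resp.status != 202:
            raise RuntimeError(f"expected 202 Accepted, got {resp.status}")
        return result_id_from_headers(resp.headers)


example_headers = {
    "Operation-Location": (
        "https://my-resource.cognitiveservices.azure.com/formrecognizer/documentModels/"
        "prebuilt-document/analyzeResults/abc-123?api-version=2023-07-31"
    )
}
print(result_id_from_headers(example_headers))  # abc-123
```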

Step 3: Getting the results
To get the OCR results, we need to make a GET request:

{endpoint}/formrecognizer/documentModels/{modelID}/analyzeResults/{resultId}?api-version=2023-07-31

Endpoint: your Azure Document Intelligence service endpoint
modelID: prebuilt-document
resultId: apim-request-id from the first step
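The results URL follows the same pattern as the analyze URL. A Python sketch with the same trailing-slash handling; `results_url` and `my-resource` are hypothetical.

```python
API_VERSION = "2023-07-31"


def results_url(endpoint: str, result_id: str, model_id: str = "prebuilt-document") -> str:
    """Build the GET URL for the analysis result."""
    base = endpoint.rstrip("/")  # avoid "//" when the endpoint ends with "/"
    return (
        f"{base}/formrecognizer/documentModels/{model_id}"
        f"/analyzeResults/{result_id}?api-version={API_VERSION}"
    )


print(results_url("https://my-resource.cognitiveservices.azure.com/", "abc-123"))
```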

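Make this GET request, including the same Ocp-Apim-Subscription-Key header. If everything is correct, you'll get a 200 OK response containing the content of the PDF file. The status field in the response shows whether the analysis has finished; if it is still notStarted or running, wait a moment and send the request again.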
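In code, the GET request and the text extraction might look like the following Python sketch. It assumes the successful response nests the recognized text under `analyzeResult.content`, with per-line text under `analyzeResult.pages[].lines[].content`; the `sample` dictionary is a hand-made, heavily trimmed stand-in, not real service output, and the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_result(url: str, key: str) -> dict:
    """GET the analysis result; the subscription key is required here too."""
    req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_text(result: dict) -> str:
    """Return the full recognized text once the analysis has succeeded."""
    status = result.get("status")
    if status != "succeeded":
        raise RuntimeError(f"analysis not finished yet (status: {status})")
    return result["analyzeResult"]["content"]


def extract_lines(result: dict) -> list:
    """Return the recognized text line by line, across all pages."""
    return [
        line["content"]
        for page in result["analyzeResult"].get("pages", [])
        for line in page.get("lines", [])
    ]


# Hand-made, trimmed stand-in for a response body (not real output).
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "content": "Hello\nWorld",
        "pages": [{"pageNumber": 1, "lines": [{"content": "Hello"}, {"content": "World"}]}],
    },
}
print(extract_lines(sample))  # ['Hello', 'World']
```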

Using the actual file

Now, if we need to send the file itself instead of the file URL, only two things change in the first request. First, the Content-Type header becomes:

Content-Type: application/octet-stream

Second, instead of the file URL, the body contains the actual file (in Postman, select the binary body type and choose the file).

Send the request. If everything is correct, you'll get the apim-request-id, which can then be used in the same way.
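For the file-based variant, only the body and Content-Type change. This Python sketch builds the first request for a local file with `urllib`; passing it to `urllib.request.urlopen` would send it and return the same 202 response as before. `build_file_request`, the resource name, the key, and the demo file are placeholders.

```python
import tempfile
import urllib.request
from pathlib import Path


def build_file_request(analyze_url: str, key: str, path: str) -> urllib.request.Request:
    """Build the POST that sends the file's raw bytes as the body."""
    return urllib.request.Request(
        analyze_url,
        data=Path(path).read_bytes(),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


ANALYZE_URL = (
    "https://my-resource.cognitiveservices.azure.com/formrecognizer/"
    "documentModels/prebuilt-document:analyze?api-version=2023-07-31"
)

# Demo with a throwaway file; the request is built but not sent.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4 placeholder")
    req = build_file_request(ANALYZE_URL, "<your-key>", str(sample))
```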
