
Mudathir Lawal

A step-by-step guide to processing PDF files using Amazon Comprehend for IDP

Intro

One of the new features announced at AWS re:Invent 2022 is the intelligent document processing (IDP) capability of Amazon Comprehend, which allows it to process semi-structured files such as PDFs. This article provides a step-by-step demonstration of the process. The use case we adopt is legal contract automation, which involves extracting key phrases from a PDF document and using them as a guide to prepare for a favourable negotiation.

Demo

  1. Log on to the AWS console and create an S3 bucket to hold the documents you want to process. We recommend creating two separate folders within the bucket: one to store the input documents awaiting processing, and the other to store the output of the API call to Amazon Comprehend. Note the region in which the bucket is located.
  2. Open the Amazon Comprehend console at https://console.aws.amazon.com/comprehend/ and select the region where you created your S3 bucket. This is important because an analysis job can only read from and write to a bucket in the same region as the job.
  3. After clicking "Launch Amazon Comprehend," choose "Analysis jobs" in the left pane, then select "Create job."

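For readers who prefer scripting the bucket setup, here is a minimal sketch of the same layout using boto3. The bucket name, region, folder names, and the helper names (`BucketLayout`, `create_bucket_with_folders`, `upload_document`) are placeholders chosen for illustration, not values from the walkthrough, and the calls assume your AWS credentials are already configured.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketLayout:
    """Hypothetical names for the bucket and its two folders."""
    bucket: str
    region: str
    input_prefix: str = "input/"
    output_prefix: str = "output/"

    @property
    def input_uri(self) -> str:
        # S3 URI to paste into the job's input field.
        return f"s3://{self.bucket}/{self.input_prefix}"

    @property
    def output_uri(self) -> str:
        # S3 URI to paste into the job's output field.
        return f"s3://{self.bucket}/{self.output_prefix}"


def create_bucket_with_folders(layout: BucketLayout) -> None:
    """Create the bucket plus two empty folder placeholders."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=layout.region)
    if layout.region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint.
        s3.create_bucket(Bucket=layout.bucket)
    else:
        s3.create_bucket(
            Bucket=layout.bucket,
            CreateBucketConfiguration={"LocationConstraint": layout.region},
        )
    for prefix in (layout.input_prefix, layout.output_prefix):
        # A zero-byte object whose key ends in "/" appears as a folder in the console.
        s3.put_object(Bucket=layout.bucket, Key=prefix)


def upload_document(layout: BucketLayout, local_path: str) -> str:
    """Upload one local file into the input folder and return its key."""
    import boto3

    key = layout.input_prefix + os.path.basename(local_path)
    boto3.client("s3", region_name=layout.region).upload_file(
        local_path, layout.bucket, key
    )
    return key
```

Calling `create_bucket_with_folders(BucketLayout("my-idp-demo-bucket", "eu-west-1"))` and then `upload_document(...)` would mirror steps 1 and 2; the `input_uri` and `output_uri` properties give the paths the analysis job expects.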

  4. Under "Analysis types" click "Key phrases."

  5. Enter the paths to the input and output folders you created in your S3 bucket in the appropriate fields.

  6. Under "Access permissions," choose "Create an IAM role," then add a suitable name suffix.

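The role created in this step is what lets Amazon Comprehend read your input folder and write to your output folder. As a rough illustration of what such a role involves, the sketch below builds a trust policy and an S3 access policy in Python. The role name, policy name, bucket, and folder names are assumptions made for the example, and the policy the console actually generates may differ in its details.

```python
import json


def comprehend_trust_policy() -> dict:
    """Trust policy that lets the Amazon Comprehend service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "comprehend.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket: str, input_prefix: str, output_prefix: str) -> dict:
    """Read access to the input folder, write access to the output folder."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn},
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{input_prefix}*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/{output_prefix}*",
            },
        ],
    }


def create_data_access_role(
    role_name: str,
    bucket: str,
    input_prefix: str = "input/",
    output_prefix: str = "output/",
) -> str:
    """Create the role with an inline S3 policy and return its ARN."""
    import boto3  # imported lazily so the policy builders work without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(comprehend_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="comprehend-s3-access",  # hypothetical policy name
        PolicyDocument=json.dumps(s3_access_policy(bucket, input_prefix, output_prefix)),
    )
    return role["Role"]["Arn"]
```

Scoping `GetObject` to the input folder and `PutObject` to the output folder keeps the role narrower than bucket-wide access, which is one reason to keep the two folders separate.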

  7. Click "Create job."
  8. When the job finishes, its status on the "Analysis jobs" page changes to "Completed."

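The console steps for creating the job and waiting for it to finish can also be approximated with the boto3 Comprehend client. The following is a sketch under stated assumptions: the job name, S3 paths, role ARN, region, and the choice of `ONE_DOC_PER_FILE` as the input format are illustrative, and the helper names `build_job_request` and `run_key_phrases_job` are hypothetical.

```python
import time

# Job states after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def build_job_request(
    job_name: str,
    input_uri: str,
    output_uri: str,
    role_arn: str,
    language_code: str = "en",
) -> dict:
    """Assemble keyword arguments for start_key_phrases_detection_job."""
    return {
        "JobName": job_name,
        "LanguageCode": language_code,
        "DataAccessRoleArn": role_arn,
        # Assumption: each file in the input folder is one document.
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_uri},
    }


def run_key_phrases_job(request: dict, region: str, poll_seconds: int = 30) -> dict:
    """Start the job, poll until it reaches a terminal state, return its properties."""
    import boto3  # imported lazily so build_job_request works without boto3

    comprehend = boto3.client("comprehend", region_name=region)
    job_id = comprehend.start_key_phrases_detection_job(**request)["JobId"]
    while True:
        response = comprehend.describe_key_phrases_detection_job(JobId=job_id)
        properties = response["KeyPhrasesDetectionJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATES:
            return properties
        time.sleep(poll_seconds)
```

If the returned `JobStatus` is `COMPLETED`, the `OutputDataConfig` entry in the returned properties holds the S3 location of the results. The `region` passed here should be the same one that hosts the bucket.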

  9. To download the output file, navigate to the output folder in your S3 bucket. The results are delivered as a compressed archive (typically named output.tar.gz), so you will need to extract it and resave the extracted file in JSON format.

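Extracting and reading the downloaded output can be done with the Python standard library alone. The sketch below assumes the archive is a gzip-compressed tar file whose contents hold one JSON object per line, each with a `KeyPhrases` list of entries carrying `Text` and `Score` fields. The function names and the 0.9 score threshold are illustrative choices, not part of the service.

```python
import json
import tarfile
from typing import Iterable, Iterator, List


def read_key_phrases(archive_path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if line:
                    yield json.loads(line)


def top_phrases(records: Iterable[dict], min_score: float = 0.9) -> List[str]:
    """Collect phrase texts whose confidence score meets the threshold."""
    phrases = []
    for record in records:
        for phrase in record.get("KeyPhrases", []):
            if phrase["Score"] >= min_score:
                phrases.append(phrase["Text"])
    return phrases
```

For the contract use case, something like `top_phrases(read_key_phrases("output.tar.gz"))` would give a shortlist of high-confidence phrases to review before a negotiation; lowering `min_score` widens the list.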

Winding up

I hope this has been a useful piece. Watch out for more interesting content on AWS and DevOps coming your way soon. Happy clouding!
