Add Amazon Comprehend to Spring Boot Project

Balvinder Singh ・ 2 min read

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. No machine learning experience is required. A common case is scanning a document and extracting data from it. Comprehend works on text, so scanned images and documents first need an OCR step (for example Amazon Textract). Comprehend then automatically extracts entities such as names, organizations and dates from that text.

Use Cases

  1. Bills

  2. Medical Receipts

  3. Forms

  4. Images with written text

  5. Feedback forms

  6. Tables in the document

and more, by combining OCR-based scanning with NLP processing.

Steps for Integration

  1. AWS Comprehend SDK: Add the dependency below to pom.xml to pull in the AWS Comprehend classes.
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-comprehend -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-comprehend</artifactId>
    <version>1.11.759</version>
</dependency>

  2. Java Service
    Create a Java service and name it as you like, for example AwsComprehendService.java, and add the methods below to it (or add them to one of your existing services).

  3. Initialize Comprehend Client

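import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;
// awsAccessKey, awsSecretKey and awsRegion are fields of the service (for example injected with @Value)
AmazonComprehend comprehendClient() {
    log.debug("Initialize Comprehend Client");
    BasicAWSCredentials awsCreds = new BasicAWSCredentials(awsAccessKey, awsSecretKey);
    AWSStaticCredentialsProvider awsStaticCredentialsProvider = new AWSStaticCredentialsProvider(awsCreds);
    return AmazonComprehendClientBuilder.standard()
            .withCredentials(awsStaticCredentialsProvider)
            .withRegion(awsRegion)
            .build();
}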
  4. Detect entities method: a method for getting entities from text.
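import com.amazonaws.services.comprehend.model.DetectEntitiesRequest;
import com.amazonaws.services.comprehend.model.DetectEntitiesResult;
import com.amazonaws.services.comprehend.model.Entity;
import java.util.List;

public List<Entity> detectEntitiesWithComprehend(String text) {
    log.debug("Method to Detect Entities With Amazon Comprehend {}", text);
    DetectEntitiesRequest detectEntitiesRequest = new DetectEntitiesRequest().withText(text).withLanguageCode("en");
    DetectEntitiesResult detectEntitiesResult = comprehendClient().detectEntities(detectEntitiesRequest);
    // Each Entity holds the matched text, its type, a confidence score and its offsets
    return detectEntitiesResult.getEntities();
}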

Note: this synchronous call accepts at most 5000 bytes of UTF-8 encoded text (bytes, not characters). If your text can be longer, trim it first with the trimByBytes method shown below.
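Because the limit counts UTF-8 bytes rather than characters, a text can be under 5000 characters and still be too large: an accented letter such as "é" takes two bytes. The small stdlib-only check below makes the difference visible. The class and method names (ByteLimitCheck, utf8Length, needsTrimming) are our own, for illustration only.

```java
import java.nio.charset.StandardCharsets;

public class ByteLimitCheck {
    // 5000-byte limit for the synchronous entity detection call
    static final int MAX_BYTES = 5000;

    // Size of the text once UTF-8 encoded, which is what the limit counts
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the text must be trimmed (or split) before sending it
    static boolean needsTrimming(String text) {
        return utf8Length(text) > MAX_BYTES;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(5000);         // 5000 characters, 1 byte each
        String accented = "\u00e9".repeat(5000); // 5000 characters, 2 bytes each ("é")
        // prints: 5000 chars, 5000 bytes, trim: false
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes, trim: " + needsTrimming(ascii));
        // prints: 5000 chars, 10000 bytes, trim: true
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes, trim: " + needsTrimming(accented));
    }
}
```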

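import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Usage: text = trimByBytes(text, 5000);
// Trims str to at most lengthOfBytes UTF-8 bytes without leaving half a character at the end
String trimByBytes(String str, int lengthOfBytes) {
    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    if (lengthOfBytes < buffer.limit()) {
        buffer.limit(lengthOfBytes);
    }
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // A multi-byte character cut off at the limit decodes as malformed input; IGNORE drops it
    decoder.onMalformedInput(CodingErrorAction.IGNORE);
    try {
        return decoder.decode(buffer).toString();
    } catch (CharacterCodingException e) {
        // Not expected here: the only bad bytes (a cut-off character) are ignored
        throw new RuntimeException(e);
    }
}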
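Trimming discards everything past the first 5000 bytes. If the whole text has to be analyzed, one option (our own suggestion, not part of the approach above) is to split it into chunks that each fit the limit and call detectEntitiesWithComprehend once per chunk. A minimal sketch, assuming a hypothetical chunkByBytes helper and a maxBytes of at least 4 (the largest UTF-8 character):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Chunker {
    // Splits text into pieces whose UTF-8 encoding is at most maxBytes each,
    // never cutting a character (code point) in half. Assumes maxBytes >= 4.
    static List<String> chunkByBytes(String text, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int charCount = Character.charCount(codePoint); // 2 for surrogate pairs
            String ch = text.substring(i, i + charCount);
            int chBytes = ch.getBytes(StandardCharsets.UTF_8).length;
            // Start a new chunk if this character would push the current one over the limit
            if (currentBytes + chBytes > maxBytes && current.length() > 0) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(ch);
            currentBytes += chBytes;
            i += charCount;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "a\u00e9b\u00e9c"; // "aébéc": 1 + 2 + 1 + 2 + 1 = 7 bytes
        // prints three chunks: "aé" (3 bytes), "bé" (3 bytes), "c" (1 byte)
        for (String chunk : chunkByBytes(text, 3)) {
            System.out.println(chunk + " -> " + chunk.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }
}
```

Two caveats with this design: the beginOffset and endOffset that Comprehend returns for each chunk are relative to that chunk, so add the chunk's starting position to map them back onto the full text; and a boundary can fall in the middle of an entity. Moving each cut back to the nearest whitespace would reduce that risk.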
  5. Output Result: We now have the entities as a list, one entry for each entity Comprehend found in the text we passed. Sample output is below:
[
    {
        "score": 0.4398592,
        "type": "ORGANIZATION",
        "text": "JSON",
        "beginOffset": 4930,
        "endOffset": 4934
    },
    {
        "score": 0.98848945,
        "type": "ORGANIZATION",
        "text": "Apple",
        "beginOffset": 4960,
        "endOffset": 4965
    }
]
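Low scores are worth filtering out. In the sample, "JSON" is tagged as an ORGANIZATION with a score of only about 0.44. The sketch below keeps entities above a threshold and groups their text by type. To stay runnable without the AWS SDK, it uses a hypothetical EntityResult record that mirrors the fields in the sample output. With the SDK you would read the same values from Entity's getScore(), getType() and getText() getters. The 0.5 threshold is an arbitrary assumption to tune for your data.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EntityFilter {
    // Hypothetical stand-in for the SDK's Entity, with the fields from the sample output
    record EntityResult(float score, String type, String text, int beginOffset, int endOffset) {}

    // Keeps entities whose score is at least minScore and groups their text by entity type
    static Map<String, List<String>> confidentByType(List<EntityResult> entities, float minScore) {
        return entities.stream()
                .filter(e -> e.score() >= minScore)
                .collect(Collectors.groupingBy(EntityResult::type,
                        Collectors.mapping(EntityResult::text, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Values copied from the sample output
        List<EntityResult> entities = List.of(
                new EntityResult(0.4398592f, "ORGANIZATION", "JSON", 4930, 4934),
                new EntityResult(0.98848945f, "ORGANIZATION", "Apple", 4960, 4965));
        // prints: {ORGANIZATION=[Apple]}
        System.out.println(confidentByType(entities, 0.5f));
    }
}
```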

The complete snippet, with all methods and imports, is available on GitHub.

We used the synchronous method for processing here; the asynchronous one will be covered next. Stay connected for more, and please share your views in the comments below.

Originally published at Tekraze.com

Balvinder Singh

@balvinder294

Full Stack Developer and DevOps engineer working remotely at Dehaze.io. Founder and blogger at Tekraze.com, here to share my journey of code and experiences to help out coders and give back to the dev community.
