DEV Community

Mastering NLP with AWS Comprehend and Python

AWS Comprehend is a natural language processing (NLP) service that uses machine learning to analyze text and extract insights. It can perform tasks such as named entity recognition, key phrase extraction, sentiment analysis, topic modeling, and language detection.

In this article, we will use Python and the AWS SDK for Python (Boto3) to interact with AWS Comprehend and perform some common NLP tasks. We will need an AWS account and an IAM user with the appropriate permissions to access AWS Comprehend.

Overview of NLP tasks

Named Entity Recognition

Named entity recognition (NER) is the process of identifying and categorizing entities in a text, such as people, places, organizations, and dates. AWS Comprehend recognizes a fixed set of built-in entity types (for example PERSON, LOCATION, ORGANIZATION, and DATE) and returns each entity's text, type, and confidence score.
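
To make the shape of the result concrete, here is a minimal sketch that groups detected entities by type. The `Entities`, `Type`, `Text`, and `Score` keys follow the `detect_entities` response; the helper name `group_entities_by_type`, the `min_score` threshold, and the sample response are assumptions made for illustration, not real service output.

```python
from collections import defaultdict


def group_entities_by_type(response, min_score=0.8):
    """Group entity texts by type, keeping only confident detections.

    `response` is a dict shaped like the result of detect_entities.
    `min_score` is an illustrative threshold, not a recommended value.
    """
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


# Hand-written sample in the response shape (hypothetical scores).
sample_response = {
    "Entities": [
        {"Type": "LOCATION", "Text": "Kyoto", "Score": 0.99},
        {"Type": "LOCATION", "Text": "Kinkaku-ji Temple", "Score": 0.95},
        {"Type": "OTHER", "Text": "gold leaf", "Score": 0.41},
    ]
}

print(group_entities_by_type(sample_response))
# {'LOCATION': ['Kyoto', 'Kinkaku-ji Temple']}
```

Filtering on the score is a design choice: a higher threshold drops uncertain detections at the cost of missing some real entities.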

Key Phrase Extraction

Key phrase extraction is the process of identifying and extracting the most important phrases in a text, typically nouns and noun phrases. For each key phrase, AWS Comprehend returns its text and a confidence score.
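
Each key phrase in the response also carries its position in the source text. The sketch below uses those positions to mark phrases inline. It assumes `BeginOffset` and `EndOffset` are character positions with an exclusive end, as in Python slicing; the helper name `highlight_key_phrases`, the `min_score` threshold, and the sample response are hypothetical.

```python
def highlight_key_phrases(text, response, min_score=0.9):
    """Wrap each confident key phrase in square brackets.

    `response` is a dict shaped like the result of detect_key_phrases.
    """
    phrases = [p for p in response.get("KeyPhrases", []) if p["Score"] >= min_score]
    # Work from the end of the text so earlier offsets stay valid.
    for phrase in sorted(phrases, key=lambda p: p["BeginOffset"], reverse=True):
        start, end = phrase["BeginOffset"], phrase["EndOffset"]
        text = text[:start] + "[" + text[start:end] + "]" + text[end:]
    return text


# Hand-written sample in the response shape (hypothetical scores).
sample_text = "Visitors find profound beauty in Kyoto."
sample_response = {
    "KeyPhrases": [
        {"Text": "Visitors", "Score": 0.99, "BeginOffset": 0, "EndOffset": 8},
        {"Text": "profound beauty", "Score": 0.98, "BeginOffset": 14, "EndOffset": 29},
        {"Text": "Kyoto", "Score": 0.55, "BeginOffset": 33, "EndOffset": 38},
    ]
}

print(highlight_key_phrases(sample_text, sample_response))
# [Visitors] find [profound beauty] in Kyoto.
```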

Sentiment Analysis

Sentiment analysis is the process of determining a text's emotional tone or attitude, such as positive, negative, neutral, or mixed. AWS Comprehend can return the overall sentiment of the text, as well as the scores of each sentiment category.
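
As a small illustration of how the per-category scores can be used, the sketch below sorts them from most to least likely. The `SentimentScore` keys (`Positive`, `Negative`, `Neutral`, `Mixed`) follow the `detect_sentiment` response; the helper name `rank_sentiments` and the sample values are assumptions for this example.

```python
def rank_sentiments(response):
    """Return (label, percent) pairs sorted from most to least likely.

    `response` is a dict shaped like the result of detect_sentiment.
    """
    scores = response["SentimentScore"]
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, round(score * 100, 1)) for label, score in ranked]


# Hand-written sample in the response shape (hypothetical scores).
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.66,
        "Negative": 0.007,
        "Neutral": 0.33,
        "Mixed": 0.003,
    },
}

for label, percent in rank_sentiments(sample_response):
    print(f"{label}: {percent} %")
```

Looking at the full ranking, rather than only the top label, helps spot texts where two categories are close.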

Getting started

From the AWS Console, go to Security Credentials (in the IAM console) and create a user with the ComprehendReadOnly permission policy attached.

Once the user is created, open the Security Credentials tab for this user and create an access key. Copy the access key ID and the secret access key.
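
Pasting keys into a notebook cell is fine for a quick demo, but it is easy to leak them when the notebook is shared. Here is a minimal sketch of one alternative, assuming the keys were exported as the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables before the notebook was started. The helper names `session_kwargs_from_env` and `make_comprehend_client` are made up for this example.

```python
import os


def session_kwargs_from_env(env=os.environ, default_region="us-west-2"):
    """Build keyword arguments for boto3.Session from environment variables."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": env.get("AWS_DEFAULT_REGION", default_region),
    }


def make_comprehend_client(env=os.environ):
    """Create a Comprehend client without hard-coding any keys."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    return boto3.Session(**session_kwargs_from_env(env)).client("comprehend")


# Usage (requires boto3 and exported keys):
# comprehend_client = make_comprehend_client()
```

Boto3 also reads these variables on its own when no keys are passed to `boto3.Session`; the explicit check here only gives a clearer error message. The rest of this article passes the keys directly to keep the demo short.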

Now, use these keys in the Python Notebook:

import boto3

boto_session = boto3.Session(
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY',
    region_name='us-west-2')

comprehend_client = boto_session.client('comprehend')

text = "Nestled in Kyoto, the Kinkaku-ji Temple, covered in shimmering gold leaf, exudes cultural richness and serenity. Visitors find profound beauty in this iconic site, appreciating its historical significance and tranquil surroundings."

# Perform Named Entity recognition
response = comprehend_client.detect_entities(Text=text, LanguageCode='en')

# Display the detected entities
print("Detected Entities:")
for entity in response['Entities']:
    print(f"Type: {entity['Type']}, Text: {entity['Text']}")

# Perform the Key Phrases detection
response = comprehend_client.detect_key_phrases(
    Text=text,
    LanguageCode='en'
)

# Display the detected Key Phrases
# (the .1f format avoids float artifacts such as 56.99999999999999)
print("Detected Key Phrases:")
for key_phrase in response['KeyPhrases']:
    print(f"Text: {key_phrase['Text']}, Score: {round(key_phrase['Score'], 2) * 100:.1f} %")

# Perform the Sentiment detection
response = comprehend_client.detect_sentiment(
    Text=text,
    LanguageCode='en'
)

# Display the detected Sentiment
print("Detected Sentiments:")
# 'Sentiment' is upper case (e.g. POSITIVE); the 'SentimentScore' keys are capitalized
main_sentiment = response['Sentiment'].capitalize()
score = response['SentimentScore'][main_sentiment]
print(f"Main Sentiment: {main_sentiment}, Score: {round(score, 2) * 100:.1f} %")
print("Other sentiments:")

sentiments = ["Positive", "Negative", "Neutral", "Mixed"]
for sentiment in sentiments:
    if sentiment == main_sentiment:
        continue

    # Skip categories whose score rounds to zero
    score = round(response['SentimentScore'][sentiment], 2) * 100
    if score > 0:
        print(f"Sentiment: {sentiment}, Score: {score:.1f} %")

The output is the following:

  • Named Entity Recognition
Type: LOCATION, Text: Kyoto
Type: LOCATION, Text: Kinkaku-ji Temple
  • Key Phrase Extraction
Text: Kyoto, Score: 100.0 %
Text: the Kinkaku-ji Temple, Score: 100.0 %
Text: gold leaf, Score: 100.0 %
Text: cultural richness and serenity, Score: 99.0 %
Text: Visitors, Score: 100.0 %
Text: profound beauty, Score: 100.0 %
Text: this iconic site, Score: 100.0 %
Text: its historical significance, Score: 100.0 %
Text: tranquil surroundings, Score: 95.0 %
  • Sentiment Analysis
Main Sentiment: Positive, Score: 66.0 %
Other sentiments:
Sentiment: Neutral, Score: 34.0 %
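
The three calls above share the same parameters, so they can be wrapped in one helper. The following is only a sketch: the function name `analyze_text` and the `StubComprehendClient` class are hypothetical, and the stub returns hand-written responses so the example runs without AWS access. With a real client, pass the `comprehend_client` created earlier instead.

```python
def analyze_text(client, text, language_code="en"):
    """Run entity, key phrase, and sentiment detection on one text.

    `client` is any object exposing the three Comprehend detect_* methods,
    for example boto_session.client('comprehend').
    """
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    key_phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return {
        "entities": [(e["Type"], e["Text"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in key_phrases["KeyPhrases"]],
        "sentiment": sentiment["Sentiment"],
    }


class StubComprehendClient:
    """Offline stand-in that mimics the response shapes (hypothetical data)."""

    def detect_entities(self, Text, LanguageCode):
        return {"Entities": [{"Type": "LOCATION", "Text": "Kyoto", "Score": 0.99}]}

    def detect_key_phrases(self, Text, LanguageCode):
        return {"KeyPhrases": [{"Text": "profound beauty", "Score": 0.98}]}

    def detect_sentiment(self, Text, LanguageCode):
        return {"Sentiment": "POSITIVE", "SentimentScore": {"Positive": 0.66}}


summary = analyze_text(StubComprehendClient(), "Visitors find profound beauty in Kyoto.")
print(summary)
```

Passing the client in as a parameter keeps the helper easy to test, since no network call is needed to check how the responses are combined.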

Conclusion

In this article, we learned how to use AWS Comprehend with Python to perform some common NLP tasks, such as named entity recognition, key phrase extraction, and sentiment analysis. We used the Boto3 library to create a Comprehend client and call the relevant methods with the text and the language code as parameters. We saw that AWS Comprehend can return useful and accurate information about the text, such as the entities, key phrases, and sentiment.

AWS Comprehend is a powerful and easy-to-use service that can help us analyze text and extract insights. It can be used for various applications, such as content analysis, customer feedback, social media monitoring, and more. We encourage you to explore more of its features and capabilities and see how it can enhance your projects.

Thanks for reading

Thank you very much for reading. I hope you found this article interesting and that it proves useful in the future. If you have any questions or ideas you would like to discuss, it will be a pleasure to collaborate and exchange knowledge.
