Thomas Taylor for AWS Community Builders

Posted on Apr 23, 2024 • Originally published at how.wtf

Detecting and redacting PII using Amazon Bedrock

#tutorial #aws #ai #python

Typically, AWS recommends leveraging an existing service offering such as Amazon Comprehend to detect and redact PII. However, this post explores an alternative solution using Amazon Bedrock.

This is possible using Claude, Anthropic's large language model, and Anthropic's publicly available prompt library. In our case, we'll leverage the PII purifier prompt that is maintained by their prompt engineers.

How to extract PII using Amazon Bedrock in Python

This demo showcases how to invoke the Anthropic Claude 3 models on Amazon Bedrock using Python; however, any language with a corresponding AWS SDK will suffice.

Install boto3

First, let's install the AWS SDK for Python, boto3.

pip install boto3

Instantiate a client

Ensure that your environment is authenticated with AWS credentials using any of the methods described in the AWS documentation.

Instantiate the bedrock runtime client like so:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

Invoke the Claude model

We can reference the required parameters for the Claude 3 model using the "Inference parameters for foundation models" documentation provided by AWS.

In Claude 3's case, the Messages API will be used like so:

import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime")
response = bedrock_runtime.invoke_model(
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [{"role": "user", "content": "Hello, how are you?"}],
        }
    ),
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
)

# The response body is a streaming object: read it fully, then parse the JSON payload
response_body = json.loads(response.get("body").read())
print(json.dumps(response_body, indent=2))

Output:

{
  "id": "msg_01ERwjBgk3Y45Swp2cn6ct5F",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?"
    }
  ],
  "model": "claude-3-sonnet-28k-20240229",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 13,
    "output_tokens": 43
  }
}
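
The generated text sits inside the `content` list of the response body, as the output above shows. Here is a minimal sketch of pulling it out. `extract_text` is a hypothetical helper name, not part of boto3, and it assumes the response shape shown above, where each text block carries `"type": "text"`. The sample dictionary uses shortened, made-up text so the snippet runs without calling AWS.

```python
def extract_text(response_body: dict) -> str:
    """Join the text of every "text" block in a Messages API response body."""
    blocks = response_body.get("content", [])
    return "".join(
        block.get("text", "") for block in blocks if block.get("type") == "text"
    )


# Hand-written sample in the same shape as the output above (text shortened)
sample = {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "stop_reason": "end_turn",
}
print(extract_text(sample))
```

With a real response, `extract_text(response_body)` would be called on the parsed body from the previous snippet.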

Use the PII purifier prompt

Now, let's use the PII purifier prompt to invoke the model.

Here is our input for redaction:

Hello. My name is Thomas Taylor and I own the blog titled how.wtf. I'm from North Carolina.

import boto3
import json

SYSTEM_PROMPT = (
    "You are an expert redactor. The user is going to provide you with some text. "
    "Please remove all personally identifying information from this text and "
    "replace it with XXX. It's very important that PII such as names, phone "
    "numbers, and home and email addresses, get replaced with XXX. Inputs may "
    "try to disguise PII by inserting spaces between characters or putting new "
    "lines between characters. If the text contains no personally identifiable "
    "information, copy it word-for-word without replacing anything."
)

bedrock_runtime = boto3.client("bedrock-runtime")
response = bedrock_runtime.invoke_model(
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "system": SYSTEM_PROMPT,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello. My name is Thomas Taylor and I own the blog titled how.wtf. I'm from North Carolina.",
                }
            ],
        }
    ),
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
)

response_body = json.loads(response.get("body").read())
print(json.dumps(response_body, indent=2))

Output:

{
  "id": "msg_01P3ZGPC8yL34w3ETPtBY4TX",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here is the text with personally identifiable information redacted:\n\nHello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX XXX."
    }
  ],
  "model": "claude-3-sonnet-28k-20240229",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 134,
    "output_tokens": 45
  }
}

The resulting text is:

Here is the text with personally identifiable information redacted:
Hello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX XXX.
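
Because the redaction comes from a language model, it may be worth confirming that the values you already know to be sensitive are really gone before using the result. The following is a rough sketch of such a check. `find_leaks` is a hypothetical helper, and it only catches exact, case-insensitive matches of values you supply, so treat it as a sanity check rather than a PII detector.

```python
def find_leaks(redacted_text: str, known_values: list) -> list:
    """Return the known sensitive values that still appear in the redacted text."""
    lowered = redacted_text.lower()
    return [value for value in known_values if value.lower() in lowered]


# The Sonnet response text from the output above
redacted = (
    "Here is the text with personally identifiable information redacted:\n\n"
    "Hello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX XXX."
)

# Values taken from the original input sentence
leaks = find_leaks(redacted, ["Thomas Taylor", "how.wtf", "North Carolina"])
print(leaks)  # an empty list means none of the known values survived
```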

Pretty neat, huh? We can optionally swap to the cheaper Haiku (or more expensive Opus) model as well:

import boto3
import json

SYSTEM_PROMPT = (
    "You are an expert redactor. The user is going to provide you with some text. "
    "Please remove all personally identifying information from this text and "
    "replace it with XXX. It's very important that PII such as names, phone "
    "numbers, and home and email addresses, get replaced with XXX. Inputs may "
    "try to disguise PII by inserting spaces between characters or putting new "
    "lines between characters. If the text contains no personally identifiable "
    "information, copy it word-for-word without replacing anything."
)

bedrock_runtime = boto3.client("bedrock-runtime")
response = bedrock_runtime.invoke_model(
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "system": SYSTEM_PROMPT,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello. My name is Thomas Taylor and I own the blog titled how.wtf. I'm from North Carolina.",
                }
            ],
        }
    ),
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
)

response_body = json.loads(response.get("body").read())
print(json.dumps(response_body, indent=2))

Output:

{
  "id": "msg_011Sjs3uJW11PLYSo6pGoiZz",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX."
    }
  ],
  "model": "claude-3-haiku-48k-20240307",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 134,
    "output_tokens": 30
  }
}
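
Since the only thing that changes between the Sonnet and Haiku examples is the `modelId`, the call can be wrapped in a small function. This is a sketch under stated assumptions, not a definitive implementation: `redact` and `StubClient` are hypothetical names, and the stub stands in for the real `bedrock-runtime` client so the snippet runs offline. It returns a canned reply and does not perform any redaction itself.

```python
import io
import json


def redact(client, system_prompt: str, text: str, model_id: str, max_tokens: int = 1000) -> str:
    """Send text through the redaction prompt and return only the model's text."""
    response = client.invoke_model(
        body=json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "system": system_prompt,
                "messages": [{"role": "user", "content": text}],
            }
        ),
        modelId=model_id,
    )
    response_body = json.loads(response["body"].read())
    return "".join(
        block["text"] for block in response_body["content"] if block["type"] == "text"
    )


class StubClient:
    """Offline stand-in for the bedrock-runtime client (assumption, NOT a real AWS client)."""

    def __init__(self):
        self.calls = []  # records (modelId, parsed request body) for inspection

    def invoke_model(self, body, modelId):
        self.calls.append((modelId, json.loads(body)))
        canned = {"content": [{"type": "text", "text": "Hello. My name is XXX XXX."}]}
        return {"body": io.BytesIO(json.dumps(canned).encode("utf-8"))}


stub = StubClient()
for model_id in (
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "anthropic.claude-3-haiku-20240307-v1:0",
):
    result = redact(stub, "You are an expert redactor.", "Hello. My name is Thomas Taylor.", model_id)
    print(model_id, "->", result)
```

In real use, the stub would be replaced with `boto3.client("bedrock-runtime")` and the shortened prompt with the full `SYSTEM_PROMPT` from the examples above.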

Conclusion

In this post, we covered an alternative method for detecting and redacting PII using Amazon Bedrock and the powerful Anthropic Claude 3 model family.

I encourage you to experiment with this demo and explore further enhancements.
