Jhon Robert Quintero Hurtado for AWS Community Builders

Posted on May 20, 2024 • Originally published at analisys.co

GenAI: Using Amazon Bedrock, API Gateway, Lambda and S3

#aws #serverless #ai #bedrock

Introduction

I was thrilled to be a speaker at the XIII Meeting SOFTHARD 2024, hosted by the Faculty of Engineering of the Institución Universitaria Antonio José Camacho . Here's a snapshot of the key takeaways from my presentation on Generative AI, focusing on Amazon Bedrock, API Gateway, Lambda, and S3. Buckle up for an exciting ride!

Amazon Bedrock

Picture this: Amazon Bedrock is your magic wand for building and scaling generative AI applications with foundation models. It's like having a fully managed AI factory at your fingertips!

Fully Managed Service: Say goodbye to the nitty-gritty of managing infrastructure.
Choose Your Model: It’s like a buffet of the best AI models from AI21 Labs, Anthropic, Cohere, Meta, and Stability. Just pick what you need!
Customize with Your Data: Make your model truly yours.

For instance, Anthropic’s Claude model is the hotshot for tasks like summarization and complex reasoning, while Stability AI’s Stable Diffusion model is your go-to for generating unique images, art, and designs. Need copywriting help? Cohere Command's got your back!

Anthropic’s Claude 3 Model

Meet Claude 3, the genius in the room! It's smarter, faster, and cheaper than GPT 3.5T, and it’s got vision, literally!

With Claude 3, you can do:

Dialogue and role-play
Summarization and Q&A
Translation
Database querying and retrieval
Coding-related tasks
Classification, metadata extraction, and analysis
Text and content generation
Content moderation

Amazon’s Titan Model

The Titan model is another powerhouse. Here’s what it can do:

Text Embeddings: Converts text into numerical form for searching, retrieving, and clustering. Think of it as translating words into a secret numerical language.
Image Generator: Create stunning images with just a few words. It’s like having a personal artist at your service!

Customizing Amazon Titan models

You can now customize foundation models (FMs) with your own data in Amazon Bedrock to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.

With fine-tuning, you can increase model accuracy by providing your own task-specific labeled training dataset and further specialize your FMs. You can train models using your own data in a secure environment with customer-managed keys. Continued pre-training helps models become more specific to your domain.

Knowledge Bases for Amazon Bedrock

It lets you put text documents, like articles or reports, into a knowledge base. It also automatically creates vector representations of text documents, which are called embeddings. These embeddings can be used for retrieval-augmented generation. This is a key feature of the Amazon Bedrock service. It lets you use your own data to enhance foundation models.

Stores embeddings in your vector database (Amazon OpenSearch). Retrieves embeddings and augments prompts.

Solution Design: Text Summarization

Let’s dive into the cool stuff! Here’s how I designed a text summarization solution using Bedrock, API Gateway, Lambda, and S3.

Bedrock: Our star player.
API Gateway: Acts as the doorman, directing client requests to the right Lambda function.
Lambda Function: Processes the text and sends it to Bedrock for summarization.
S3 Bucket: Stores the summarized text.

Check out the source code below. It’s like a recipe for your favorite dish – just follow the steps and enjoy the result!

Python Source Code

import boto3
import botocore.config
import json
import base64
from datetime import datetime
from email import message_from_bytes


def extract_text_from_multipart(data):
    msg = message_from_bytes(data)

    text_content = ''

    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                text_content += part.get_payload(decode=True).decode('utf-8') + "\n"

    else:
        if msg.get_content_type() == "text/plain":
            text_content = msg.get_payload(decode=True).decode('utf-8')

    return text_content.strip() if text_content else None


def generate_summary_from_bedrock(content:str) ->str:
    prompt = f"""Summarize the following meeting notes: {content} """

    body = json.dumps({"inputText": prompt, 
                       "textGenerationConfig":{
                           "maxTokenCount":4096,
                           "stopSequences":[],
                           "temperature":0,
                           "topP":1
                       },
                      }) 

    modelId = 'amazon.titan-tg1-large' # change this to use a different version from the model provider
    accept = 'application/json'
    contentType = 'application/json'
        
    try:
        boto3_bedrock = boto3.client('bedrock-runtime')
        response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
        response_body = json.loads(response.get('body').read())
    
        summary = response_body.get('results')[0].get('outputText')
        return summary

    except Exception as e:
        print(f"Error generating the summary: {e}")
        return ""

def save_summary_to_s3_bucket(summary, s3_bucket, s3_key):

    s3 = boto3.client('s3')

    try:
        s3.put_object(Bucket = s3_bucket, Key = s3_key, Body = summary)
        print("Summary saved to s3")

    except Exception as e:
        print("Error when saving the summary to s3")


def lambda_handler(event,context):

    decoded_body = base64.b64decode(event['body'])

    text_content = extract_text_from_multipart(decoded_body)

    if not text_content:
        return {
            'statusCode':400,
            'body':json.dumps("Failed to extract content")
        }


    summary = generate_summary_from_bedrock(text_content)

    if summary:
        current_time = datetime.now().strftime('%H%M%S') #UTC TIME, NOT NECCESSARILY YOUR TIMEZONE
        s3_key = f'summary-output/{current_time}.txt'
        s3_bucket = 'bedrock-analisys-co'

        save_summary_to_s3_bucket(summary, s3_bucket, s3_key)

    else:
        print("No summary was generated")


    return {
        'statusCode':200,
        'body':json.dumps("Summary generation finished")
    }

Text Generation Configuration

Understanding the textGenerationConfig parameters is key for tweaking the model's behavior:

maxTokenCount:

This sets the maximum number of tokens (words or subwords) that the text generation model can produce. It helps control the length of the generated text.

Think of this as setting the maximum length for a speech. Just like a speechwriter might say, "Your speech should be no longer than 500 words," the maxTokenCount controls the length of the generated text, ensuring it doesn’t run on indefinitely.

stopSequences:

This is a list of sequences (e.g., newline characters) that, if encountered during text generation, will cause the generation to stop.

Imagine you’re listening to a song that has a specific note that signals the end, like a grand finale. Similarly, stopSequences act like these finishing notes; they are predefined sequences that, when detected, tell the model to stop generating more text.

temperature:

This parameter controls the "creativity" or "randomness" of the text generation. A lower temperature (e.g., 0) will result in more conservative, predictable text, while a higher temperature (e.g., 1) will produce more diverse and potentially more creative text.

Picture a painter with a palette of colors. A lower temperature is like having only a few colors to choose from, resulting in a more predictable and uniform painting. A higher temperature is like having a wide variety of colors, leading to a more vibrant and unexpected artwork.

topP:

This is a technique called "nucleus sampling" that limits the text generation to the most likely tokens, based on the model's probability distribution. A value of 1 means no filtering, while a lower value (e.g., 0.9) will restrict the generation to the top 90% of the most likely tokens.

Imagine a buffet where you want to sample the most popular dishes. A topP value of 1 means you can choose from the entire spread. A lower topP value, like 0.9, means you’re only selecting from the top 90% of the most popular dishes, skipping the least likely choices to ensure a satisfying and high-quality meal.

These parameters let you fine-tune the model’s behavior to suit your needs. Feel free to experiment and find the perfect balance!

Video Demo

Stay tuned for the video demo where I'll walk you through the entire process. Seeing is believing, right?

Top comments (1)

Jason Dunn [AWS] • May 21 '24

Solid article!

DEV Community