Matteo Depascale for AWS Community Builders
The Complete Guide to Amazon Bedrock for Generative AI

Revolutionize your generative AI development with Amazon Bedrock. Benefit from the top-notch security measures, compliance standards, and reliable infrastructure provided by Amazon Bedrock. See what's going on from a Solution Architect's perspective.


I had early access to Amazon Bedrock, and I'm excited to share my experience with you in this blog post. As a user, I've always been impressed with the power and versatility of generative AI, and here I hope to help others better understand Amazon Bedrock.
Over the course of the preview, I provided tons of feedback to AWS regarding the use of Amazon Bedrock, its SDK, and their models. This means that I already have a decent amount of experience with this service 😏.

Here's a sneak peek at what I'll cover:

  • What is Amazon Bedrock?
  • Features
  • Pricing
  • Security
  • Compliance
  • Resilience
  • Monitoring
  • Using Amazon Bedrock
  • Fine-tuning
  • Inference parameters

Buckle up, because we're about to embark on a journey into Amazon Bedrock!

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) created by Amazon and third-party model providers through an API.

⚠️ Bedrock has a few other features, such as Agents and Embeddings. I'll cover them in part 2 when I gain hands-on experience with them. (Perhaps in part 3 or 4, because as I'm exploring GenAI offerings on AWS, I discover more topics to talk about.👀)

Features of Amazon Bedrock

As of the time I'm writing this, Amazon Bedrock offers the following capabilities:

  • Text playground - A text generation application in the AWS console;
  • Image playground - An image generation application in the AWS console;
  • Amazon Bedrock API - Explore it using the AWS CLI or utilize the API to access the foundation models;
  • Embeddings - Generate embeddings from the Titan text embedding model using the API;
  • Model fine-tuning - Create a training dataset and fine-tune an Amazon Bedrock model;
  • Agents - Although still in preview, agents can execute complex tasks by dynamically invoking APIs.

Amazon Bedrock currently supports a wide range of models. I won't list them all here because I have a hunch that I'll need to update this list every few months.

Pricing of Amazon Bedrock

This time, AWS has made pricing quite straightforward for us. The pricing structure is based on the specific model and its provider. To find the pricing details, you can navigate to the AWS console section labeled "Providers" (or you can consult the documentation, although we both know it can be a bit intimidating 😜).

Let's look at an example of pricing for the Stable Diffusion XL model:

  • $0.018/image (step size <= 50, resolution < 512 x 512)
  • $0.036/image (step size 51-150, resolution < 512 x 512)
  • $0.036/image (step size <= 50, resolution > 512 x 512)
  • $0.072/image (step size 51-150, resolution > 512 x 512)

As we can see, the pricing follows AWS' well-known model of On-Demand pricing, which means that the cost for generating a single image can range from $0.018 to $0.072.
Additionally, you have the option to request "Provisioned throughput" for a model, guaranteeing you a specified level of throughput at a fixed cost. For instance, you can pay $35 per hour with a 6-month commitment for the Claude model.
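To make the tiers above concrete, here's a quick back-of-envelope cost helper based on the Stable Diffusion XL price list (the handling of the exact 512 x 512 boundary is my own assumption, since the table doesn't specify it):

```python
# Back-of-envelope cost estimate for Stable Diffusion XL On-Demand pricing,
# using the four price tiers listed above (step size and resolution).
def sdxl_image_cost(steps: int, width: int, height: int) -> float:
    small = width < 512 and height < 512  # assumption: 512x512 counts as "large"
    if steps <= 50:
        return 0.018 if small else 0.036
    return 0.036 if small else 0.072

# Estimated cost of a batch of 1,000 images at 75 steps, 768x768:
batch_cost = 1000 * sdxl_image_cost(75, 768, 768)
print(f"${batch_cost:.2f}")  # $72.00
```

A helper like this is handy for sanity-checking a budget before kicking off a large generation job.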

Security of Amazon Bedrock

Security has been a hot topic since the birth of generative AI 🔥. From the beginning, AWS has stated that security is a shared responsibility between us and them. Essentially, they manage the servers, so we don't need to do anything there, but we are responsible for things like configuring encryption in transit, encryption at rest, and so on.

Amazon Bedrock uses encryption to protect data at rest and in transit. For data at rest, it encrypts prompts and responses using service-managed keys (similar to S3/RDS, etc.), while encryption in transit is secured using TLS 1.2. Additionally, we can use our own KMS key.

AWS handles most security aspects; we only need to set up the proper role before calling the Amazon Bedrock API. The service integrates fully with IAM, similar to other AWS services. A few things to note about the IAM integration:

  • It doesn't have a resource-based policy (e.g., like specifying a bucket policy in S3).
  • It partially supports Attribute-based access control (ABAC) using tags in policies, allowing us to restrict service access based on AWS tags.
  • Using policies, we can easily allow or restrict access to our model. This is useful, especially for multi-tenant use cases.
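As a sketch of that last point, here's what a minimal identity-based policy restricting invocation to a single model might look like (the model ARN and role name are illustrative placeholders, not values from the post):

```python
import json

# Sketch of an identity-based policy that allows invoking only one specific
# foundation model. The ARN shown is illustrative; check your region/model.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        }
    ],
}

# You could then attach it to a role with boto3, e.g.:
# iam = boto3.client("iam")
# iam.put_role_policy(RoleName="my-bedrock-role",
#                     PolicyName="allow-claude-only",
#                     PolicyDocument=json.dumps(policy))
print(json.dumps(policy, indent=2))
```

For multi-tenant setups, you'd typically generate one such policy per tenant, scoping each to the models that tenant is allowed to call.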

Compliance of Amazon Bedrock

All of our data remains private within our AWS account, and, most importantly, it's not shared with third-party model providers, nor is it used by AWS itself (IMHO, this should be the default behavior for every provider).

Additionally, you have the option to configure a VPC endpoint to establish a private connection to Amazon Bedrock over the AWS network, ensuring private connectivity.

Furthermore, you can use CloudWatch to track usage and CloudTrail to monitor API activity, enhancing your control and visibility over the service.

Regarding compliance, Amazon Bedrock aligns with common standards such as GDPR and HIPAA, meeting expected compliance requirements.
AWS has met my expectations, and at this point, I'm hopeful that they will expand the service to more regions.

Resilience of Amazon Bedrock

Amazon Bedrock is a fully managed service, which means we can use it without worrying about its infrastructure. It is automatically patched and is highly available by default across all Availability Zones (AZs) distributed throughout their region. In the event of any issues, AWS provides notifications through the Service Health Dashboard.


To be honest, I was expecting the Requests Per Second (RPS) quotas to be higher than they actually are. Currently, Anthropic Claude V2 offers a little over 1.5 RPS, while Amazon Titan Express provides 66.6 RPS. However, it's worth noting that these quotas are region-based. This means that if we require additional RPS, we could deploy Bedrock in other regions or accounts and combine multiple Bedrock endpoints.

⚠️ Of course, this solution does come with several implications. For instance, if you are using your custom model in account/region A, can account/region B gain access to it?

Monitoring Amazon Bedrock

We can monitor Amazon Bedrock using Amazon CloudWatch, which collects data and provides near real-time metrics that can also be graphed within the service console. Additionally, CloudWatch enables us to create alarms that trigger when specific thresholds are reached. With these alarms, we have the flexibility to take various actions, such as sending an email or blocking access, to address potential issues.

When looking at the metrics, there are several interesting ones:

  • Invocations - personally, I find limited use for this metric. It could be more valuable if we could differentiate metrics based on more specific criteria, but for now, customization is something we need to handle independently;
  • InvocationLatency - this metric is valuable for monitoring the performance of our GenAI applications. However, it's important to note that it's a global metric, which means it aggregates data for all GenAI applications, potentially affecting accuracy;
  • InvocationClientErrors - this metric is essential for identifying issues when our GenAI applications encounter problems from our end;
  • InvocationServerErrors - this metric triggers whenever AWS experiences errors. Since Amazon Bedrock is a managed service, the primary purpose of this metric is to prompt the opening of a support case 😂;
  • InvocationThrottles - this one is self-explanatory.
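As a sketch, here's how an alarm on InvocationThrottles could be parameterized (the "AWS/Bedrock" namespace and the SNS topic ARN are assumptions on my part; verify them against your account's metrics):

```python
# Sketch of the parameters for a CloudWatch alarm on InvocationThrottles.
# The namespace and SNS topic ARN below are illustrative assumptions.
alarm = {
    "AlarmName": "bedrock-throttles",
    "Namespace": "AWS/Bedrock",
    "MetricName": "InvocationThrottles",
    "Statistic": "Sum",
    "Period": 300,  # evaluate over 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 10,  # alert after 10 throttled calls in a window
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:[YOUR_ACCOUNT_ID]:bedrock-alerts"],
}

# With boto3, you would create it like this:
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
print(alarm["MetricName"])
```

An alarm like this is exactly the kind of early warning you want before users start seeing throttling errors.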

Additionally, AWS offers the "Model invocation logging" feature in preview. This feature collects invocation logs, model input data, and output data for all invocations, sending them to CloudWatch Logs and S3.

Lastly, you can configure an EventBridge event to receive notifications for actions within Bedrock, such as when a job stops running.
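As an illustration, an EventBridge pattern for those notifications could look something like this (both the "aws.bedrock" source and the detail-type string are my assumptions; check the actual events your jobs emit):

```python
import json

# Sketch of an EventBridge event pattern for Bedrock job state changes.
# The source and detail-type values here are assumptions, not confirmed values.
event_pattern = {
    "source": ["aws.bedrock"],
    "detail-type": ["Model Customization Job State Change"],
}

# With boto3, the rule would be created along these lines:
# events = boto3.client("events")
# events.put_rule(Name="bedrock-job-events",
#                 EventPattern=json.dumps(event_pattern))
print(json.dumps(event_pattern))
```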

⚠️ Before we begin using Bedrock, it's important to grant access to models within Bedrock. To do so, we can simply go to "Model Access/Edit" and select the models to use.

Using Amazon Bedrock

Finally, we can dive into Amazon Bedrock. Before I begin, please note that I won't be demonstrating how to use the service from the AWS CLI or the AWS Console. Instead, I'll focus on the AWS SDK and provide a few lines of code to illustrate how the service functions.
Let's get started!
First, we'll examine an example of how to use Amazon Bedrock with the Python boto3 SDK. To utilize Amazon Bedrock, we need to specify the region (keep in mind that it's not available in every region, so it's essential to check availability first), the model ID, and, of course, the prompt.

Here's an example:

import boto3
import json

bedrock = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "\n\nHuman:explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completion'))

Which gives the output below 👇

[Image: Bedrock API result]

Additionally, you can also stream the response:

import boto3
import json

bedrock = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman:write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 100
})

response = bedrock.invoke_model_with_response_stream(
    body=body,
    modelId='anthropic.claude-v2',
    accept='application/json',
    contentType='application/json'
)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode())['completion'])

Here, you can observe that the output is streamed chunk by chunk, and in this case, there were 3 chunks. 👇

[Image: Bedrock API result in streaming]

I'd like to highlight a small detail: each model has its own specific requirements. For instance, the amazon.titan model requires the inputText parameter, while if you're using anthropic.claude-instant-v1, you need to set the prompt and max_tokens_to_sample values to receive the output. To get started, you can explore the examples directly within the Amazon Bedrock console, as they are highly informative.
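To illustrate the difference, here's a sketch of the two request shapes side by side (the Titan textGenerationConfig field names are my assumption and may vary between model versions; double-check them in the console examples):

```python
import json

# Each provider expects a different request body. Two shapes, as a sketch:
titan_body = json.dumps({
    "inputText": "Explain black holes to 8th graders",
    # Assumption: Titan tuning options live under textGenerationConfig.
    "textGenerationConfig": {"maxTokenCount": 300, "temperature": 0.1},
})

claude_body = json.dumps({
    "prompt": "\n\nHuman: Explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

# Then pick the matching modelId when invoking:
# bedrock.invoke_model(body=titan_body, modelId="amazon.titan-text-express-v1", ...)
# bedrock.invoke_model(body=claude_body, modelId="anthropic.claude-instant-v1", ...)
print(json.loads(claude_body)["max_tokens_to_sample"])
```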

In my opinion, it would have been convenient if the SDK offered standardized interfaces grouping all LLMs. Unfortunately, it appears that we need to handle this ourselves or rely on LangChain.

Fine-tuning Amazon Bedrock

Now, let's delve into the process of fine-tuning a generic Foundation Model to align it with our specific dataset.

To fine-tune our model, we require a training dataset in JSONL format, structured like this:

  {"input": "[OUR_PROMPT]", "output": "[OUR_EXPECTED_RESPONSE]"}
  {"input": "[OUR_PROMPT_2]", "output": "[OUR_EXPECTED_RESPONSE_2]"}
  {"input": "[OUR_PROMPT_3]", "output": "[OUR_EXPECTED_RESPONSE_3]"}
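If you're assembling the dataset programmatically, a tiny helper like this (with made-up example pairs) produces exactly that shape:

```python
import json

# Write prompt/response pairs in the JSONL shape shown above:
# one JSON object per line, with "input" and "output" keys.
def write_jsonl(pairs, path):
    with open(path, "w") as f:
        for prompt, expected in pairs:
            f.write(json.dumps({"input": prompt, "output": expected}) + "\n")

write_jsonl(
    [("What is Bedrock?", "A managed service for foundation models."),
     ("Is it serverless?", "Yes, it is fully managed.")],
    "train.jsonl",
)

with open("train.jsonl") as f:
    print(len(f.readlines()))  # 2
```

Using json.dumps per line also saves you from the missing-quote bugs that are easy to introduce when writing JSONL by hand.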

To perform the actual fine-tuning, you can use the script below 👇

Before running it, please remember to change the role ARN. You need an actual role capable of reading and writing on S3 and using Amazon Bedrock. Moreover, you need to specify the S3 bucket from which the data will be retrieved.

  import boto3
  import calendar
  import time

  bedrock = boto3.client(service_name="bedrock", region_name="us-east-1")
  current_GMT = time.gmtime()

  # required
  jobName = "job-" + str(calendar.timegm(current_GMT))
  customModelName = "[CUSTOM_NAME]-" + str(calendar.timegm(current_GMT))
  roleArn = "arn:aws:iam::[YOUR_ACCOUNT_ID]:role/[YOUR_ROLE_NAME]"
  baseModelIdentifier = "[BASE_MODEL_IDENTIFIER]"
  trainingDataConfig = {"s3Uri": "s3://[YOUR_BUCKET_NAME]/[YOUR_TRAINING_DATASET_FILE_PATH]"}
  outputDataConfig = {"s3Uri": "s3://[YOUR_BUCKET_NAME]/output/dataset.json"}
  hyperParameters = {
    "epochCount": "1",
    "batchSize": "1",
    "learningRate": "0.005",
    "learningRateWarmupSteps": "0",
  }

  # optional
  clientRequestToken = ""
  jobTags = [{"key": "bedrock", "value": "true"}]
  customModelTags = [{"key": "bedrock", "value": "true"}]
  validationDataConfig = {
    "validators": [
        {
            "name": "bedrock-validator",
            "s3Uri": "s3://[YOUR_BUCKET_NAME]/[YOUR_VALIDATION_DATASET_FILE_PATH]",
        }
    ]
  }

  response = bedrock.create_model_customization_job(
    jobName=jobName,
    customModelName=customModelName,
    roleArn=roleArn,
    baseModelIdentifier=baseModelIdentifier,
    trainingDataConfig=trainingDataConfig,
    outputDataConfig=outputDataConfig,
    hyperParameters=hyperParameters,
    # clientRequestToken=clientRequestToken,
    # jobTags=jobTags,
    # customModelTags=customModelTags,
    # validationDataConfig=validationDataConfig,
  )

If you have all the necessary permissions and files set up correctly, you can simply wait, and wait…………….. still waiting?………… Hopefully there are no errors and you are still waiting😜. While you wait, you can check the status directly in the AWS Console or with the AWS SDK.
At some point in the future, the model will complete its training, and then we can start using it. You can interact with it via the SDK or the Console. Typically, there's a brief waiting period when running the first prompt because the model needs to load up. However, once it's ready, you can submit queries, and this time it will respond based on our training data.
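While waiting, a polling sketch with the SDK might look like this (the terminal status values are my assumption; check the actual statuses returned by get_model_customization_job in your account):

```python
import time

# Assumed terminal states for a customization job; verify against the API.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}

def is_done(status: str) -> bool:
    """True once a customization job has reached a terminal state."""
    return status in TERMINAL_STATES

# Polling sketch with boto3 (jobName comes from the create call above):
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# while True:
#     job = bedrock.get_model_customization_job(jobIdentifier=jobName)
#     if is_done(job["status"]):
#         break
#     time.sleep(60)  # check once a minute
print(is_done("InProgress"), is_done("Completed"))  # False True
```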

Inference parameters

To generate high-quality and accurate responses, we need to tune the inference parameters, and there are a lot of them. I'll explain the most common ones I found in Bedrock so we can hit the ground running. I won't explain every bit of them, just what to expect when we change them.

  • Temperature: controls the randomness of the generated response. A higher value means more randomness, whereas a lower value yields responses closer to the training data.
  • Top P: acts somewhat like a filter: a higher value restricts sampling to the most probable candidates for the next token. The value goes from 0 to 1; with 0.9, the model samples only from the smallest set of tokens whose cumulative probability reaches 90%.
  • Top K: similar to Top P, but instead of working with a probability mass, it specifies an absolute number of candidate tokens, such as 10 or 2.
  • Presence penalty: reduces the probability of generating new tokens that have already been used in either the prompt or the completion. This helps prevent constant repetition.
  • Count penalty: similar to the one above, but it takes into account how often a token appears in both the prompt and the completion. Tokens that appear more frequently are less likely to be generated again.
  • Frequency penalty: like the two above, but it considers the overall frequency within a specified text length.
  • Penalize special token: allows us to choose specific types of tokens (such as punctuation) that won't be subject to the penalties described above.
  • Stop sequence: it's our model's handbrake; for a chatbot, we can usually specify "Human:" so that generation stops when it's the user's turn to answer.

Hopefully these inference parameters cover all the knowledge you need to craft your own prompts and tweak them to get the best possible answers.
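To tie these together, here's a sketch of a Claude request body exercising several of these parameters (the parameter names follow the Anthropic schema used earlier in this post; other providers name them differently):

```python
import json

# A Claude-style request body exercising several inference parameters.
body = json.dumps({
    "prompt": "\n\nHuman: Suggest a name for a coffee shop\n\nAssistant:",
    "max_tokens_to_sample": 100,
    "temperature": 0.7,   # more creative than the 0.1 used earlier
    "top_p": 0.9,         # nucleus sampling over 90% of probability mass
    "top_k": 250,         # consider at most 250 candidate tokens
    "stop_sequences": ["\n\nHuman:"],  # the "handbrake" for chat turns
})

# As before, this body would be passed to invoke_model:
# bedrock.invoke_model(body=body, modelId="anthropic.claude-v2", ...)
print(json.loads(body)["top_k"])
```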


There you have it, folks! Hopefully, this blog post has set you on the right track and provided you with all the information you need to kickstart your project. I aimed to offer a Solution Architect's perspective in this post. In the next one, we'll look into real use cases and projects, so stay tuned.

If you enjoyed this article, please let me know in the comment section or send me a DM. I'm always happy to chat! ✌️

Thank you so much for reading! 🙏 Keep an eye out for more AWS related posts, and feel free to connect with me on LinkedIn 👉


Disclaimer: opinions expressed are solely my own and do not express the views or opinions of my employer.
