Passing the AWS Certified Solutions Architect - Professional exam

Introduction

I recently passed the AWS Certified Solutions Architect - Professional exam. It is a tough one and I'd like to share my thoughts on what I did to pass and some notes I kept along the way. Before beginning, it's worth considering what this exam is about.
It is an architect exam, so you are not always asked about the lower-level implementation or configuration details of a service. You need to know more about how services integrate natively. You'll almost never be asked about a single service in a question. Nearly all questions will be about how two or more services can work together to solve a customer problem. However, two areas I did need to know in depth were cross-account access and hybrid networking. As always with architecture, it depends.

Before you start

The professional-level certs are the pinnacle of the AWS certs, the top of three levels: foundational, associate and professional.

AWS Certifications

It used to be recommended that you have the related associate certs before attempting the professional cert but that requirement seems to be gone. Personally I would still recommend sitting the associate exams before attempting the professional. They will give you a good idea of where your level of knowledge is and experience with the AWS certification process.
Plus, passing one exam gives you a 50% discount on your next exam, so passing an associate exam ($150 fee) entitles you to a 50% discount on the $300 fee for a professional exam. You pay $150 + $150 = $300 in total, so you effectively get both exams for the price of one professional exam.
All certifications need to be re-certified every 3 years by re-sitting the exam. Achieving the professional cert after the associate will automatically renew your associate for another 3 years. This is what spurred me to attempt the SA professional cert. My SA associate cert was due to be renewed and I knew that by passing the professional, both would be safe for another 3 years.

Getting Started

AWS provides good resources to get you started on your certification journey. You should start here where you will find the official study guide, sample questions, links to white-papers and a link to their free Exam Readiness Course.

Exam Readiness course

This course provides great detail on how and what to study, broken down by domain. The exam questions are split across the 5 domains below.

Domain                                               % of Exam
1.0 Design for Organizational Complexity             12.5%
2.0 Design for New Solutions                         31%
3.0 Migration Planning                               15%
4.0 Cost Control                                     12.5%
5.0 Continuous Improvement for Existing Solutions    29%

I recommend starting with the AWS resources first, as AWS sets the exam and the resources above give their perspective on it. There are sample questions on the certification page and in the exam readiness course. How you rate yourself against these questions can give you a good indication of your readiness for the exam.
At this stage, you'll probably need another resource to help you prepare for the exam. A Cloud Guru and Cloud Academy provide good courses to help you prepare. I went for Stephane Maarek's course on Udemy, which I found was up to date and engaging.

Domain Breakdown

As stated there are 5 domains that you are assessed upon in the exam. The official study guide lists 64 tools and technologies that could appear on the exam. I found it difficult to get a mapping of tools and technologies to each domain and the remainder is my attempt to break them down to each domain. Where possible, I will link to the official AWS service page or equivalent.

Domain 1.0 - Design for Organizational Complexity

Need to know

Hybrid Networks

You do need to know lower-level details about how to set up a hybrid network between on-premises and cloud networks. AWS calls this out specifically on the exam homepage and they mean it.

Ability to design a hybrid architecture using key AWS technologies (e.g., VPN, AWS Direct Connect)

This whitepaper is helpful to understand the different options for VPC connectivity.

AWS Organizations

You need to understand AWS Organizations and how you can use its associated services and features to give a customer a way to manage multiple accounts. Bear in mind that AWS recommends a multi-account approach and you need to understand why. You will need to understand organizational units (OUs) and service control policies (SCPs). I found this article helpful in understanding how AWS thinks about security in the context of an Organization, OUs and SCPs. One helpful tip to remember is that an SCP attached to the organization root or to an OU applies to every member account beneath that node, but SCPs never restrict the management account itself.
In addition you should know how existing AWS services like CloudTrail, Backup, Resource Access Manager, GuardDuty, Cost Explorer, CloudFormation StackSets, Config, Service Catalog, Compute Optimizer and License Manager work in conjunction with Organizations.
One tip to remember is the difference between CloudFormation StackSets and Service Catalog. Both help to maintain a consistent infrastructure across all accounts, but Service Catalog can potentially do this in a more secure way with launch constraints. These enable a user to launch a stack in an account without having the elevated permissions that would otherwise be needed to apply a CloudFormation stack.
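
The way SCPs inherit down an organization can be sketched in a few lines of Python. This is a simplified model, not AWS code: the `Node` class, the account names and the three example actions are all hypothetical, and it only models allow-list style SCPs. It illustrates two points: an account's SCP allowance is the intersection of what every level above it permits, and the management account is never restricted. Remember that SCPs are guardrails only; they never grant permissions by themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A root, OU, or account in a hypothetical organization tree."""
    name: str
    parent: Optional["Node"] = None
    # Actions allowed by SCPs attached directly to this node.
    # None means no restriction beyond the default full-access policy.
    scp_allowed: Optional[frozenset] = None
    is_management_account: bool = False


def effective_scp_allowance(account: Node, all_actions: frozenset) -> frozenset:
    """Intersect the SCP allowances from the account up to the root."""
    if account.is_management_account:
        # SCPs never restrict the management account.
        return all_actions
    allowed = all_actions
    node = account
    while node is not None:
        if node.scp_allowed is not None:
            allowed = allowed & node.scp_allowed
        node = node.parent
    return allowed


# Hypothetical organization: Root -> Dev OU -> sandbox account.
ALL = frozenset({"s3:GetObject", "ec2:RunInstances", "iam:CreateUser"})
root = Node("Root")
dev_ou = Node("Dev", parent=root,
              scp_allowed=frozenset({"s3:GetObject", "ec2:RunInstances"}))
sandbox = Node("sandbox-account", parent=dev_ou,
               scp_allowed=frozenset({"s3:GetObject"}))
management = Node("management-account", parent=root, is_management_account=True)

print(sorted(effective_scp_allowance(sandbox, ALL)))
print(sorted(effective_scp_allowance(management, ALL)))
```

Here the sandbox account ends up limited to the single action that both its own SCP and the Dev OU's SCP allow, while the management account keeps everything.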

IAM

You must understand how IAM evaluates whether a user has authorisation to perform an action. This article was very helpful for me, specifically its policy evaluation flow diagram.

You should understand the difference between Identity-based policies and Resource-based policies.

'You can control access to resources using an identity-based policy or a resource-based policy. In an identity-based policy, you attach the policy to an identity and specify what resources that identity can access. In a resource-based policy, you attach a policy to the resource that you want to control. In the policy, you specify which principals can access that resource. '

For example, an S3 bucket policy can grant a user access to a bucket even when the policies attached to their IAM identity do not.

You should also understand how they work in conjunction with IAM permissions boundaries, AWS Organizations service control policies (SCPs) and session policies.
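
To make the evaluation order concrete, here is a minimal Python sketch of same-account policy evaluation. It is a deliberate simplification under stated assumptions, not the real IAM engine: the function names and example policies are hypothetical, conditions and principals are ignored, matching is case-sensitive, and SCPs, permissions boundaries and session policies are all treated as "guardrails" that must each allow the request. Real IAM has extra nuances, for example around how resource-based policies interact with permissions boundaries.

```python
from fnmatch import fnmatchcase


def _matches(statement, action, resource):
    """True when a statement's Action and Resource patterns cover the request."""
    actions = statement["Action"]
    resources = statement["Resource"]
    actions = [actions] if isinstance(actions, str) else actions
    resources = [resources] if isinstance(resources, str) else resources
    return (any(fnmatchcase(action, a) for a in actions)
            and any(fnmatchcase(resource, r) for r in resources))


def _effects(policies, action, resource):
    """Collect the Effect of every statement that matches the request."""
    return {s["Effect"] for p in policies for s in p["Statement"]
            if _matches(s, action, resource)}


def evaluate(action, resource, identity=(), resource_based=(), guardrails=()):
    """Return "Allow" or "Deny" for a request (simplified, same account only).

    guardrails is a list of policy groups (for example SCPs, a permissions
    boundary, a session policy); every group present must allow the action.
    """
    grants = list(identity) + list(resource_based)
    everything = grants + [p for group in guardrails for p in group]
    if "Deny" in _effects(everything, action, resource):
        return "Deny"  # an explicit deny always wins
    for group in guardrails:
        if "Allow" not in _effects(group, action, resource):
            return "Deny"  # implicit deny: a guardrail does not allow it
    if "Allow" in _effects(grants, action, resource):
        return "Allow"
    return "Deny"  # implicit deny: nothing granted the action


# Hypothetical policies for a "reports" bucket.
identity_policy = {"Statement": [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/*"}]}
bucket_policy = {"Statement": [
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "arn:aws:s3:::reports/*"}]}
boundary = {"Statement": [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:DeleteObject"], "Resource": "*"}]}

for action in ("s3:GetObject", "s3:DeleteObject", "s3:PutObject"):
    print(action, evaluate(action, "arn:aws:s3:::reports/q1.csv",
                           identity=[identity_policy],
                           resource_based=[bucket_policy],
                           guardrails=[[boundary]]))
```

In this sketch the read is allowed, the delete is blocked by the explicit deny in the bucket policy, and the write is blocked because the boundary never allows it, even though the identity policy does.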

User federation (SAML 2.0 or OpenID Connect)

  • Federated users are users (or applications) who do not have AWS accounts. With roles, you can give federated users access to your AWS resources for a limited amount of time.

You must understand how to enable cross-account authentication and access strategies. This article should help.

When working with cross-account access, remember that an IAM user can only assume one role at a time. As soon as a user assumes a role in another account, they lose the access that their current role provides.
Resource-based access controls allow a user to have access outside their role without assuming another role.
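
Cross-account role assumption needs two policies: a trust policy on the role in the target account, and a permission to call `sts:AssumeRole` in the source account. The Python sketch below only builds those two JSON documents; the helper names, the account ID `111122223333`, the role name and the external ID are placeholders I made up for illustration.

```python
import json


def cross_account_trust_policy(trusted_account_id, external_id=""):
    """Trust policy for a role that principals in another account may assume."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # Optional extra check, commonly used when a third party assumes the role.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def assume_role_permission(role_arn):
    """Identity-based policy letting a user in the source account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }


# Placeholder account ID and role name, for illustration only.
print(json.dumps(cross_account_trust_policy("111122223333", "example-id"), indent=2))
print(json.dumps(
    assume_role_permission("arn:aws:iam::444455556666:role/ExampleAuditRole"),
    indent=2))
```

Both sides have to agree: the trust policy says who may assume the role, and the caller's own policy must allow the `sts:AssumeRole` call on that role's ARN.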

Directory Services

Options for working with AD in AWS are also worth studying under this domain. You should understand the difference between each of these and their appropriate use cases.

  • Simple AD - provides low-scale, low-cost basic AD capability. It is the simplest way to get an AD experience on AWS but it does not connect with other AD domains.
  • AD Connector - enables on-premises users to access AWS services via AD.
  • Managed Microsoft AD - Enables use of managed AD in the AWS cloud.

Storage Gateway

  • File Gateway enables the storage and retrieval of objects in S3 and Glacier using file protocols such as NFS. Configure file shares that are mapped to selected S3 buckets using IAM roles.
  • Tape Gateway provides backup application with an iSCSI VTL interface consisting of a virtual media changer, virtual tape drives, and virtual tapes.
    • Virtual tape data is stored in Amazon S3 or can be archived to S3 Glacier.
    • Monitor the status of data transfer and storage interfaces through the AWS Management Console.
    • Additionally, use the API or SDK to programmatically manage an application's interaction with the gateway.
  • Volume Gateway
    • Stored-Volume Gateway: data written to a stored volume gateway is saved on on-premises storage hardware and asynchronously backed up to S3 in the form of EBS snapshots. Storage volumes can be created up to 16 TB in size and are mounted as iSCSI devices from on-premises application servers.
    • Cached-Volume Gateway: with the cached volume gateway, you can create storage volumes up to 32 TB in size and mount them as iSCSI devices from on-premises application servers.
      • Data written to these volumes is stored in S3, with only a cache of recently written and recently read data stored locally on on-premises storage hardware.
      • Point-in-time snapshots can be taken of volume data in S3 in the form of EBS snapshots. This provides space-efficient versioned copies of volumes for data protection and various data reuse needs.
      • To prepare for upload to S3, a gateway stores incoming data in a staging area called an upload buffer.

Know what each of these gateway types is, how they differ and the value of each.

VPC Endpoints

Virtual devices that enable instances using private IPs to connect to AWS services without an internet gateway or virtual private gateway.

  • An interface VPC endpoint is an elastic network interface with a private IP address that serves as an entry point for traffic destined to services powered by AWS PrivateLink.
  • A gateway endpoint is a gateway that is a target for a specified route in your route table. This type of endpoint is used for traffic destined to a supported AWS service, such as Amazon S3 or Amazon DynamoDB.

Access through VPC endpoints is controlled with endpoint policies, which are IAM resource policies attached to the endpoint.

Domain 2.0 - Design for New Solutions

This domain constitutes the largest part of the exam at 31%, so it is well worth digging into. This is the domain where the vast bulk of those 64 listed services become relevant.

Need to know

Understand how these services can interact with each other

  • ELB and Auto Scaling
  • CloudFront and ELB
  • Route 53 and routing options
  • SQS (assume standard over FIFO unless specified)
    • Asynchronous tasks
    • Single direction only
    • Unordered
    • "At least once" delivery
  • SNS
    • Fan out to SQS
    • Asynchronous
    • Batch Processing
  • Kinesis Data Streams
    • Asynchronous tasks
    • Single direction only
    • Ordered within a shard
    • "At least once" semantics
    • Independent stream position
  • Know the difference between Kinesis Data Streams and Kinesis Data Firehose.
  • Data Streams has
    • Custom processing per incoming record
    • Sub-1 second processing latency
    • choice of stream processing frameworks
  • Firehose has
    • Zero administration
    • Processing latency of 60 seconds or higher
    • Ability to use existing analytics tools based on S3, Redshift and Elasticsearch Service (Amazon ES).
  • Using SQS for responses
  • Data scaling using S3
    • Put static content in S3
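    • Randomize key names (historical advice: since S3 raised its per-prefix request rates in 2018, randomized prefixes are no longer needed for performance, though older material may still mention them)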
    • Use appropriate storage classes
    • Larger objects result in fewer GET and PUT operations
  • Data scaling using CloudFront
    • Reduce traffic costs
    • Increase performance
    • Origin can be from AWS or from an on-premises data center
    • Access to S3 buckets can be restricted to Origin Access Identities
  • Data scaling with EBS and instance stores
    • EBS volume size can be increased while attached to an EC2 instance
    • EBS volume type and throughput can be changed while attached to an instance
    • EC2 instances have a maximum EBS throughput rate
    • Consider OS-based RAID sets
  • Scaling and RDS
    • Scale up: increase instance size or increase storage
    • Read replicas
      • improve read performance only
      • asynchronous replication
      • unique/different endpoints
      • can be cross-region
      • CloudWatch metric: ReplicaLag
      • (watch out for differences between RDS flavours)
  • ElastiCache: Redis and Memcached
    • ElastiCache for Redis
      • Advanced data structures
      • Persistent
      • Automatic failover with Multi-AZ deployments
      • Can scale using read replicas
      • Can scale up, but not out. Once scaled up, cannot scale down.
      • Supports backup and restore operations
      • AOF (Append Only File) log can be enabled for recovery of nodes.
    • ElastiCache for Memcached
      • Simple key-value storage
      • Non-persistent, pure cache
      • Can scale both up and out
      • Scales out using multiple nodes
      • Does not support backup and restore operations
      • Supports multi-threaded operations
    • Cache common requests
      • Applications read from and write to the cache
      • Create appropriate cache timeouts
      • Redis replication groups (read replicas)
  • DynamoDB
    • NoSQL database great for unstructured data
    • Don't need the same level of DBA oversight
    • Caching for DynamoDB
      • ElastiCache
      • DynamoDB Accelerator (DAX) Read-Through Cache
      • DynamoDB Accelerator (DAX) Write-Through Cache
    • Throughput
      • Using SQS with a queue draining application to throttle writes to DynamoDB if application can handle latency
    • DynamoDB and AWS Auto Scaling
      • Use a CloudWatch alarm to alert when it's time to scale
    • DynamoDB Global Tables
      • Fully managed replication
      • Globally distributed
      • Low latency reads/writes
      • Multi-region redundancy
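
The caching bullets above (applications reading from and writing to the cache, cache timeouts, read-through and write-through) can be illustrated with a small in-memory sketch. This is not ElastiCache or DAX code: `TTLCache` and `CachedStore` are hypothetical classes, and a plain dict stands in for the database. It shows a read path that falls back to the database on a miss, a TTL that expires stale entries, and a write-through path that updates both the database and the cache.

```python
import time


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # entry is stale, treat it as a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


class CachedStore:
    """Reads go through the cache; writes go to the database and the cache."""

    def __init__(self, database, cache):
        self.database = database  # a dict standing in for the real database
        self.cache = cache
        self.database_reads = 0

    def read(self, key):
        value = self.cache.get(key)
        if value is None:  # cache miss: load from the database and cache it
            self.database_reads += 1
            value = self.database.get(key)
            if value is not None:
                self.cache.put(key, value)
        return value

    def write(self, key, value):
        # Write-through: keep the cache consistent with the database.
        self.database[key] = value
        self.cache.put(key, value)


store = CachedStore({"user#1": "Ada"}, TTLCache(ttl_seconds=60))
store.read("user#1")   # miss, loaded from the database
store.read("user#1")   # hit, served from the cache
print(store.database_reads)
```

Choosing the TTL is the trade-off the exam tends to probe: a longer TTL offloads more reads from the database, at the cost of serving staler data when writes bypass the cache.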

Components of loosely coupled architectures

  • ELB
    • Two-way traffic
    • Immediate request handling
  • SQS
    • Clients poll SQS
    • Persistent task storage
    • Controlled completion mechanism
  • SNS
    • SNS pushes to subscribers
    • Bulk notification
    • Mobile push capability
  • Kinesis
    • Scalable event streaming
    • Clients read and track stream position
  • Auto Scaling
    • Scalable resources
    • Manage cost
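
Because standard SQS queues (and Kinesis) deliver "at least once", a consumer that polls the queue has to tolerate seeing the same message twice. A common way to handle that is an idempotent consumer, sketched below in plain Python. The message shape (`MessageId`, `Body`) mirrors what SQS returns, but the function itself and the in-memory set of processed IDs are assumptions for illustration; a real system would keep that record in a durable store.

```python
def process_messages(messages, handler, processed_ids):
    """Handle each message once, even if the queue delivers it more than once.

    processed_ids is a set of message IDs that were already handled.
    Returns the number of messages actually handled in this batch.
    """
    handled = 0
    for message in messages:
        if message["MessageId"] in processed_ids:
            continue  # duplicate delivery, already handled
        handler(message["Body"])
        processed_ids.add(message["MessageId"])
        handled += 1
    return handled


# A batch in which message "m-1" is delivered twice.
batch = [
    {"MessageId": "m-1", "Body": "resize image 1"},
    {"MessageId": "m-2", "Body": "resize image 2"},
    {"MessageId": "m-1", "Body": "resize image 1"},
]
seen = set()
results = []
count = process_messages(batch, results.append, seen)
print(count, results)
```

If the handler itself is naturally idempotent (for example, a PUT of the same object), the ID tracking can be dropped entirely, which is often the simpler design.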

CloudFront

  • What is a CloudFront behaviour?

Identity and access controls

  • IAM Users and Groups
  • STS - services interacting with the account
  • Policies and Roles

AWS Service Roles

AWS services interacting with the account

  • AWS Lambda
  • Amazon EC2

Identity Providers

SAML 2.0, single sign-on, OpenID Connect

  • Amazon Cognito
  • AWS Directory Service

Security and compliance controls

Assuming a role

  • Cross-account
  • Within an account

Security logging

  • AWS Config
  • AWS CloudTrail
  • Segregated bucket
  • Dedicated account
  • Understand how to centralise logging into a single account or S3 bucket in a separate account (Understand CloudTrail log file integrity)

Amazon Cognito

A fully managed solution providing access control and authentication for web/mobile apps.

  • Supports MFA
  • Data at-rest and in-transit encryption
  • Log in via social identity providers
  • Support for SAML

  • User Pools

    • Provides a directory profile for all users which you can access through an SDK.
    • Supports user federation through a third-party identity provider.
    • Signed-in users receive authentication tokens.
    • Tokens can be exchanged for AWS access via Amazon Cognito identity pools.
  • Identity Pools

    • Authenticates users with web identity providers, including Amazon Cognito user pools.
    • Assigns temporary AWS credentials via AWS STS.
    • Supports anonymous guest users.

Understand the difference between User Pools and Identity Pools and how they can work together.

Deployment strategies for business requirements

  • Runtime/container
    • Amazon ECS deploys Docker containers and provides container management and scheduling.
  • Application deployment
    • AWS CodeDeploy handles deployment of application artifacts to target systems. It can deploy to both Amazon EC2 instances and external systems. CodeDeploy can store multiple application versions and has powerful, customizable logic to control deployment behavior.
  • Code/deployment management
    • AWS CodeCommit is a managed Git code repository; the service can store multiple versions of code and deployment artifacts. CodeCommit doesn't compile or deploy code. It relies on other services or system to do this.
  • Infrastructure deployment
    • AWS CloudFormation deploys environments based on a template. AWS CloudFormation doesn't have the ongoing configuration management capabilities of OpsWorks. AWS CloudFormation supports most or all AWS services for deployment.

You'll also need to know when OpsWorks or Elastic Beanstalk is the better option for deploying your infrastructure or application.

Development, testing, and staging environments

Know how to set up different environments. You will have different requirements for availability, performance and cost depending on the environment type. RDS can be an interesting area for a question here, as AWS provides different RDS templates for different environment types. Things to weigh up:

  • Availability
    • Typically lower requirements
    • May still need HA
  • Performance
    • Smoke testing
    • Load testing
  • Similarity
    • Deployment process
  • Cost

Domain 3.0 - Migration Planning

Existing workloads and processes for potential migration to the cloud

Need to know

6 Rs

  • Retain - Leave it alone and revisit it in the future.
  • Re-host - Lift and shift
  • Refactor - Architect applications to be cloud native.
  • Re-platform - Lift, modify and shift
  • Replace - Buy/purchase solutions that already exist in the cloud
  • Retire - Evaluate if an application/system provides value.

See this blog post for more detail.

Migration tools or services for new and migrated solutions based on detailed AWS knowledge

Strategies for migrating existing on-premises workloads to the cloud

New cloud architectures for existing solutions

Application Migration process

Plan, build and run

Tools for migration assistance

  • AWS Application Discovery Service
  • AWS Database Migration Service

The AWS storage portfolio

There is no single storage solution that solves every problem.

Data migration

Consider downtime and orchestration
Methods

  • Image backup/restore
  • File copy
  • Replication

Hybrid networks

Cost

Cloud is generally a variable cost, whereas on-premises is generally a fixed cost. Cloud is pay as you go, while on-premises usually takes the form of a large upfront investment.

Be mindful of how you can combine reserved, on-demand and spot instances to give the most cost-efficient solution.

Domain 4.0 - Cost Control

The goal is to move from "paying for what you think you need" to "paying for what you actually need".
Be careful of over-provisioning.

Need to know

Types of Tags

  • Resource Tags

    • Provide the ability to organise and search within and across resources
    • Filterable and searchable
    • Do not appear in detailed billing report
  • Cost Allocation Tags

    • Map AWS charges to organizational attributes for accounting purposes
    • Information presented in the detailed billing report and Cost Explorer (must be explicitly selected)
    • Only available on certain services or limited to components within a service (for example, S3 bucket but not objects)

Best practices for cost management

  • Only allow specific groups or teams to deploy chosen AWS resources.
  • Create policies for each environment.
  • Require tags in order to instantiate resources.
  • Monitor and send alerts or shut down instances that are improperly tagged.
  • Use CloudWatch to send alerts when billing thresholds are met.
  • Analyze spend using AWS or partner tools.
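
The "require tags" and "monitor improperly tagged instances" practices above boil down to a compliance check like the one sketched here. This is plain Python over dictionaries, not a call to any AWS API: the required tag keys, the resource IDs and the function names are all hypothetical. In practice the resource list would come from an inventory source, and the result would drive an alert or a shutdown.

```python
# Hypothetical tagging standard, chosen for illustration.
REQUIRED_TAGS = {"CostCenter", "Environment", "Owner"}


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or have a blank value."""
    present = {key for key, value in resource_tags.items()
               if value and value.strip()}
    return set(required) - present


def find_untagged(resources):
    """Map resource ID -> sorted missing tag keys, for non-compliant resources."""
    report = {}
    for resource in resources:
        missing = missing_tags(resource.get("tags", {}))
        if missing:
            report[resource["id"]] = sorted(missing)
    return report


inventory = [
    {"id": "i-aaa", "tags": {"Owner": "data", "Environment": "dev", "CostCenter": "42"}},
    {"id": "i-bbb", "tags": {"Owner": "web", "Environment": " "}},
    {"id": "i-ccc"},
]
print(find_untagged(inventory))
```

Treating blank values as missing is a design choice in this sketch; cost allocation reports are only useful when the tag actually carries a value.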

Domain 5.0 - Continuous Improvement for Existing Solutions

Need to know

Troubleshooting solution architectures

  • Amazon S3 Server Access Logs

    • Description: Contains details about data requests, such as the request type, the resources requested, and the date and time the request was made.
    • When to use: Troubleshoot bucket access issues and data requests.
  • Amazon ELB Access Logs

    • Description: Capture detailed information about each request sent to your load balancer, including client's IP address, latencies, and server responses.
    • When to use: Analyze traffic patterns and troubleshoot network issues.
  • AWS CloudTrail

    • Description: Provides a history of API calls to your account made via the AWS Management Console, AWS CLI, AWS SDKs, or other AWS services.
    • When to use: Audit and determine who did what, when, and from where.
  • Amazon VPC Flow Logs

    • Description: Capture information about the IP traffic going into or out of your network interfaces and subnets.
    • When to use: Verify network access rules are properly configured and troubleshoot connectivity and security issues.
  • Amazon CloudWatch Logs

    • Description: Monitor, store, and access applications and systems using log data from Amazon EC2 instances and on-premise servers.
    • When to use: Monitor and troubleshoot OS and applications running in your AWS environment.
  • AWS Config

    • Description: Provides an inventory of your AWS resources and records changes to the configuration of those resources.
    • When to use: Troubleshoot outages and conduct security attack analyses.

Determining a strategy to improve an existing solution for operational excellence

  • Well-Architected Framework

    • Understand business and customer needs
    • Make frequent, small, and reversible changes
    • Create and use procedures to respond to operational events
    • Continuously improve supporting processes and procedures
  • AWS Trusted Advisor

    • Define Operational Priorities
  • AWS CloudFormation

    • Design for Operations
  • AWS Systems Manager

    • Operational Readiness
  • Amazon CloudWatch

    • Operational Health
  • AWS Lambda

    • Event Response
  • Amazon Elasticsearch

    • Use analytics to learn from experience
  • AWS CodeCommit

    • Share learnings with libraries, scripts, and documentation

Determining a strategy to improve the reliability of an existing solution

Architect for High Availability

  • Appropriate level of availability
    • Availability levels are met
    • Minimize cost and complexity
    • Auto Scaling groups
    • Instance auto recovery
    • Route 53 resource record sets
    • Amazon RDS Multi-AZ
    • Amazon EBS snapshots
    • Amazon EFS
    • Replicated ElastiCache Redis
    • Automate recovery steps
    • Understand impact of a loss at peak load
    • Beware of capacity constraints
  • Best practices
    • Use Multi-AZ services (S3, DDB, SQS)
    • Similar to multiple component failure, but plan for capacity constraints
    • Use Reserved Instances for critical systems
    • Identify all Availability Zone-specific services, noting which are regional/global
    • Amazon EBS snapshots help minimize the Recovery Point Objective

Determine a strategy to improve the Performance of an existing solution

  • Shorten response times
  • Increase throughput
  • Lower the utilization of resources (efficiency)
  • Scalability for workloads that burst
  • Amazon S3 performance
    • Move static content to S3 buckets
    • Use IA for infrequently accessed data
    • Larger objects reduce PUT/GET requests
  • Amazon EBS performance considerations
    • GP2 for system disks
    • SC1 for cold storage
    • PIOPS (Provisioned IOPS) for high performance random I/O
    • ST1 for high performance sequential I/O
  • Amazon RDS performance considerations
    • Scale up instance size
    • Increase storage size online
    • Read replicas are
      • asynchronous
      • application must direct queries
      • cross-region (for some RDS flavours)
  • Amazon ElastiCache performance considerations
    • App reads and writes from cache
    • Cache timeouts/TTL
    • Redis replication groups for availability
    • Write-through for write spikes
    • Memcached is single AZ and does not support encryption at rest
  • Amazon DynamoDB performance considerations
    • Alter read/write capacity units
    • Global or local secondary indexes
    • Use SQS to
      • handle write spikes
      • write data in quiet periods. Must understand the data.
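
Using SQS to handle write spikes means putting a queue between the producers and the table, then draining it at a rate the table's write capacity can absorb. The sketch below models that with an in-memory `deque` instead of a real queue; `drain`, `simulate` and the numbers are hypothetical and only illustrate the shape of the pattern. It only works if the application can tolerate the extra write latency.

```python
from collections import deque


def drain(queue, write, max_writes_per_tick):
    """Write at most max_writes_per_tick queued items; return how many were written."""
    written = 0
    while queue and written < max_writes_per_tick:
        write(queue.popleft())
        written += 1
    return written


def simulate(spike_size, capacity_per_tick):
    """Queue a burst of writes, then drain it one tick at a time."""
    queue = deque(range(spike_size))
    table = []      # stands in for the database table
    per_tick = []   # number of writes performed in each tick
    while queue:
        per_tick.append(drain(queue, table.append, capacity_per_tick))
    return per_tick, table


# A burst of 25 writes against a capacity of 10 writes per tick.
per_tick, table = simulate(spike_size=25, capacity_per_tick=10)
print(per_tick)
```

The burst is spread over three ticks instead of being throttled, and every item still reaches the table in order.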

Determine a strategy to improve the Security of an existing solution

Restrict access to resources (least privilege)

  • User-based policies
    • What does a particular entity have access to?
    • Attached to an IAM user
  • Resource-based policies
    • Who has access to a particular resource?
    • Grant access directly on the resource
    • Not all services support resource-based policies
  • Policy conditions for more control
    • Specify the conditions for when a policy is in effect
    • Dates or IP addresses are examples for further restricting user access with conditions
    • MFA can be enforced via policy conditions
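
As a rough illustration of policy conditions, the Python sketch below builds an identity-based policy that denies requests made without MFA or from outside a given IP range. The condition keys `aws:MultiFactorAuthPresent` and `aws:SourceIp` are real global condition keys; the function name, the `Sid` values and the CIDR range are placeholders. Real policies of this kind usually carve out the actions a user needs to enrol an MFA device, which this sketch omits.

```python
import json


def deny_without_mfa_or_outside_network(allowed_cidrs):
    """Build a policy that denies requests lacking MFA or coming from other IPs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyWithoutMFA",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # BoolIfExists also catches requests where the key is absent.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            },
            {
                "Sid": "DenyOutsideAllowedNetwork",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}
                },
            },
        ],
    }


# 203.0.113.0/24 is a documentation-only address range used as a placeholder.
policy = deny_without_mfa_or_outside_network(["203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```

Because both statements are explicit denies, they override any allow granted elsewhere, which is what makes conditions a useful tool for tightening access.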

Data encryption

TDE (Transparent Data Encryption) is only available for some flavours of RDS.

Protect data in-transit

  • SSL termination at the load balancer
    • Certificates are stored in IAM
    • Single certificate per load balancer
    • Offload decryption work to the load balancer
    • Re-encryption between load balancer and instances
    • Application load balancer and classic load balancer
  • SSL termination in CloudFront
    • SNI or non-SNI certificates (Server Name Indication (SNI) allows the server to safely host multiple TLS Certificates for multiple sites, all under a single IP address.)
    • SSL connections to load balancer
  • AWS Certificate Manager
    • Manages and deploys public/private certificates
    • Establish website identity
    • Verify identity of resources within a company

Improve network traffic security

  • Network perimeter controls
    • Security groups
      • Per-ENI granularity. Can have multiple ENIs attached to an instance with a separate SG for each.
      • Stateful = simpler to apply rules
      • Inter-service communication
      • Deny is not a part of security groups
    • Network ACLs
      • Subnet boundaries only
      • ALLOW and DENY rules
      • IP ranges only
    • Host firewalls
      • Central or distributed control
      • Intrusion detection systems (IDS) and intrusion prevention systems (IPS)
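
The difference between security groups (allow rules only) and network ACLs (ordered ALLOW and DENY rules) is easier to remember with a toy model. The Python sketch below is an assumption-laden simplification: the rule dictionaries and function names are hypothetical, only inbound TCP traffic is modelled, and statefulness, protocols and ephemeral ports are ignored.

```python
from ipaddress import ip_address, ip_network


def _rule_matches(rule, source_ip, port):
    """True when the source IP and port fall inside the rule's CIDR and port range."""
    return (ip_address(source_ip) in ip_network(rule["cidr"])
            and rule["from_port"] <= port <= rule["to_port"])


def nacl_allows(rules, source_ip, port):
    """Network ACL: rules are checked in ascending rule number, first match wins."""
    for rule in sorted(rules, key=lambda r: r["number"]):
        if _rule_matches(rule, source_ip, port):
            return rule["action"] == "allow"
    return False  # nothing matched: the implicit final deny applies


def security_group_allows(rules, source_ip, port):
    """Security group: allow rules only, any matching rule permits the traffic."""
    return any(_rule_matches(rule, source_ip, port) for rule in rules)


# Hypothetical rules using documentation-only address ranges.
nacl = [
    {"number": 100, "action": "allow", "cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},
    {"number": 90, "action": "deny", "cidr": "198.51.100.0/24", "from_port": 443, "to_port": 443},
]
sg = [{"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443}]

for ip in ("198.51.100.7", "203.0.113.9"):
    print(ip, nacl_allows(nacl, ip, 443), security_group_allows(sg, ip, 443))
```

The security group cannot express "everyone except this range", so blocking a specific IP range has to happen at the network ACL, where the lower-numbered deny rule is evaluated before the broader allow.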

Determine a strategy to improve the Deployment of an existing solution

Understand the differences between these services, where they are used and why you would use one instead of another to deploy a solution.

  • AWS CloudFormation
  • AWS CodeDeploy
  • AWS Elastic Beanstalk
  • AWS OpsWorks
  • Amazon ECS

Additional Resources

Whitepaper: AWS Security Best Practices
AWS Well-Architected Framework
Whitepaper: Practicing Continuous Integration and Continuous Delivery on AWS: Accelerating Software Delivery with DevOps
Whitepaper: Microservices on AWS
Best Practices for Security, Identity, & Compliance
AWS Documentation
AWS Architecture Center

Top comments (5)

Pablo Ezequiel Inchausti

Congrats! and very nice summary!

Tom Milner Author

Thanks Pablo, glad you liked it

Sri

Very well written and detailed, thank you
I was wondering if you could also post the links to the skill builder courses that you have used to prepare (screenshots in this blog)

Tom Milner Author

Hi Sri, thanks for the positive feedback and the question.

All screenshots are either from AWS Exam Readiness course or from articles linked above. I didn't use any other AWS courses to prepare for exam.

explore.skillbuilder.aws/learn/cou...
