Architecting in AWS is not just about stringing together a few services; it's about building scalable, resilient, and efficient systems that serve a purpose for the business. In that sense, it's an activity that combines the technical knowledge of how to implement a system with the business knowledge of how cloud systems can support business objectives. With that in mind, we can analyze architecture from two different perspectives.
Architecture from a technical perspective involves functional and non-functional requirements, such as scalability, availability, and resiliency. It must also serve the technical people who will be implementing it, through aspects such as maintainability, simplicity, and developer experience.
Architecture isn't about implementation details, but about the decisions that limit and constrain those details. For example, using EC2 instances is an architecture decision, but the specific family or size of those instances is an implementation detail.
Overall, these are the key characteristics of a good architecture in AWS, from the technical perspective.
Fine-grained performance tuning might be an implementation detail, but the overall architecture can either enable or severely restrict performance. Selecting the right compute or storage layer, using load balancers or implementing caches can make or break an application's performance. All in all, good performance can be achieved through a deep understanding of both the workload requirements and how AWS services work in conjunction. This allows an architect to identify bottlenecks before even implementing the cloud solution, and to optimize performance as necessary.
Resilience to failures, high availability, and fault tolerance depend entirely on architecture decisions. Implementing Multi-AZ deployments, setting up disaster recovery strategies, and understanding how different services guarantee availability all help ensure that the architecture is resilient. The desired level of resilience is usually not a technical decision, but one of costs and business continuity. However, implementing that level of availability and resilience falls entirely to the architect.
Properly architected solutions will implement security at every layer, using VPC, Security Groups, IAM, encryption methods, and different security services to protect the application from malicious users. Effective protection can only be achieved by specifically architecting for it.
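Security decisions often end up expressed as code or configuration. As a small illustration, here's a least-privilege IAM policy document sketched as a Python dict; the bucket name is a placeholder.

```python
import json

# A least-privilege IAM policy: the application can read objects from one
# specific S3 bucket and nothing else. "example-app-assets" is a
# placeholder bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-app-assets/*",
        }
    ],
}

policy_json = json.dumps(policy)
```

Granting only the specific actions and resources a component needs is the kind of decision that has to be made at the architecture level, for every layer.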
One of the biggest benefits of the cloud is being flexible. As demand increases you can scale out the infrastructure to meet the peaks, and when demand drops your solution can scale in to reduce costs. The compute layer can be made scalable by setting up Auto Scaling Groups and elastic load balancing, or with a containerized or purely serverless approach. However, a truly scalable architecture needs to consider every service involved, including the data stores and network services.
Every solution has a limit to which it can scale, depending on the AWS services used. It's the architect's job to understand the limits of their architecture, whether they're enough for the expected workload, and what to do if they aren't.
A cloud architecture isn't a thought exercise. It's a set of decisions that will serve as a blueprint to implement a cloud solution. When architecting in AWS, you need to consider the difficulty of implementing that solution and the effort of maintaining it. These considerations are so important that they need to be part of the architecture, and they will constrain your architecture decisions. Simplicity is preferred over elegance, maintainability over performance micro-optimizations, and developer experience over complexity or trying to look smart.
As I mentioned before, architecture isn't just about technical decisions. Those decisions need to serve the business, and business priorities will both constrain and prioritize the technical aspects of the architecture in AWS.
These are the key business characteristics that need to be part of the architecture in AWS.
The architecture must align with business goals, translating organizational objectives into technical strategies. Whether it's agility, cost reduction, or global expansion, the architecture must reflect and support these goals. Remember that the solution you're architecting only exists to support the business.
A good AWS architecture is vital in ensuring the scalability of applications and services to meet the scale of the business. Since architecture is the set of decisions that are hard to change, the architecture in AWS needs to be designed for the business's scalability goals, and not just for the current workload. Furthermore, scalability of operations depends on the simplicity and developer experience of the solution, which come from the architecture decisions.
Architectural decisions directly impact cost. From the choice of EC2 vs Fargate to the storage type, these decisions define the cost structure. Cost optimization is a continuous process that starts with the architecture, and continues throughout the entire lifecycle of the cloud solution.
The architecture must align with legal and regulatory requirements, such as GDPR, HIPAA, and others. These requirements are yet another constraint on the architecture decisions, and are nearly always non-negotiable. The architecture needs to support these requirements.
A typical 3-tier architecture consists of Web, Application, and Database layers, each playing a different role in the entire solution.
The web tier handles user interactions and serves website content. A typical pattern for a static website is to serve it from S3, with CloudFront as the CDN and Route 53 for DNS. Dynamic websites that render server-side depending on user input need a compute layer, such as an Auto Scaling Group of EC2 instances behind an Elastic Load Balancer, or an ECS cluster running on EC2 instances or Fargate. Even in those cases, using CloudFront as a CDN is nearly always recommended.
This tier handles business logic and dynamically processes requests. It's typically deployed in an Auto Scaling Group of EC2 instances with a Load Balancer, in an ECS or EKS cluster for containerized applications, or in AWS Lambda for purely serverless applications. It includes integrations with other services such as SQS or SNS to communicate between different modules, KMS to encrypt data, and services that aid in security and/or management, such as Secrets Manager or Systems Manager.
The database tier provides persistent storage for the application. Most of the time this comes in the form of a managed database such as DynamoDB, or RDS for MySQL or PostgreSQL. However, the storage tier can also include caches with ElastiCache for Redis or Memcached, block storage such as EBS, or file storage such as EFS. Understanding the different types of storage and how AWS services offer and price them is key to designing a good database tier.
Serverless means shifting to AWS the responsibility of managing underlying servers (even virtual ones), and paying for actual usage of resources instead of reserving capacity. For engineers who don't want to manage servers, that sounds like a fantastic promise. However, architecting a serverless solution takes a lot more than just shifting the responsibility for servers. You need to understand the basic building blocks of a serverless architecture, and how to combine them to build a serverless application.
No Server Management: The underlying compute layer is abstracted, removing the need for server provisioning and maintenance.
Automatic Scaling: Resources scale automatically with the number of executions.
Cost-Efficiency: Pay only for the compute time consumed; there's no charge when your code isn't running.
Lambda is the core compute service for serverless architectures. In Lambda you create functions, where each function contains a piece of your code (typically one service) that runs independently of the rest. When Lambda receives a request, it initiates a new invocation of the function, which may create a new execution environment or reuse an existing one. You set the function's memory (CPU is allocated proportionally to it), and you're only billed for the time an invocation runs, from receiving the request to returning a response.
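A minimal handler, runnable locally, illustrates the model. The split between module-level setup and the handler body matters because warm invocations reuse the execution environment; the handler name and event shape here are just an example.

```python
import json
import time

# Module-level code runs once per execution environment (the cold start),
# so expensive setup such as SDK clients or configuration loads belongs
# here, where warm invocations can reuse it.
INITIALIZED_AT = time.time()

def handler(event, context):
    # The handler itself must be stateless: everything it needs comes in
    # through the event, and anything worth keeping goes to external storage.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Locally you can exercise it with `handler({"name": "Alice"}, None)`; in AWS, the event comes from whatever source triggers the function.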
Lambda functions abstract away a lot of the responsibility of running code in production, but they impose certain limitations. An invocation cannot run for more than 15 minutes, so long-running processes can't be implemented with AWS Lambda. Additionally, since you can't guarantee whether an invocation will reuse an existing execution environment or create a new one, Lambda functions need to be stateless. Finally, Lambda functions end up coupling infrastructure decisions with the code, so developers will need to understand how they work.
Lambda functions can be triggered by many events, not just HTTP requests. They can be used to respond to changes in a DynamoDB table, to alarms from CloudWatch metrics, to operations on S3 buckets, and even to misconfigurations or security events. In this sense, they're more than just a compute layer for a web application, but instead they become a fantastic tool to automate various infrastructure tasks.
Services like DynamoDB, S3, CloudWatch, SNS, SQS and Kinesis are tightly integrated with Lambda, enabling you to create complex serverless, event-driven workflows. Events in some parts of the system, such as a DynamoDB table, trigger behavior that's implemented with a Lambda function. Additionally, these Lambda functions can interact with other AWS services, triggering new events that, in turn, trigger other behaviors. Use these integrations to create event-driven workflows that are entirely serverless and highly scalable.
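As a sketch of this pattern, the handler below consumes a DynamoDB Streams event (following the documented stream record shape) and triggers a `notify()` side effect, which is a hypothetical placeholder for whatever comes next in the workflow (publishing to SNS, writing to another table, and so on).

```python
# Sketch of a Lambda function consuming a DynamoDB Streams event.
# notify() is a hypothetical stand-in for a downstream call.
processed = []

def notify(item_id: str, action: str) -> None:
    processed.append((item_id, action))  # stand-in for SNS/SQS/another table

def handler(event, context):
    # Each stream record carries the event name (INSERT, MODIFY, REMOVE)
    # and the keys of the item that changed.
    for record in event.get("Records", []):
        keys = record["dynamodb"]["Keys"]
        notify(keys["pk"]["S"], record["eventName"])
    return {"records": len(event.get("Records", []))}
```

Running it locally with a sample event, such as a single INSERT record for `pk = "order-123"`, shows the flow without touching AWS at all.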
API Gateway is an AWS service that acts as a front door for applications, presenting a single endpoint that allows consumers to access data, business logic, or functionality from back-end services. The benefit of API Gateway is that it decouples the endpoint from the implementation, allowing you to replace a serverful application with a serverless one without modifying the endpoint that's exposed.
DynamoDB is a high performance NoSQL database service that works perfectly within a serverless architecture. It's serverless itself, meaning that you don't need to worry about instances or availability zones. Moreover, it scales automatically, so your scaling Lambda functions won't need to be throttled to avoid overloading the database. Being a NoSQL database means you'll need to keep in mind a few additional considerations, such as DynamoDB Database Design.
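Those considerations mostly revolve around key design. The hypothetical helper below shows the kind of composite keys DynamoDB rewards: a partition key that groups related items and a sort key that enables range queries, so a single query can fetch all of a customer's orders in date order; all names here are illustrative.

```python
# Hypothetical item builder illustrating DynamoDB key design.
def order_item(customer_id: str, order_date: str, order_id: str, total: float) -> dict:
    return {
        "pk": f"CUSTOMER#{customer_id}",        # partition key groups related items
        "sk": f"ORDER#{order_date}#{order_id}",  # sort key enables range queries by date
        "total": total,
    }
```

With items shaped like this, "all orders for customer 42 in January 2024" becomes one query on `pk` with a `begins_with` condition on `sk`, with no scans or joins involved.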
Step Functions allows you to create state machines to coordinate components of distributed applications and microservices. You can build your state machines using visual workflows or through Infrastructure as Code, and create complex, multi-step workflows with AWS Step Functions.
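A minimal state machine definition in Amazon States Language can be sketched as a Python dict and serialized to JSON; the workflow and the Lambda ARNs below are placeholders for illustration.

```python
import json

# A minimal Amazon States Language definition. In practice this JSON is
# handed to Step Functions through IaC or the CreateStateMachine API;
# REGION/ACCOUNT and the function names are placeholders.
state_machine = {
    "Comment": "Example order processing workflow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:validate-order",
            "Next": "ChargePayment",
        },
        "ChargePayment": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:charge-payment",
            # Retries with backoff are declared in the definition, not in code.
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 3}],
            "End": True,
        },
    },
}

definition_json = json.dumps(state_machine, indent=2)
```

Note how sequencing, retries, and error handling live in the definition itself, which is exactly the coordination logic you'd otherwise have to hand-roll across Lambda functions.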
High Availability in AWS means that your application can continue to function with only a minor interruption in the event that an Availability Zone fails. To architect a multi-tier solution for high availability, we need to consider how each component of the architecture is deployed in availability zones, and how an AZ failure will impact them.
By using services across multiple Availability Zones, you ensure that a failure in one zone doesn't bring down the entire system. This means not only deploying EC2 instances across several availability zones, but also ensuring data is replicated in more than one availability zone, and that failover happens automatically.
Multiple instances will naturally have multiple endpoints (public IP addresses), which may change as failed instances are replaced. An Elastic Load Balancer exposes a single endpoint for the entire compute layer, and distributes requests across backend targets. It dynamically registers and de-registers EC2 instances or other backend targets according to their response to a health check request, ensuring only healthy targets will receive requests.
Auto Scaling Groups (ASGs) can launch or terminate EC2 instances based on metrics such as average CPU usage. This way, your compute capacity can scale out when traffic increases, and scale in when traffic is low. Additionally, an Auto Scaling Group will re-create an instance if it fails. This is especially important in Multi-AZ architectures, since in the event of an AZ failure the Auto Scaling Group can re-create all the failed instances in another availability zone. Implementing Auto Scaling Groups correctly requires understanding scaling policies, cooldown periods, and lifecycle hooks. They're a key component in ensuring the compute layer can recover from failures and can scale without manual intervention.
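As a rough illustration of how a target tracking scaling policy behaves, the sketch below approximates the capacity calculation: scale so the per-instance metric lands near the target, clamped to the group's bounds. This is an approximation for building intuition, not the exact algorithm AWS runs.

```python
import math

# Approximate target tracking arithmetic: if 4 instances run at 90% average
# CPU against a 50% target, capacity should roughly double to bring the
# average back down. Clamped to the ASG's min/max size.
def desired_capacity(current_capacity: int, current_metric: float,
                     target_metric: float, min_size: int, max_size: int) -> int:
    raw = math.ceil(current_capacity * (current_metric / target_metric))
    return max(min_size, min(max_size, raw))
```

For example, 4 instances at 90% average CPU with a 50% target yields a desired capacity of 8, while the same group at 20% average CPU scales in toward its minimum size.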
For an architecture to be highly available, data needs to be replicated and accessible across several availability zones. Some services, like DynamoDB, automatically guarantee this. Others, such as RDS or Aurora, require you to create a replica. Fortunately, for these services AWS offers automatic failover: requests are sent to a single endpoint, and if the primary instance fails, the standby automatically starts handling the traffic.
On top of the failure of an Availability Zone, AWS can also experience the failure of an entire Region. Building an architecture that's resilient to regional failures presents much more complex challenges than a Multi-AZ architecture.
Most services that offer replication in multiple availability zones only do so within the same region. Some have features to replicate across regions, like S3 cross-region replication, but they will charge you for that. Additionally, failover across regions doesn't happen automatically, and more often than not it needs code changes to update the endpoint from which data is consumed.
The vast majority of AWS services are region-scoped, meaning their configurations are specific to that region. This includes features like EBS snapshots, EC2 AMIs, RDS backups, and services such as Secrets Manager or VPC. If a region fails, all of these services will stop functioning in that region. That means a Multi-region architecture needs to replicate all of these configurations across regions, and deal with potential inconsistencies across these copies.
Storing and processing data within specific legal jurisdictions may be necessary to comply with local laws. Multi-region architectures may come into conflict with these requirements. If there is only one region that complies with your legal requirements, you may need to consider either not replicating that part of the system, or architecting the entire system for only one region and accepting the risk of a region failing.
A multi-region approach serves as a disaster recovery strategy. It involves knowing how to route traffic between regions, synchronize data, and ensure that applications can failover smoothly between regions.
Resilient architectures are about preparing for the unexpected, making sure that the application can keep functioning in the event of partial or localized failures, and that it can recover automatically from general failures. These are some of the best practices that will help you design a resilient architecture in AWS.
Dividing the architecture into fault isolation zones ensures that failures are contained and do not cascade throughout the system. These zones can consist of different services, or simply different instances of the same service. For example, having separate RDS instances for different databases can prevent the failure of a single instance from bringing down the entire system.
Understanding how to spread resources across Availability Zones and Regions adds a layer of redundancy and resilience to the architecture. However, Multi-AZ and Multi-region don't come without a cost. You need to be aware of the tradeoffs of increased cost and increased maintenance complexity, and decide whether the benefits are worth the price.
It's not enough to have a recovery plan on paper. Regularly testing recovery procedures using tools like AWS Fault Injection Simulator ensures that the system can handle real-world failure scenarios.
Using services like AWS Auto Scaling, CloudWatch Alarms, and AWS Step Functions to automate recovery mechanisms ensures that the system can react to failures without human intervention. This is key in resilient architectures, since humans are much slower to respond, much more expensive to keep on watch, and more prone to errors that could further complicate an outage.
The AWS Well-Architected Framework is a set of guidelines that provide a consistent approach for customers and partners to evaluate architectures and implement scalable and resilient systems. It's divided into five pillars:
Operational Excellence: Focuses on running and monitoring systems to deliver business value and continually improving processes and procedures. It greatly impacts maintainability and developer experience.
Security: Emphasizes protecting information and systems, managing access, and implementing guardrails. It involves considering security as an architectural concern that needs to be present from the architecture design and throughout the entire lifecycle of the application.
Reliability: Considers the system's ability to recover from failures and dynamically adapt to meet changing demand. It emphasizes resilience, availability, fault tolerance and disaster recovery.
Performance Efficiency: Concentrates on using resources efficiently, selecting the right types and sizes based on workload requirements, and handling the expected traffic with the appropriate amount and configuration of resources. It complements Cost Optimization, but it's more concerned with the application working correctly.
Cost Optimization: Focuses on avoiding unnecessary costs, analyzing spending, and meeting business needs in the most cost-efficient way. It involves continuously analyzing and monitoring costs, and applying cost-reduction strategies. It complements Performance Efficiency, but from a costs perspective.
Getting AWS Certified with an architecture certification, either Solutions Architect - Associate or Solutions Architect - Professional, can have a significant impact on your career. They cover the knowledge necessary to architect applications in AWS, both from a general solutions perspective and through specific configurations that test the limits of AWS services. Architecting solutions in AWS requires both general software architecture knowledge and experience, and specific AWS knowledge. The certifications don't test for experience, but the knowledge they cover makes them an important and useful badge to hold when looking for a cloud architect role.
Understanding architecture in AWS involves more than knowing the individual services. It requires an understanding of how these components combine and interact to form scalable, resilient, and efficient systems. Whether you're opting for a serverless paradigm or designing across multiple regions, AWS provides the tools and services necessary to build complex architectures tailored to specific needs and objectives. As an AWS Architect, it's your responsibility to understand these tools and their limitations, and know how to use them to architect solutions in AWS that can meet your business requirements.
Master AWS with Real Solutions and Best Practices.
Join over 3000 devs, tech leads, and experts learning real AWS solutions with the Simple AWS newsletter.
Analyze real-world scenarios
Learn the why behind every solution
Get best practices to scale and secure them
Simple AWS is free. Start mastering AWS!