A continuation of the AWS Well-Architected Framework and its 5 Pillars series.
If you haven't yet read the earlier parts of this series, please read the previous parts above or below this article.
Let's keep diving deep. This time we venture into the Reliability realm to learn what its key areas are, which questions you must conquer, and which key phases are involved in satisfying its standards.
... the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions...
Simply put: will your system work consistently and recover quickly?
To satisfy this pillar, your architecture needs to manage demand and disruptions efficiently by leveraging these key areas:
- Recover from issues automatically
- Scale horizontally first for resiliency
- Reduce idle resources
- Manage change through automation
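One small way to picture "recover from issues automatically" is a retry with exponential backoff around a call that can fail transiently. This is only a minimal sketch, not an AWS API: the name `call_with_retries` and its parameters are made up for illustration, and a real workload would usually retry only specific error types and add jitter.

```python
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on failure, wait and try again (hypothetical helper).

    The wait doubles after every failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests.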
Understand default and requested limits:
- Are you planning beyond current limits for a resource?
- Will you scale past specific resource limits?
- Can those limits be lifted?
- Can you plan around those limits?
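The questions above can be turned into a simple planning check: compare what you use today plus what you plan to add against the limit. The sketch below is an illustration only. The class, the resource names, and every number are hypothetical; real limits would come from your own account.

```python
from dataclasses import dataclass


@dataclass
class QuotaCheck:
    """One resource limit compared against planned growth (hypothetical model)."""
    resource: str
    limit: int    # current default or requested limit
    in_use: int   # what the workload consumes today
    planned: int  # additional units you expect to need

    @property
    def projected(self) -> int:
        return self.in_use + self.planned

    @property
    def exceeds_limit(self) -> bool:
        return self.projected > self.limit


def resources_needing_increase(checks):
    """Return the names of resources whose projected usage passes the limit."""
    return [c.resource for c in checks if c.exceeds_limit]


# Made-up numbers, for illustration only.
checks = [
    QuotaCheck("instances", limit=20, in_use=12, planned=10),
    QuotaCheck("load-balancers", limit=50, in_use=4, planned=2),
]
needs_increase = resources_needing_increase(checks)
```

Anything that ends up in `needs_increase` is a limit you either ask to have lifted or plan around before you scale.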
We need to understand latency, topology, and bandwidth:
- IP address space management
- Subnet structures
- Resilient topologies
- Ability to handle sudden increases in traffic
- Consistent performance regardless of traffic
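For IP address space management and subnet structures, it helps to lay out the address plan before building anything. Here is a minimal sketch using Python's standard `ipaddress` module. The `10.0.0.0/16` range, the `/24` subnet size, and the zone names are assumptions chosen for the example, not recommendations.

```python
import ipaddress
from itertools import islice


def plan_subnets(network_cidr, new_prefix, zones):
    """Assign one equally sized, non-overlapping subnet to each zone name."""
    network = ipaddress.ip_network(network_cidr)
    # Take only as many subnets as there are zones.
    subnets = list(islice(network.subnets(new_prefix=new_prefix), len(zones)))
    if len(subnets) < len(zones):
        raise ValueError("address space too small for the requested zones")
    return dict(zip(zones, subnets))


# Hypothetical zone names and address range.
plan = plan_subnets("10.0.0.0/16", 24, ["zone-a", "zone-b", "zone-c"])
for zone, subnet in plan.items():
    print(zone, subnet, subnet.num_addresses)
```

Spreading subnets across more than one zone is one simple way to work toward a resilient topology, and leaving unallocated space in the range gives you room to grow.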
Ensure your application is ready for business use:
- Can users access your application?
- Can you deploy without issue?
- Can you push issues to planned downtime?
- Can your application withstand partial outages?
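Withstanding a partial outage often means serving a reduced answer instead of failing outright. The sketch below shows that idea under simple assumptions: `call_with_fallback` and the two example functions are hypothetical names, and the "live" dependency is simulated with a raised exception.

```python
def call_with_fallback(primary, fallback):
    """Return (result, degraded) where degraded is True if the fallback ran."""
    try:
        return primary(), False
    except Exception:
        # The primary dependency is down: degrade gracefully instead of failing.
        return fallback(), True


def live_recommendations():
    # Simulated outage of a hypothetical downstream service.
    raise ConnectionError("recommendation service unavailable")


def default_recommendations():
    return ["bestsellers"]


result, degraded = call_with_fallback(live_recommendations, default_recommendations)
```

Users still get a page, and the `degraded` flag gives you something to monitor and alert on.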
The key takeaway from this pillar is to build workloads that meet business expectations regardless of circumstances. You can build the finest application in the world, but if it's not accessible, reliable, and available, it will not produce value for you or your business.