While you're getting in shape for the daily challenges of running production AWS solutions, two easily confused terms, High Availability and Fault Tolerance, are likely to come up in your team discussions. Let's dig into both.
High Availability
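High Availability can be defined as the percentage of uptime during which a system maintains operational performance, and it is often aligned to a service's SLA. AWS publishes SLAs for many of its services and implements its own level of resilience and management to maintain that level of availability. To put the numbers in perspective, a 99.9% monthly commitment leaves room for roughly 43 minutes of downtime in a 30-day month. Some SLA examples (AWS revises these figures from time to time, so check the current SLA pages):
- S3 Standard: 99.9%
- EC2: 99.95%
- RDS: 99.95%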
High Availability - Example Design
- 1: High Availability through the presence of 2 Availability Zones in a single Region
- 2: High Availability through multiple EC2 instances, which keep a minimum number of nodes available to handle the expected traffic load.
- 3: High Availability achieved through the use of a Load Balancer.
Let's implement this solution through an AWS CloudFormation template!
Note: Check your AWS Free Tier allowance before deploying, to avoid unexpected charges.
About CloudFormation:
CloudFormation is a way of defining your AWS Infrastructure as Code. All the necessary resources and their dependencies can be defined as code in a CloudFormation Template (JSON or YAML file), which is then launched as a stack. Some definitions to keep in mind:
Resources: Defines the AWS resources to create. This is the only mandatory section.
Parameters: Dynamic inputs to your template, so you can customize it for your specific needs or use cases.
Mappings: Static variables, defined as key-value pairs.
Outputs: Values returned by the stack, which another stack can reference through an import.
Conditions: Rules that determine whether a specific resource is created or not.
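To make these sections concrete, here is a minimal template sketch that uses all five. Every name in it (EnvironmentType, SshCidrByEnv, IsProd, WebSecurityGroup, ProdOnlyBucket) is a hypothetical placeholder chosen for illustration, and the CIDR ranges are example values rather than recommendations.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Illustrative skeleton of the five template sections (all names are hypothetical)

Parameters:
  EnvironmentType:              # dynamic input chosen at stack creation
    Type: String
    Default: dev
    AllowedValues: [dev, prod]

Mappings:
  SshCidrByEnv:                 # static key-value lookup table
    dev:
      Cidr: 0.0.0.0/0
    prod:
      Cidr: 10.0.0.0/8

Conditions:
  IsProd: !Equals [!Ref EnvironmentType, prod]

Resources:
  WebSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: HTTP from anywhere, SSH from an environment-specific range
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          # Value looked up in the Mappings section
          CidrIp: !FindInMap [SshCidrByEnv, !Ref EnvironmentType, Cidr]
  ProdOnlyBucket:
    Type: AWS::S3::Bucket
    Condition: IsProd           # created only when EnvironmentType is prod

Outputs:
  WebSecurityGroupId:
    Description: Security group ID, exported so another stack can import it
    Value: !GetAtt WebSecurityGroup.GroupId
    Export:
      Name: !Sub '${AWS::StackName}-WebSecurityGroupId'
```

Another stack could then read the exported value with Fn::ImportValue, using the export name shown above.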
Without further ado, the CloudFormation template below provisions a Classic Load Balancer (ELB) in front of two EC2 instances, one per Availability Zone:
---
Parameters:
  SecurityGroupDescription:
    Description: Security Group Description
    Type: String
  KeyName:
    Description: Key Pair for EC2
    Type: 'AWS::EC2::KeyPair::KeyName'
Resources:
  EC2Instance1:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1a
      ImageId: ami-0233c2d874b811deb
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          # Write an index page so the ELB health check on / receives HTTP 200
          echo "<h1>Hello from us-east-1a</h1>" > /var/www/html/index.html
  EC2Instance2:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1b
      ImageId: ami-0233c2d874b811deb
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          # Write an index page so the ELB health check on / receives HTTP 200
          echo "<h1>Hello from us-east-1b</h1>" > /var/www/html/index.html
  # Security groups
  ELBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ELB Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
  EC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Ref SecurityGroupDescription
      SecurityGroupIngress:
        # HTTP is accepted only from the load balancer's security group
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId:
            Fn::GetAtt:
              - ELBSecurityGroup
              - GroupId
        # SSH is open to the world for demo purposes; restrict it in real deployments
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
  # Load Balancer for EC2
  LoadBalancerforEC2:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      AvailabilityZones: [us-east-1a, us-east-1b]
      Instances:
        - !Ref EC2Instance1
        - !Ref EC2Instance2
      Listeners:
        - LoadBalancerPort: '80'
          InstancePort: '80'
          Protocol: HTTP
      HealthCheck:
        Target: HTTP:80/
        HealthyThreshold: '3'
        UnhealthyThreshold: '5'
        Interval: '30'
        Timeout: '5'
      SecurityGroups:
        - !GetAtt ELBSecurityGroup.GroupId
Fault Tolerance
Fault Tolerance has the sole goal of expanding on High Availability to offer the greatest level of protection, aiming for a zero-downtime solution. This approach certainly carries additional cost, with the upside of a higher uptime percentage and no interruption should one or even several components fail at different levels.
Here we can see the following:
1: Regional redundancy is achieved through the AWS Route 53 DNS service.
2: Availability Zone redundancy is achieved through the ELB, as in the HA approach.
3: Compute redundancy is achieved through either multiple EC2 instances or Auto Scaling Groups (ASG).
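As a rough sketch of items 1 and 3, the fragment below defines Route 53 failover records that point at a primary and a secondary regional endpoint, plus an Auto Scaling group that keeps at least two instances running across the subnets it is given. All parameter and resource names (HostedZoneId, RecordName, PrimaryEndpoint, SecondaryEndpoint, ImageId, SubnetIds and the resource logical IDs) are hypothetical placeholders that do not appear in the templates of this article, and the sizes and thresholds are illustrative only.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Illustrative failover DNS plus Auto Scaling group (all names are hypothetical)

Parameters:
  HostedZoneId:
    Type: AWS::Route53::HostedZone::Id
  RecordName:
    Type: String
    Description: Subdomain to serve, for example app.example.com (CNAME records cannot sit at the zone apex)
  PrimaryEndpoint:
    Type: String
    Description: DNS name of the load balancer in the primary Region
  SecondaryEndpoint:
    Type: String
    Description: DNS name of the load balancer in the secondary Region
  ImageId:
    Type: AWS::EC2::Image::Id
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Subnets in at least two Availability Zones

Resources:
  # Item 1: DNS answers switch to the secondary Region when the primary is unhealthy
  PrimaryHealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTP
        FullyQualifiedDomainName: !Ref PrimaryEndpoint
        Port: 80
        ResourcePath: /
        RequestInterval: 30
        FailureThreshold: 3
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: !Ref RecordName
      Type: CNAME
      TTL: '60'
      SetIdentifier: primary
      Failover: PRIMARY
      HealthCheckId: !Ref PrimaryHealthCheck
      ResourceRecords:
        - !Ref PrimaryEndpoint
  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: !Ref RecordName
      Type: CNAME
      TTL: '60'
      SetIdentifier: secondary
      Failover: SECONDARY
      ResourceRecords:
        - !Ref SecondaryEndpoint

  # Item 3: the group replaces failed instances and never drops below MinSize
  WebLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ImageId
        InstanceType: t2.micro
  WebAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: '2'
      MaxSize: '4'
      VPCZoneIdentifier: !Ref SubnetIds
      LaunchTemplate:
        LaunchTemplateId: !Ref WebLaunchTemplate
        Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
```

In a real multi-Region setup the compute resources of the secondary Region would live in their own stack deployed to that Region. The sketch also leaves out attaching the Auto Scaling group to a load balancer, to keep it short.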
What about Microservices?
The definitions above certainly apply to long-established web applications, but what about microservices architectures? What additional layers of HA or FT can we add here?
To give you an example, Amazon EKS runs and scales the Kubernetes control plane across multiple Availability Zones to guarantee HA. Detecting and replacing unhealthy control plane instances is among the key features AWS provides to keep the control plane highly available during operation. On top of this resiliency layer, we can still use the ones we discussed before.
As we did before, let's have a look at a sample CloudFormation template that deploys the EKS control plane, including the IAM role, the network architecture and a redundant control plane for the EKS cluster:
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  EKSIAMRoleName:
    Type: String
    Description: The name of the IAM role for the EKS service to assume.
  EKSClusterName:
    Type: String
    Description: The desired name of your AWS EKS Cluster.
  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.
  PublicSubnet01Block:
    Type: String
    Default: 192.168.0.0/18
    Description: CidrBlock for public subnet 01 within the VPC
  PublicSubnet02Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for public subnet 02 within the VPC
  PrivateSubnet01Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for private subnet 01 within the VPC
  PrivateSubnet02Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for private subnet 02 within the VPC
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      -
        Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcBlock
          - PublicSubnet01Block
          - PublicSubnet02Block
          - PrivateSubnet01Block
          - PrivateSubnet02Block
Resources:
  EKSIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      RoleName: !Ref EKSIAMRoleName
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-VPC'
  InternetGateway:
    Type: "AWS::EC2::InternetGateway"
  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Public Subnets
        - Key: Network
          Value: Public
  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet AZ1
        - Key: Network
          Value: Private01
  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet AZ2
        - Key: Network
          Value: Private02
  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PrivateRoute01:
    DependsOn:
      - VPCGatewayAttachment
      - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01
  PrivateRoute02:
    DependsOn:
      - VPCGatewayAttachment
      - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02
  NatGateway01:
    DependsOn:
      - NatGatewayEIP1
      - PublicSubnet01
      - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref PublicSubnet01
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-NatGatewayAZ1'
  NatGateway02:
    DependsOn:
      - NatGatewayEIP2
      - PublicSubnet02
      - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref PublicSubnet02
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-NatGatewayAZ2'
  NatGatewayEIP1:
    DependsOn:
      - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
  NatGatewayEIP2:
    DependsOn:
      - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
  PublicSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 01
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '0'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PublicSubnet01"
  PublicSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '1'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PublicSubnet02"
  PrivateSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '0'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PrivateSubnet01"
        - Key: "kubernetes.io/role/internal-elb"
          Value: 1
  PrivateSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '1'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PrivateSubnet02"
        - Key: "kubernetes.io/role/internal-elb"
          Value: 1
  PublicSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet01
      RouteTableId: !Ref PublicRouteTable
  PublicSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet02
      RouteTableId: !Ref PublicRouteTable
  PrivateSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet01
      RouteTableId: !Ref PrivateRouteTable01
  PrivateSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet02
      RouteTableId: !Ref PrivateRouteTable02
  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC
  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref EKSClusterName
      RoleArn:
        "Fn::GetAtt": ["EKSIAMRole", "Arn"]
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
        SubnetIds:
          - !Ref PublicSubnet01
          - !Ref PublicSubnet02
          - !Ref PrivateSubnet01
          - !Ref PrivateSubnet02
    DependsOn: [EKSIAMRole, PublicSubnet01, PublicSubnet02, PrivateSubnet01, PrivateSubnet02, ControlPlaneSecurityGroup]
Outputs:
  SubnetIds:
    Description: Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PublicSubnet01, !Ref PublicSubnet02, !Ref PrivateSubnet01, !Ref PrivateSubnet02 ] ]
  SecurityGroups:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Join [ ",", [ !Ref ControlPlaneSecurityGroup ] ]
  VpcId:
    Description: The VPC Id
    Value: !Ref VPC
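The template above only creates the control plane. As a hedged sketch of how the same multi-AZ idea could extend to the worker nodes, the fragment below defines a managed node group spread over two private subnets. NodeRoleArn and PrivateSubnetIds are hypothetical parameters (the template above does not create a node IAM role), and the instance type and group sizes are illustrative values only.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Illustrative managed node group across two AZs (parameter names are hypothetical)

Parameters:
  EKSClusterName:
    Type: String
    Description: Name of an existing EKS cluster
  NodeRoleArn:
    Type: String
    Description: ARN of an existing IAM role for the worker nodes
  PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Private subnets located in different Availability Zones

Resources:
  WorkerNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      ClusterName: !Ref EKSClusterName
      NodeRole: !Ref NodeRoleArn
      # Supplying subnets from two AZs lets the nodes be placed in both
      Subnets: !Ref PrivateSubnetIds
      ScalingConfig:
        MinSize: 2
        DesiredSize: 2
        MaxSize: 4
      InstanceTypes:
        - t3.medium
```

With a minimum of two nodes and subnets in two Availability Zones, workloads can keep running if a single zone becomes unavailable, provided the pods themselves are replicated across nodes.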
Final Thoughts
We can conclude that fault-tolerant systems are intrinsically highly available solutions with zero downtime, but as we saw in this article, a highly available solution is not necessarily fault tolerant. Microservices grant us an extra layer of resiliency, but they also bring additional risk and complexity. It's down to us as Solution Architects to decide which architecture to aim for based on business needs and budget constraints.