Marco Gonzalez

High Availability vs Fault Tolerance in AWS

While you're getting in shape for the daily challenges of running production AWS solutions, two interesting (and sometimes confusing) terms may pop up in your team discussions: High Availability and Fault Tolerance. Let's dive a bit into both.

High Availability

High Availability can be defined as the percentage of uptime during which a system maintains operational performance, often tied to a service's SLA. AWS publishes SLAs for many of its services and implements its own level of resilience and management to meet them. Some SLA examples follow (AWS revises these figures over time, so check the current SLA pages):

  1. S3 Standard
    • 99.9%
  2. EC2
    • 99.95%
  3. RDS
    • 99.95%
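
To put these percentages in perspective, an availability figure can be converted into a downtime budget. The arithmetic below is a rough sketch that assumes a 30-day month (720 hours) as the measurement window. The symbols A (availability), T (window), and D (allowed downtime) are introduced here for illustration only; the actual window and exclusions are defined in each service's SLA.

```latex
\begin{align*}
D &= (1 - A)\,T, \qquad T = 720\ \text{h} = 43\,200\ \text{min} \\
A = 99.9\%  &:\quad D = 0.001 \times 43\,200\ \text{min} = 43.2\ \text{min per month} \\
A = 99.95\% &:\quad D = 0.0005 \times 43\,200\ \text{min} = 21.6\ \text{min per month}
\end{align*}
```

Under these assumptions, moving from 99.9% to 99.95% halves the monthly downtime budget.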

High Availability - Example Design

High Availability - Example Topology

  • 1: High Availability through the presence of two Availability Zones in a single Region.
  • 2: High Availability through multiple EC2 instances, so that enough nodes stay available to handle the traffic load if one of them fails.
  • 3: High Availability through a Load Balancer, which distributes traffic across the instances.
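
The benefit of the redundancy in items 1 and 2 can be estimated with a simple model. As a rough sketch, assume each of n instances is available with probability a and that failures are independent, which real deployments only approximate. The values a = 0.99 and n = 2 are illustrative, not AWS figures. The service stays up as long as at least one instance is up:

```latex
\begin{align*}
A_{\text{parallel}} &= 1 - (1 - a)^{n} \\
a = 0.99,\ n = 2 &:\quad A_{\text{parallel}} = 1 - (0.01)^{2} = 0.9999
\end{align*}
```

Components in series, such as the load balancer in front of the instances, multiply instead, so the whole chain is never more available than its weakest link.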

Let's implement this solution through an AWS CloudFormation template!
Note: Check your AWS Free Tier allowance to avoid unexpected charges.

About CloudFormation:

CloudFormation is a way of defining your AWS Infrastructure as Code. All the necessary resources and their dependencies can be defined as code in a CloudFormation Template (JSON or YAML file), which is then launched as a stack. Some definitions to keep in mind:

Resources : Define the AWS resources to create. This is the only mandatory section.

Parameters : Dynamic inputs to your template, which let you customize a stack for your specific needs or use cases.

Mappings : Static lookup values, defined as key:value pairs.

Outputs : Values returned by the stack. When exported, they can be referenced by another stack through an import.

Conditions : Rules that control whether a specific resource is created or not.
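
To make these sections concrete, here is a minimal sketch of a template that touches each of them. Every name in it (EnvType, EnvSettings, IsProd, LogBucket, ArchiveBucket) is hypothetical and chosen only for illustration; it is not part of the solution built in this article.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Hypothetical skeleton that touches each template section

Parameters:
  EnvType:                      # dynamic input chosen at stack creation
    Type: String
    AllowedValues: [dev, prod]
    Default: dev

Mappings:
  EnvSettings:                  # static lookup table keyed by environment
    dev:
      Versioning: Suspended
    prod:
      Versioning: Enabled

Conditions:
  IsProd: !Equals [!Ref EnvType, prod]

Resources:
  LogBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: !FindInMap [EnvSettings, !Ref EnvType, Versioning]
  ArchiveBucket:                # only created when EnvType is prod
    Type: AWS::S3::Bucket
    Condition: IsProd

Outputs:
  LogBucketName:
    Description: Name of the log bucket
    Value: !Ref LogBucket
    Export:                     # makes the value importable by other stacks
      Name: !Sub '${AWS::StackName}-LogBucketName'
```

Launching this with EnvType set to dev should create a single bucket, while prod should add the second one, which shows how Parameters, Mappings, and Conditions work together.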

Without further ado, the CloudFormation template below provisions an ELB in front of two EC2 instances placed in separate Availability Zones:

---
Parameters:
  SecurityGroupDescription:
    Description: Security Group Description
    Type: String
  KeyName:
    Description: Key Pair for EC2
    Type: 'AWS::EC2::KeyPair::KeyName'

Resources:
  EC2Instance1:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1a
      ImageId: ami-0233c2d874b811deb 
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData: 
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          #echo "<h1>Hello from Availability Zone us-east-1a</h1>" > /var/www/html/index.html

  EC2Instance2:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1b
      ImageId: ami-0233c2d874b811deb 
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData: 
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          #echo "<h1>Hello from Availability Zone us-east-1b</h1>" > /var/www/html/index.html

  # security group
  ELBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ELB Security Group
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 80
        ToPort: 80
        CidrIp: 0.0.0.0/0

  EC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Ref SecurityGroupDescription
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 80
        ToPort: 80
        SourceSecurityGroupId: 
          Fn::GetAtt:
          - ELBSecurityGroup
          - GroupId
      # Demo only: SSH is open to the world; restrict this CIDR in real deployments
      - IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: 0.0.0.0/0

  # Load Balancer for EC2
  LoadBalancerforEC2:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      AvailabilityZones: [us-east-1a, us-east-1b]
      Instances:
      - !Ref EC2Instance1
      - !Ref EC2Instance2
      Listeners:
      - LoadBalancerPort: '80'
        InstancePort: '80'
        Protocol: HTTP
      HealthCheck:
        Target: HTTP:80/
        HealthyThreshold: '3'
        UnhealthyThreshold: '5'
        Interval: '30'
        Timeout: '5'
      SecurityGroups:
        - !GetAtt ELBSecurityGroup.GroupId
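
The template above pins exactly two instances, so a failed instance is not replaced automatically. One possible variation, sketched below, is to let an Auto Scaling group keep a minimum number of instances running behind the same load balancer. WebLaunchTemplate and WebAutoScalingGroup are hypothetical names, and the size values are placeholders. The fragment is meant to sit in the Resources section in place of EC2Instance1 and EC2Instance2, with the Instances list removed from the load balancer.

```yaml
  # Hypothetical additions to the Resources section of the template above
  WebLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0233c2d874b811deb      # same AMI as the fixed instances
        InstanceType: t2.micro
        KeyName: !Ref KeyName
        SecurityGroupIds:
          - !GetAtt EC2SecurityGroup.GroupId
        UserData:
          Fn::Base64: |
            #!/bin/bash
            yum update -y
            yum install -y httpd
            systemctl start httpd
            systemctl enable httpd

  WebAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: [us-east-1a, us-east-1b]
      MinSize: '2'                          # never fewer than two instances
      MaxSize: '4'
      DesiredCapacity: '2'
      HealthCheckType: ELB                  # replace instances the ELB marks unhealthy
      HealthCheckGracePeriod: 300
      LoadBalancerNames:
        - !Ref LoadBalancerforEC2
      LaunchTemplate:
        LaunchTemplateId: !Ref WebLaunchTemplate
        Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
```

With HealthCheckType set to ELB, the group uses the load balancer's health check to decide when an instance should be replaced, rather than only the EC2 status checks.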

Fault Tolerance

Fault Tolerance has the sole goal of expanding on High Availability to offer the greatest level of protection, aiming for a zero-downtime solution. This approach implies additional cost, with the upside of a higher uptime percentage and no interruption should one or even several components fail at different levels.

Multi-Region Topology

Here we can see the following:

1: Regional redundancy is achieved through the AWS Route 53 DNS service.
2: Availability Zone redundancy is achieved through an ELB, as in the HA approach.
3: EC2 compute redundancy is achieved either through multiple EC2 instances or through Auto Scaling Groups (ASG).
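
As an illustration of item 1, the sketch below shows one way Route 53 failover routing could be declared in CloudFormation. It assumes a load balancer already exists in each Region and that their DNS names are passed in as parameters. All parameter and resource names here (HostedZoneId, RecordName, PrimaryElbDnsName, SecondaryElbDnsName, and so on) are hypothetical, and alias records would be an equally valid alternative to the CNAME records used here.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Hypothetical Route 53 failover between two regional load balancers

Parameters:
  HostedZoneId:
    Type: AWS::Route53::HostedZone::Id
  RecordName:
    Type: String
    Description: Non-apex record name, for example app.example.com
  PrimaryElbDnsName:
    Type: String
  SecondaryElbDnsName:
    Type: String

Resources:
  PrimaryHealthCheck:           # Route 53 probes the primary Region's ELB
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTP
        FullyQualifiedDomainName: !Ref PrimaryElbDnsName
        Port: 80
        ResourcePath: /
        RequestInterval: 30
        FailureThreshold: 3

  PrimaryRecord:                # served while the health check passes
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: !Ref RecordName
      Type: CNAME
      TTL: '60'
      SetIdentifier: primary
      Failover: PRIMARY
      HealthCheckId: !Ref PrimaryHealthCheck
      ResourceRecords:
        - !Ref PrimaryElbDnsName

  SecondaryRecord:              # served when the primary is unhealthy
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: !Ref RecordName
      Type: CNAME
      TTL: '60'
      SetIdentifier: secondary
      Failover: SECONDARY
      ResourceRecords:
        - !Ref SecondaryElbDnsName
```

The short TTL is a design choice: it limits how long clients keep resolving to the failed Region after Route 53 switches to the secondary record.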

What about Microservices?

The definitions above certainly apply to long-established web applications, but what about microservices architectures? What additional layers of HA or FT can we add there?

To give you an example, AWS EKS runs and scales the Kubernetes control plane across multiple Availability Zones to provide HA. Detecting and replacing unhealthy control plane instances is among the key features AWS provides to keep the control plane highly available during operation. On top of this resiliency layer, we can use the ones we discussed before.

AWS EKS Topology

As we did before, let's have a look at a sample CloudFormation template we can use to deploy the EKS control plane, including the IAM role, the network architecture, and the redundant control plane for the EKS cluster:

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  EKSIAMRoleName:
    Type: String
    Description: The name of the IAM role for the EKS service to assume.
  EKSClusterName:
    Type: String
    Description: The desired name of your AWS EKS Cluster.

  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.
  PublicSubnet01Block:
    Type: String
    Default: 192.168.0.0/18
    Description: CidrBlock for public subnet 01 within the VPC
  PublicSubnet02Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for public subnet 02 within the VPC
  PrivateSubnet01Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for private subnet 01 within the VPC
  PrivateSubnet02Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for private subnet 02 within the VPC
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      -
        Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcBlock
          - PublicSubnet01Block
          - PublicSubnet02Block
          - PrivateSubnet01Block
          - PrivateSubnet02Block
Resources:
  EKSIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      RoleName: !Ref EKSIAMRoleName
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock:  !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-VPC'
  InternetGateway:
    Type: "AWS::EC2::InternetGateway"
  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Public Subnets
      - Key: Network
        Value: Public

  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private Subnet AZ1
      - Key: Network
        Value: Private01

  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private Subnet AZ2
      - Key: Network
        Value: Private02
  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PrivateRoute01:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01
  PrivateRoute02:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02
  NatGateway01:
    DependsOn:
    - NatGatewayEIP1
    - PublicSubnet01
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref PublicSubnet01
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-NatGatewayAZ1'
  NatGateway02:
    DependsOn:
    - NatGatewayEIP2
    - PublicSubnet02
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref PublicSubnet02
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-NatGatewayAZ2'
  NatGatewayEIP1:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
  NatGatewayEIP2:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
  PublicSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 01
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PublicSubnet01"
  PublicSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PublicSubnet02"
  PrivateSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PrivateSubnet01"
      - Key: "kubernetes.io/role/internal-elb"
        Value: '1'
  PrivateSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PrivateSubnet02"
      - Key: "kubernetes.io/role/internal-elb"
        Value: '1'
  PublicSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet01
      RouteTableId: !Ref PublicRouteTable
  PublicSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet02
      RouteTableId: !Ref PublicRouteTable
  PrivateSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet01
      RouteTableId: !Ref PrivateRouteTable01
  PrivateSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet02
      RouteTableId: !Ref PrivateRouteTable02
  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC
  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref EKSClusterName
      RoleArn:
        "Fn::GetAtt": ["EKSIAMRole", "Arn"]
      ResourcesVpcConfig:
        SecurityGroupIds:
        - !Ref ControlPlaneSecurityGroup
        SubnetIds:
        - !Ref PublicSubnet01
        - !Ref PublicSubnet02
        - !Ref PrivateSubnet01
        - !Ref PrivateSubnet02
    DependsOn: [EKSIAMRole, PublicSubnet01, PublicSubnet02, PrivateSubnet01, PrivateSubnet02, ControlPlaneSecurityGroup]
Outputs:
  SubnetIds:
    Description: Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PublicSubnet01, !Ref PublicSubnet02, !Ref PrivateSubnet01, !Ref PrivateSubnet02 ] ]
  SecurityGroups:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Join [ ",", [ !Ref ControlPlaneSecurityGroup ] ]
  VpcId:
    Description: The VPC Id
    Value: !Ref VPC
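
The template above creates only the control plane. To extend the multi-AZ idea to the worker layer, one option is a managed node group spread across both private subnets. The fragment below is a sketch under that assumption. NodeInstanceRole and WorkerNodeGroup are hypothetical names, the scaling values are placeholders, and the fragment is meant to be appended to the Resources section of the template above.

```yaml
  # Hypothetical additions to the Resources section of the template above
  NodeInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

  WorkerNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      ClusterName: !Ref EKSCluster
      NodeRole: !GetAtt NodeInstanceRole.Arn
      Subnets:                  # one private subnet per Availability Zone
        - !Ref PrivateSubnet01
        - !Ref PrivateSubnet02
      ScalingConfig:
        MinSize: 2
        DesiredSize: 2
        MaxSize: 4
```

Listing both private subnets lets the node group place workers in either Availability Zone, so losing one zone should not take all worker capacity with it.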

Final Thoughts

We can conclude that Fault-Tolerant systems are intrinsically Highly Available solutions that aim for zero downtime, but as we saw in this article, a Highly Available solution is not necessarily Fault Tolerant. Microservices grant us an extra layer of resiliency, but they also bring a certain amount of risk and complexity. It's down to us as Solution Architects to decide which architecture we want to achieve based on business needs and budget constraints.
