Davide de Paolis for AWS Community Builders

Posted on Nov 29, 2022

AWS Elastic Load Balancing and Autoscaling Cheat-sheet/Write-up

#aws #cloudcompute #techlead #solutionsarchitect

Before diving into how AWS helps us scale our applications for high availability and fault tolerance, let's explain these concepts a bit:

Scaling Up, Out or In?

Let's start with a basic distinction: what is the difference between Scaling Up and Scaling Out (or between Scaling Vertically and Scaling Horizontally)?

Scaling UP (also scaling vertically) means adding more resources to an instance.
For example, we could scale up from a T2 instance with 1 vCPU and 1 GB of RAM to a C5 with 4 vCPUs and 8 GB of RAM.
The instance is still a single point of failure, though.

Scaling OUT (also scaling horizontally) means adding more instances.
This provides more resiliency: if one instance fails, you still have others running.

Scaling IN means reducing the number of instances when demand decreases.

High Availability and Fault Tolerance

High Availability and Fault Tolerance have the same objective: keeping your system running in case of a component failure or outage.
They differ, however, in design, costs and behaviour.

An application that is highly available will react to a component failure and quickly recover.
A fault tolerant one can instead tolerate a component fault without any side effects such as performance impact, data loss, or system crashes.

An architecture can be highly available without being fault tolerant, or can be both.

High availability is achieved by removing single points of failure using system redundancy.

Fault tolerance is achieved by adding even more redundant resources, and at different levels, increasing uptime, but also complexity and costs.

If you have 4 EC2 instances behind a Load Balancer, it will direct your traffic to the 4 instances. If one fails for some reason, traffic will be directed to the other 3 (until, eventually, autoscaling launches a 4th one again).

If your instances are spread evenly across 2 AZs and an entire AZ fails, you still have 2 instances running, so your system is Highly Available (and to some extent Fault Tolerant at the AZ level), but if the Region fails, your system will have downtime.
Deploying your system to different Regions makes it Fault Tolerant, but you can see how things get more complex and expensive.

Now, let's start from the beginning and discover AWS EC2 Autoscaling, Elastic Load Balancers and CrossZone Balancing.

AWS EC2 AutoScaling

Amazon EC2 Auto Scaling allows you to scale horizontally by launching and terminating instances dynamically, based on your workload and the health of your application (responding to EC2 Status Checks, ELB Health Checks, CloudWatch metrics...), or on a schedule.

You create one or more instances in a collection called an Auto Scaling Group.

You specify the minimum number of instances, the maximum and the desired capacity, and EC2 Auto Scaling will ensure that the number of instances in your group stays within that range.
More about it here
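
As a rough illustration of how minimum, maximum and desired capacity relate, here is a minimal Python sketch. The function name `clamp_capacity` is made up for this example and is not part of any AWS SDK; it only models the idea that scaling never takes the group outside its bounds.

```python
def clamp_capacity(requested: int, min_size: int, max_size: int) -> int:
    """Return the capacity a group would settle on for a requested value.

    Hypothetical helper: it only models the rule that capacity
    stays between the group's minimum and maximum size.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(requested, max_size))


# A policy asking for 12 instances in a group capped at 10 gets 10.
print(clamp_capacity(requested=12, min_size=2, max_size=10))  # 10
# A scale-in asking for 1 instance never goes below the minimum of 2.
print(clamp_capacity(requested=1, min_size=2, max_size=10))   # 2
```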

Scaling can occur based on demand (performance) or on a schedule.

EC2 Auto Scaling provides Scalability and Elasticity, because it not only scales out but also scales in, reducing the number of instances when they are no longer necessary.

How do we configure an Auto Scaling Group?

For an Auto Scaling Group to know which instances to launch, and with which configuration, we need to create a Launch Template or a Launch Configuration.

A Launch Template specifies the configuration of the instance you want to use - see prev post about EC2 - (AMI, Type, Tenancy, Purchasing options, Access, User data etc).

A Launch Configuration is the older option, with fewer attributes than templates and a worse experience in the console (multiple steps). Launch Templates have in fact replaced it.

After we have specified the template/configuration of the instances, we then:

configure VPC and subnets
attach a Load Balancer
configure Health Checks
define group size and scaling policies
combine purchase options and instance types (in case we want to mix and match between On-Demand and Spot instances)
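
To make these steps more concrete, here is a sketch in Python that only builds the request parameters for an Auto Scaling Group. The keys follow the parameter names of boto3's `create_auto_scaling_group`, while the function name, group name, template name, subnet IDs and target group ARN are placeholders invented for this example. Mixed purchase options are left out to keep it short.

```python
def build_asg_request(name, launch_template_name, subnet_ids,
                      target_group_arns, min_size, max_size, desired,
                      grace_seconds=300):
    """Build (but do not send) the parameters for an Auto Scaling Group."""
    return {
        "AutoScalingGroupName": name,
        # Which instances to launch and how they are configured
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # Group size
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Attaching the load balancer through its target groups
        "TargetGroupARNs": list(target_group_arns),
        # Use ELB health checks in addition to EC2 status checks
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_seconds,
    }


# All values below are placeholders, not real resources.
params = build_asg_request(
    name="web-asg",
    launch_template_name="web-template",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    target_group_arns=[
        "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/web/0123456789abcdef"
    ],
    min_size=2, max_size=6, desired=4,
)
print(params["VPCZoneIdentifier"])  # subnet-aaaa1111,subnet-bbbb2222

# With credentials configured, the dict could be passed on like this:
#   boto3.client("autoscaling").create_auto_scaling_group(**params)
```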

Health Checks

We can use EC2 Status Checks and ELB Health Checks together to gather more info about the real status of our application.

The grace period is how long we want to wait after an instance is launched before checking its health status (in case we need to install software or do some configuration at launch).
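
The interplay between the two health checks and the grace period can be sketched as follows. This is a simplified model written for this post, not the actual decision logic of the service, and `should_replace` is a hypothetical name.

```python
def should_replace(seconds_since_launch, ec2_status_ok, elb_health_ok,
                   grace_period=300):
    """Decide whether an instance would be considered unhealthy.

    Simplified assumption: during the grace period failing checks are
    ignored; afterwards both checks must pass.
    """
    if seconds_since_launch < grace_period:
        return False  # still bootstrapping, too early to judge
    return not (ec2_status_ok and elb_health_ok)


# The instance is still installing software: a failing ELB check is ignored.
print(should_replace(120, ec2_status_ok=True, elb_health_ok=False))  # False
# After the grace period the same failure marks it for replacement.
print(should_replace(600, ec2_status_ok=True, elb_health_ok=False))  # True
```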

Monitoring

Instances and Auto Scaling Groups send (or can send) data points to CloudWatch that we can monitor.

There are different types of metrics:

Basic Monitoring refers to instances and has 5-minute granularity.
Group Metrics refer to Auto Scaling Groups, have 1-minute granularity and are not enabled by default.
Detailed Monitoring provides instance metrics with 1-minute granularity.

While Basic Monitoring and Group Metrics are free of charge, you will have to pay a fee for detailed instance monitoring.

Additional Settings

Cooldown: has a default of 300 seconds (5 minutes) and prevents instances from being launched or terminated before the effect of previous scaling activities is visible.
Termination Policy: controls which instances are terminated first when scaling in.
Termination Protection: prevents specific instances from being terminated.
Standby State: allows you to update or troubleshoot an instance instead of having it terminated.
Lifecycle Hooks: An Amazon EC2 instance transitions through different states from the time it launches until it is terminated. Lifecycle hooks allow you to execute custom actions - like running a script to download and install software, invoking Lambda functions and so on - when these transitions occur. (see previous post)
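
To illustrate the Cooldown setting from the list above, here is a small Python simulation. It assumes, for simplicity, that a scaling activity completes instantly, so the cooldown is counted from the moment the alarm is acted upon. The function name is made up for this sketch.

```python
def scaling_activities(alarm_times, cooldown=300):
    """Return the alarm timestamps (in seconds) that trigger a scaling activity.

    Alarms that fire while the cooldown of the previous activity
    is still running are ignored.
    """
    acted = []
    last = None
    for t in sorted(alarm_times):
        if last is None or t - last >= cooldown:
            acted.append(t)
            last = t
    return acted


# Alarms at 60s, 120s and 450s fall inside a cooldown window and are skipped.
print(scaling_activities([0, 60, 120, 300, 450, 700]))  # [0, 300, 700]
```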

Here is an interesting list of code samples and best practices for working with AWS EC2 Auto Scaling Groups

Scaling Policies

Dynamic Scaling

Target Tracking: you choose a metric and a target value (for example, average CPU utilization at 50%), and Auto Scaling launches or terminates instances to keep the metric reported by CloudWatch at, or close to, that value.
AWS recommends scaling on metrics with a 1-minute frequency.
Simple Scaling: an alarm is attached to a scaling group; when the alarm is triggered the policy adjusts the capacity, and Auto Scaling then waits for the cooldown period (300 seconds by default) before allowing another scaling activity.
Step Scaling: here too an alarm is attached to a scaling group, but the responses are defined in steps that depend on the size of the alarm breach. For example, if the alarm is triggered when CPU is > 60%, we could launch 2 instances when CPU reaches 70%, but launch 4 instances right away when it reaches 80%.
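
The step scaling example can be sketched as a lookup table in Python. The 70% and 80% steps come from the example above; the first step (60% to 70%, add 1 instance) is an assumption added here so that the steps have no gap, and the names are made up for this sketch.

```python
# (lower bound of CPU %, instances to add), highest step first
STEPS = [
    (80.0, 4),
    (70.0, 2),
    (60.0, 1),  # assumed step, not part of the example in the text
]


def step_adjustment(cpu_percent, steps=STEPS):
    """Return how many instances to add for a given CPU utilization."""
    for lower_bound, instances_to_add in steps:
        if cpu_percent >= lower_bound:
            return instances_to_add
    return 0  # alarm threshold not breached, nothing to do


for cpu in (50, 65, 75, 85):
    print(cpu, step_adjustment(cpu))
# 50 0
# 65 1
# 75 2
# 85 4
```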

A very important difference to note here is that Target Tracking is the only policy that maintains a metric at, or close to, a specified target value, while Simple and Step Scaling make adjustments when a specific threshold is breached.
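
To get a feeling for what "maintaining a metric close to a target value" means, here is a back-of-the-envelope sketch in Python. It assumes the metric scales proportionally with the number of instances, which is a simplification: AWS does not document the exact algorithm, and the real service is more conservative when scaling in. The function name is hypothetical.

```python
import math


def target_tracking_capacity(current_capacity, metric_value, target_value,
                             min_size, max_size):
    """Estimate the capacity needed to bring a metric back to its target.

    Assumption: the average metric is inversely proportional to the
    number of instances sharing the load.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    wanted = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(wanted, max_size))


# 4 instances at 75% average CPU with a 50% target: scale out to 6.
print(target_tracking_capacity(4, 75, 50, min_size=2, max_size=10))  # 6
# 4 instances at 20% average CPU with a 50% target: scale in to 2.
print(target_tracking_capacity(4, 20, 50, min_size=2, max_size=10))  # 2
```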

Scheduled scaling

As the name suggests, this is a scaling policy that you define for specific schedules / intervals. The scaling is not a reaction to something that happened, but an action you take in advance so that your instances keep performing the way you'd like.

If you know that every day at lunch time you have a peak in usage of your application, you could set up a Scheduled Scaling action for 11:45 to bring your instances to a certain number (desired / min and max running instances).
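
The lunch-time example could be expressed with parameters like the ones below. The keys follow boto3's `put_scheduled_update_group_action`, the recurrence is a standard cron expression, and the action name, group name, time zone and sizes are placeholders chosen for this sketch.

```python
def lunch_peak_action(group_name, min_size, max_size, desired,
                      time_zone="Europe/Rome"):
    """Build (but do not send) a daily scheduled action for 11:45."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "lunch-peak-scale-out",  # placeholder name
        # cron format: minute hour day-of-month month day-of-week
        "Recurrence": "45 11 * * *",
        # Without a time zone the recurrence is interpreted as UTC
        "TimeZone": time_zone,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


action = lunch_peak_action("web-asg", min_size=4, max_size=10, desired=6)
print(action["Recurrence"])  # 45 11 * * *

# With credentials configured, the dict could be passed on like this:
#   boto3.client("autoscaling").put_scheduled_update_group_action(**action)
```

A second action later in the afternoon would bring the sizes back down once the peak is over.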

Predictive Scaling

Similar to scheduled scaling, but managed by AWS: instances are scaled in advance of daily and weekly patterns in traffic flows.
More on this

Elastic Load Balancers (ELB)

Elastic Load Balancing distributes traffic across targets.

ELBs can be either internal or internet-facing. In the latter case, the nodes have public IPs while the instances can keep private IP addresses.

Components of ELB

Nodes

Nodes are used by ELB to distribute traffic to the target groups. They are placed within the AZ where you want to have your traffic balanced.

Listeners

A listener (each LB must have at least one) defines how inbound connections are routed, based on the ports and protocols set as conditions.

Rules

Rules are associated with each listener and define the conditions under which incoming requests are routed to a given target group.
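
A toy model in Python can show how listener rules pick a target group. It only handles path patterns, evaluates rules in priority order and falls back to a default action. The class and function names, as well as the target group names, are invented for this sketch. Pattern matching uses `fnmatchcase` from the standard library, which is slightly more permissive than real path patterns.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    priority: int       # lower number = evaluated first
    path_pattern: str   # e.g. "/api/*"
    target_group: str   # placeholder target group name


def route(path, rules, default_target_group):
    """Return the target group of the first rule whose pattern matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(path, rule.path_pattern):
            return rule.target_group
    return default_target_group


rules = [
    Rule(priority=20, path_pattern="/images/*", target_group="static-tg"),
    Rule(priority=10, path_pattern="/api/*", target_group="api-tg"),
]

print(route("/api/users", rules, "web-tg"))        # api-tg
print(route("/images/logo.png", rules, "web-tg"))  # static-tg
print(route("/", rules, "web-tg"))                 # web-tg
```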

Target Groups

A target group is a group of resources you want your ELB to route requests to:

Instances: used when you have EC2 instances in an Auto Scaling Group and you want to distribute incoming connections to them.
IP Addresses: both VPC and on-premises IP addresses are supported. Useful in case of a microservices architecture using containers.
Lambda Functions: distribute traffic to your Lambda functions. (check this post to see how we took advantage of Lambda Target groups and ALB to gradually migrate an old monolith)

Types of ELBs

There are different types of Load Balancers, based on request layer, supported targets and protocols.

Application Load Balancer

it operates at the request level and routes based on content of request (layer 7)
listens for HTTP and HTTPS (gRPC is also supported, as a target group protocol version)
routing can be based on source IP address, path, host, HTTP header, query string params
supported targets are instances, Lambda functions, IP addresses and containers

Network Load Balancer

it operates at the connection level and routes based on IP Protocol (layer 4)
listens for TCP, UDP, TLS and TCP_UDP protocols
supported targets are instances, IP addresses and Application Load Balancers
at the time of writing you couldn't assign a security group to an NLB (AWS added support for this in August 2023)
offers ultra high performance, low latency and TLS offloading at scale
can have a static IP / Elastic IP and preserves source IP address

A typical use case for NLB is when our clients need to whitelist static IPs - with ALB you would know the DNS name but the IPs would change all the time, while NLB can have static addresses.

Classic Load Balancer

The previous generation of ELBs, not recommended for new applications (useful only if you are still using EC2-Classic instances).

It performs routing on layer 4 and layer 7. Although it has fewer features than ALB, it offers a few things that ALB does not:
support for EC2-Classic
support for TCP and SSL listeners
support for sticky sessions using application-generated cookies
cross-zone load balancing can be disabled
It does not support target groups; instead, the target instances are selected directly.

Gateway Load Balancer

A newer type of ELB, very useful in front of virtual appliances such as firewalls and Intrusion Detection/Prevention Systems (IDS/IPS), and when deep packet inspection is necessary.

operates at layer 3
listens for all packets on all ports
forwards traffic to the target group specified in the listener rules
exchanges traffic using GENEVE protocol on port 6081

Cross-Zone Load Balancing

When cross-zone load balancing is enabled, each load balancer node distributes traffic across the registered targets in all enabled AZs. Otherwise, each node distributes traffic only across the targets in its own AZ.

With ALB, cross-zone load balancing is always enabled at the load balancer level, while with NLB it can be enabled or disabled (disabled by default).

The difference might not be so clear just by reading the definition, so imagine you have 10 instances in total, unevenly distributed across 2 AZs (AZ-one has only 2 while AZ-two has 8).

When Cross-Zone LB is disabled, each LB node receives 50% of the traffic and distributes it only to the instances in its own AZ, causing an uneven distribution: each instance in AZ-one gets 25% of the overall traffic (50% / 2 instances), while each instance in AZ-two gets just 6.25% (50% / 8 instances).

With Cross-Zone Load Balancing enabled, each LB node can route to any instance in any AZ, so each instance receives an even share of the load (100% / 10 instances = 10%).
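
The arithmetic of this example can be checked with a few lines of Python. The sketch assumes that traffic is split evenly between the load balancer nodes (one per AZ) and that every AZ has at least one instance. The function name is made up for this illustration.

```python
def per_instance_share(instances_per_az, cross_zone):
    """Return the % of total traffic each instance receives, per AZ."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every node can reach every instance: even split overall
            shares[az] = 100.0 / total_instances
        else:
            # Each node gets an equal slice and keeps it inside its AZ
            shares[az] = (100.0 / az_count) / count
    return shares


fleet = {"AZ-one": 2, "AZ-two": 8}
print(per_instance_share(fleet, cross_zone=False))
# {'AZ-one': 25.0, 'AZ-two': 6.25}
print(per_instance_share(fleet, cross_zone=True))
# {'AZ-one': 10.0, 'AZ-two': 10.0}
```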

Secure connection / HTTPS

To receive encrypted traffic over HTTPS, our ELB must have a server certificate and an associated security policy.
This certificate can be issued by ACM (AWS Certificate Manager) or by any other third-party authority.

Behaviour is slightly different between ALB and NLB:

With ALB, traffic is encrypted from the client to the ELB itself. At that point the secure channel is terminated, and traffic continues to your target group unencrypted.

If we want encryption all the way to our EC2 instance, we need to upload our certificate to our ALB and install another one (this can be self-signed) on our instance. The encrypted channel between client and LB will still be terminated, but another encrypted channel will be created between the ELB and the instance.

With NLB it is possible to follow this approach, but it is not really necessary: if we install the certificate only at the instance level and use a TCP listener, the NLB does not terminate the channel and traffic goes through the load balancer encrypted end to end.

Autoscaling Groups + Elastic Balancing = Magic!

When you attach an ELB to an autoscaling group, the ELB will automatically detect the available instances.
Basically, you create an autoscaling group and attach the target groups to it. Since the target group is also associated with the Load Balancer, whenever a new instance is launched by the autoscaling policies it is registered with the target group, and the Load Balancer starts routing traffic to it as soon as it passes the health checks.

Magic!

Session State and Session Stickiness

By default, an Application Load Balancer routes each request independently to a registered target based on the chosen load-balancing algorithm. However, you can use the sticky session feature (also known as session affinity) to enable the load balancer to bind a user's session to a specific target.

The client must support cookies, and the load balancer will use those cookies to determine which target new requests should be routed to (to preserve session data).

If that instance fails for some reason, or is shut down, the user will be directed to a different instance that has no awareness of the session state (and will therefore have to authenticate again).
To keep the state across instances we need to save it externally. Available solutions are DynamoDB and ElastiCache (even S3 could be used).
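
The idea of saving the state externally can be sketched in Python with an in-memory store standing in for DynamoDB or ElastiCache. All class names and the user name are invented for this illustration, and a real store would be a network service shared by all instances.

```python
import uuid


class SessionStore:
    """In-memory stand-in for an external store shared by all instances."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, state):
        self._data[session_id] = dict(state)

    def get(self, session_id):
        return self._data.get(session_id)


class Instance:
    """A web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.put(session_id, {"user": user})
        return session_id

    def handle(self, session_id):
        state = self.store.get(session_id)
        if state is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {state['user']}"


store = SessionStore()
instance_a = Instance("instance-a", store)
instance_b = Instance("instance-b", store)

session_id = instance_a.login("davide")
# instance-a goes away; the next request lands on instance-b,
# which still finds the session in the shared store.
print(instance_b.handle(session_id))  # instance-b: hello davide
```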

As usual, the right architecture depends on requirements: storing session state externally is definitely better because it guarantees more resiliency, but it adds some latency and cost.

If Sticky Sessions are enough for you, enabling them is just a matter of editing the attributes of your target group, specifying the stickiness type and its duration.
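
Outside the console, the same attributes can be set programmatically. The sketch below only builds the attribute list. The keys are the target group attribute names used by Application Load Balancers for load-balancer-generated cookies, while the function name and the one-day duration are choices made for this example.

```python
def stickiness_attributes(duration_seconds=86400):
    """Build the target group attributes that enable sticky sessions."""
    # Allowed range assumed here: from 1 second up to 7 days
    if not 1 <= duration_seconds <= 604800:
        raise ValueError("duration must be between 1 second and 7 days")
    return [
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds",
         "Value": str(duration_seconds)},
    ]


attributes = stickiness_attributes()
print(attributes[2]["Value"])  # 86400

# With credentials configured, the list could be passed on like this:
#   boto3.client("elbv2").modify_target_group_attributes(
#       TargetGroupArn="<your target group ARN>", Attributes=attributes)
```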
