Dennis Groß (he/him)

Posted on Dec 20, 2022

Beginners Guide to EC2

#aws #cloud #ec2 #cloudskills

EC2 Instances

The Elastic Compute Cloud (EC2) provides virtual machines on demand. Most of the AWS services are built using the EC2 service in some capacity, which makes EC2 one of the essential services on AWS.

TL;DR

You can choose from different EC2 instance-type families. Each family is specialized for a specific use case, like…

general purpose-balanced - offers an average ratio between CPU, memory, and network resources, used for regular services and applications.
compute-optimized - focus on CPU resources, used for CPU-intensive operations such as machine learning algorithms.
memory-optimized - focus on memory, used for services that process a lot of data in memory.
accelerated computing - provides hardware accelerators, used for example in graphics processing or specific kinds of data processing.
storage-optimized - focus on high I/O and throughput on attached volumes, used for databases.

Instances in those families often come in different versions such as T2 and T3. The versions can change different aspects of the instance like…

whether CPU credits are used.
how costs are calculated.
what CPU architecture gets used.

In general, go with the latest version of an instance family. AWS tries to provide the best price/value ratio for the newer versions.

AWS recently introduced the new Graviton2 processors, which you can already find in the newer instance families like T4g or M6g. These processors use an ARM architecture and provide the same CPU resources at a reduced cost [compared to the x86 counterparts]. Try to go with the new Graviton2 processors when you can, but keep in mind that your application must support ARM.

Virtual Machines

You already read in the intro that EC2 instances are virtual machines, so let me give you a short recap of what virtual machines are. Skip ahead if you are already familiar with the concept of virtualization.

Virtual machines help you to abstract from concrete hardware like…

CPUs
Memory
Hard or Network Disks
Network Resources

The base concept behind virtual machines is to use a pool of hardware resources and to create a flexible number of machines with operating systems based on this resource pool. The hardware used by the machines is shared, but the user of the virtual machine would think that this is a fully functional computer.

Data centers use virtual machines to allocate their hardware resources more flexibly. A physical computer has a fixed set of CPU, memory, disk, and network resources, but a virtual machine on the other hand can be allocated an arbitrary amount of those system resources. The flexibility of the virtualization concept helps data centers to upsell their hardware resources to customers.

EC2 Instance Families

EC2 instances come with a preconfigured set of system resources. AWS categorizes instances by…

vCPU - essentially CPU threads. Modern x86 CPUs support hyper-threading, so one physical CPU core [roughly] supports two different threads, and you can say one vCPU equals half a physical CPU core. Graviton-based instances are an exception: there, one vCPU is one physical core.
Memory
Storage/Instance Storage - what Storage types can be attached to the EC2 instance, mostly EBS.
Bandwidth - max network throughput in Gbps that the EC2 instance supports.

Apart from this, there are also a few metrics that apply only to specific EC2 instance families, like…

EBS Bandwidth - max network throughput in Gbps to/from attached EBS volumes. One caveat: every EC2 instance also has a fixed IOPS limit, which can become a bottleneck if you attach multiple EBS volumes.
CPU Credits - let you perform above your instance vCPU computing capacity for a while until your CPU credits run out. You earn CPU credits when you don’t max out your instance CPU capabilities. This does not apply to all EC2 instances, and we tend to call EC2 instances with CPU credits burstable instances.

EC2 instance families target different application use cases with the baseline being “general purpose”. Most of your applications should use general-purpose instances which offer a good balance between CPU, memory, and network bandwidth.

Here is an official listing of all instance families by AWS. I advise you to go with the latest iteration of the instance types, which in general offers a better performance/value proposition.

Instance Types

An instance family like T3 offers different instance types like t3.nano or t3.medium. We already saw in this post that virtual machines consist of system resources that are derived from the hardware resource pool. We also saw why this is a flexible concept, but keep in mind that EC2 instances are virtual machines with a fixed set of system resources.
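The naming scheme of instance types can be sketched in a few lines of Python. The helper below is hypothetical and only covers the common pattern of series letters, a generation digit, optional attribute letters, and a size after the dot. Treat it as an illustration, not as an official AWS parser.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    series: str      # e.g. "t" or "m"
    generation: int  # e.g. 3 in "t3"
    attributes: str  # e.g. "g" in "m6g", empty if there are none
    size: str        # e.g. "nano", "medium", "large"

    @property
    def family(self) -> str:
        # The family is everything in front of the dot.
        return f"{self.series}{self.generation}{self.attributes}"


# Assumed pattern for names such as t3.nano or m6g.large.
_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z-]*)\.([a-z0-9-]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type label into its naming components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    series, generation, attributes, size = match.groups()
    return InstanceType(series, int(generation), attributes, size)
```

Calling parse_instance_type("m6g.large") gives you the family "m6g" and the size "large", which is exactly the family/type split described above.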

You can’t scale the system resources of a running virtual machine. To scale up or down, you need to stop the instance, change its instance type, and start it again (this works for EBS-backed instances), or replace the instance with a new one of a different type.

AWS provides you with pre-configured EC2 instances and gives them an instance type label. The t3.nano instance type for example features…

2 vCPUs
0.5 GiB memory
CPU credits
Up to 5 Gbps network transfer

So think about what you want to do with your service or application on EC2. It is a good idea to restrict yourself to the smaller instance types like t3.nano, t3.micro, or t3.small if your application workload runs in a staging or development environment.

Production EC2 instances can be larger, like the t3.large instance, and may incur significant costs. EC2 uses a pay-as-you-go model, so you pay per second of usage. You only pay for EC2 instances when they are in the running state. Stop the instance and you only pay for attached storage such as EBS volumes or EFS.
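Here is a minimal sketch of what per-second billing means for your invoice. The hourly rate in the usage note is a made-up number, and the 60-second minimum is an assumption that, as far as I know, applies to Linux on-demand instances. Check the pricing page for your instance type and operating system.

```python
def on_demand_cost(seconds_running: int, hourly_rate_usd: float,
                   minimum_seconds: int = 60) -> float:
    """Estimate the compute cost of a single instance run, billed per second."""
    if seconds_running <= 0:
        # An instance that never reached the running state costs nothing.
        return 0.0
    # Assumption: short runs are rounded up to a minimum billing period.
    billed_seconds = max(seconds_running, minimum_seconds)
    return billed_seconds * hourly_rate_usd / 3600
```

With a hypothetical rate of 0.02 USD per hour, on_demand_cost(8 * 3600, 0.02) estimates the cost of one working day. Storage costs for EBS or EFS are not part of this calculation.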

Burstable Instances

Some instance types like the t3.* instances are burstable, which means that they operate with CPU credits. Every burstable instance type has a baseline computation performance, which is the share of its vCPU capacity that the instance can sustain indefinitely.

Now imagine you run a rather small application on your instance and you decided to go with the t3.small instance type. That instance type is sufficient for your use case, but there are a couple of situations when you would need additional computation power…

During the booting procedure of the instance.
On occasional traffic spikes when a lot of people access your application.

AWS provides EC2 CPU credits exactly for such use cases. You can think of CPU credits like a savings account in your bank. Depending on the instance family, a new instance may start with some CPU credits already in the account. You can pay for additional computation resources with these CPU credits whenever your baseline computation power is not enough, and you can do that until your CPU credits run out.

This process happens automatically: EC2 raises your computation performance for the price of some CPU credits whenever your instance goes through computation-intensive operations.

Your instance saves CPU credits into its “savings account” whenever your application runs below the baseline performance. The rate at which you earn CPU credits is fixed and depends on the instance type.
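The savings account can be simulated with a few lines of Python. This is a simplified sketch, not how EC2 implements it. It assumes that one CPU credit pays for one vCPU running at 100% for one minute, and the numbers in the usage note (2 vCPUs, 24 credits per hour, a maximum of 576 credits) are what I would expect for a t3.small, so please verify them before you rely on them. The class and method names are made up.

```python
from dataclasses import dataclass


@dataclass
class BurstableInstance:
    vcpus: int
    credits_per_hour: float  # earn rate, depends on the instance type
    max_balance: float       # the savings account has an upper limit
    balance: float = 0.0     # set this to whatever your instance starts with

    def baseline_utilization(self) -> float:
        """Per-vCPU utilization that the earn rate alone can pay for."""
        return self.credits_per_hour / 60 / self.vcpus

    def run_minute(self, requested_utilization: float) -> float:
        """Advance one minute and return the per-vCPU utilization delivered."""
        earned = self.credits_per_hour / 60
        needed = self.vcpus * requested_utilization
        available = self.balance + earned
        if needed > available:
            # Out of credits: the instance falls back towards its baseline.
            delivered = available / self.vcpus
            spent = available
        else:
            delivered = requested_utilization
            spent = needed
        self.balance = min(self.max_balance, available - spent)
        return delivered
```

An instance created with BurstableInstance(vcpus=2, credits_per_hour=24, max_balance=576) that idles for an hour saves 24 credits. If it then runs both vCPUs at 100%, the balance is used up after about 15 minutes and the delivered utilization drops back to the 20% baseline.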

This is a really awesome mechanism and it demonstrates that AWS puts the customer needs in the center when they design/implement their services.

Instance Lifecycle

An EC2 instance can be in one of 7 different states…

rebooting - restarting the instance while reinitializing the instance.
pending - currently starting the instance.
shutting-down - instance termination in progress.
terminated - instance got removed.
stopping - preparing to be stopped or hibernated.
stopped - workloads on the instance are not running.
running - workloads are running on the instance.
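The states and the usual moves between them can be written down as a small lookup table. This is a simplified sketch that assumes only the common transitions; it is not an official state machine, and the helper names are made up.

```python
from typing import Dict, Set

# State -> states that an instance can move to next (simplified).
TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running"},
    "running": {"rebooting", "stopping", "shutting-down"},
    "rebooting": {"running"},
    "stopping": {"stopped"},
    "stopped": {"pending", "shutting-down"},
    "shutting-down": {"terminated"},
    "terminated": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the instance can move directly from current to target."""
    return target in TRANSITIONS.get(current, set())


def reachable_states(start: str) -> Set[str]:
    """Collect every state that can eventually follow the start state."""
    seen: Set[str] = set()
    todo = [start]
    while todo:
        state = todo.pop()
        for nxt in TRANSITIONS[state]:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

The table makes one thing easy to see: terminated is a dead end, so reachable_states("terminated") is empty, while a stopped instance can go back to pending and running.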

Of all the states above, you only pay for…

running
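stopping - only while your instance is preparing to hibernate. Once it is stopped, a hibernated instance only incurs storage costs, including the storage for the saved memory contents.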

Some of you might be confused that instances can be terminated, stopped, or hibernated. You should…

stop instances if you don’t need to access your workloads on the instance but want to keep the EBS volumes attached [to the instance]. You will only be charged for the EBS volumes and not for the instance computing resources.
terminate instances if you want to remove the instance permanently. This will also delete all EBS volumes that set the delete_on_termination flag.
hibernate instances if you want to stop the instance workloads but you have in-memory data stored on the instance that you don’t want to lose by stopping the instance. EC2 saves the contents of your instance memory to the EBS root volume before hibernation and restores them when you start the instance again.

I also want to make you familiar with a small technicality when it comes to starting/stopping instances. Stopping and then starting an instance is not the same as restarting an instance.

Restarting (rebooting) an instance keeps the identical instance VM on the same host.
Stopping an instance followed by a start of the instance may move the instance VM to a different host.

That’s a technical detail and for most of you, it won’t matter whether it is the same VM or not.

EC2 is a zonal service, which means that instances are deployed into a specific availability zone. You have to keep in mind that the servers that host your EC2 instances are located in a data center of a specific availability zone. The same is true for EBS volumes. This is the reason why you can only attach EBS volumes to EC2 instances of the same availability zone.

Stopping and starting the EC2 instance might place it on another host in the same availability zone, and the instance receives a new public IPv4 address unless you use an Elastic IP. This is not a problem in 99% of the cases, but it becomes an issue when you work with software licenses that are bound to the hardware or communicate directly with the IPv4 addresses of your instance (use CNAMEs wherever possible!).

Boot Volume

Every EC2 instance needs a boot volume that holds the operating system. In most cases, this boot volume is an attached EBS volume that sets the delete_on_termination flag. The flag indicates whether the EBS volume gets destroyed when you terminate the EC2 instance.

In general, we call EBS volumes that…

terminate with the EC2 instance ephemeral volumes.
do not terminate with the EC2 instance persistent volumes.

I encourage you to use managed data services such as DocumentDB, DynamoDB, or RDS for persistent application data. But there are always exceptions to that rule and you might end up requiring a persistent volume for your application data. Make sure that your EBS volume sets delete_on_termination=false if you want to use it for persistent data, and schedule backups for this volume.
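In the EC2 API the flag is spelled DeleteOnTermination and lives inside the block device mapping of an instance. Here is a minimal sketch of such a mapping as a plain Python dictionary. The function name, the device name, and the gp3 volume type are my assumptions for the example.

```python
def data_volume_mapping(device_name: str, size_gib: int, persistent: bool) -> dict:
    """Build one block device mapping entry for an additional EBS volume."""
    return {
        "DeviceName": device_name,
        "Ebs": {
            "VolumeSize": size_gib,
            "VolumeType": "gp3",  # assumption: a general-purpose SSD volume
            # A persistent volume must survive the termination of the instance.
            "DeleteOnTermination": not persistent,
        },
    }
```

A call like data_volume_mapping("/dev/sdf", 20, persistent=True) produces an entry that you could place in the BlockDeviceMappings list when you launch an instance.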

Attaching EBS Volumes

EBS volumes other than the boot volume get attached as raw block storage to your EC2 instance so you need to…

format the block storage with a file system like ext4.
create a folder on your operating system that you use as a mount target.
mount the formatted volume to the folder.

You don’t have to ssh into new EC2 instances for this. You can automate the procedure with the EC2 User Data script.
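The three steps can be sketched as a small Python helper that renders a User Data shell script. The device name and mount point in the usage note are assumptions; on your instance the volume may show up under a different name (for example as an NVMe device), so check that first.

```python
import shlex


def build_mount_script(device: str, mount_point: str, fs_type: str = "ext4") -> str:
    """Render a shell script that formats (if needed) and mounts a volume."""
    dev = shlex.quote(device)
    target = shlex.quote(mount_point)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "# Format only if the device has no file system yet, so existing data survives.",
        f"blkid {dev} || mkfs -t {fs_type} {dev}",
        "# Create the mount target and mount the formatted volume.",
        f"mkdir -p {target}",
        f"mount {dev} {target}",
    ]
    return "\n".join(lines) + "\n"
```

build_mount_script("/dev/xvdf", "/data") returns the script as a string that you can pass as User Data when you launch the instance. The mount does not survive a reboot in this sketch; you would add an /etc/fstab entry for that.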

EBS volumes can only be attached to a single EC2 instance (except for io1 and io2 volumes with Multi-Attach enabled) and the instance must be in the same region and availability zone.

You can still transfer your EBS volume data to an EC2 instance in another region/availability zone…

Create a snapshot of the EBS volume.
Copy the snapshot to the other region (you can skip this step for another availability zone in the same region, because snapshots are regional).
Create an EBS volume from the snapshot copy in that region/availability zone.
Attach the new EBS volume to the EC2 instance.
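The four steps above can be sketched with boto3. This is a minimal sketch under a few assumptions: src_ec2 and dst_ec2 are boto3 EC2 clients for the source and the destination region that you create yourself, the function name and the default device name are made up, and error handling is left out.

```python
def move_volume(src_ec2, dst_ec2, volume_id: str, src_region: str,
                dst_az: str, instance_id: str, device: str = "/dev/sdf") -> str:
    """Copy an EBS volume to another region and attach the copy to an instance."""
    # 1. Create a snapshot of the source volume and wait until it is complete.
    snap_id = src_ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
    src_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap_id])

    # 2. Copy the snapshot. The call goes to the destination region.
    copy_id = dst_ec2.copy_snapshot(
        SourceRegion=src_region, SourceSnapshotId=snap_id
    )["SnapshotId"]
    dst_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy_id])

    # 3. Create a volume from the copy in the target availability zone.
    new_volume_id = dst_ec2.create_volume(
        SnapshotId=copy_id, AvailabilityZone=dst_az
    )["VolumeId"]
    dst_ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # 4. Attach the new volume to the instance in the destination region.
    dst_ec2.attach_volume(
        VolumeId=new_volume_id, InstanceId=instance_id, Device=device
    )
    return new_volume_id
```

The function takes the clients as parameters so that you can try it with stand-in objects before you point it at a real account.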

You can also copy encrypted snapshots to another region, but you need to specify a KMS key of the destination region for the copy. EBS uses the KMS service to encrypt EBS data at rest and KMS keys are a regional construct, so a regular key from the source region can’t be used in the destination region and the copy gets re-encrypted with the key that you specify.

A lot of people are frustrated about such details but this is a great effort by AWS to keep the data integrity of regions.

EFS

The Elastic File System (EFS) offers distributed network file systems that you can attach to EC2 (and other services), which is much easier to work with than EBS. As the name suggests, EFS comes with a ready-to-use network file system that is compliant with the NFS protocol, so you don’t have to format raw block storage. You only mount the file system.

But EFS has a premium price compared to EBS. The biggest selling point of EFS is that you can attach it to multiple EC2 instances, even across availability zones, so you should use those network file systems whenever you work with distributed data…

when you have multiple EC2 instances that need to access the same persistent data source.
when you use high-performance computing operations based on a shared data set.

User Data

Some of you might have worked already with configuration management tools like Ansible which in essence do the same thing as the EC2 User Data script.

You can pass in a set of shell instructions to the User Data script attribute of an EC2 instance. These instructions are executed during the initial boot of the EC2 instance, so only once. Restarting an instance won’t trigger the User Data script again; only terminating the instance and bootstrapping a new instance would run it again.

Use the User Data script to automate…

the installation of system dependencies.
installation and start of the software application or service that you run on your instance.
the formatting and mounting of EBS volumes.

I think you should always strive for full automation, so try to automate any installation, configuration or application start that you need to do in your EC2 instance through the User Data script.

AMIs and Golden AMIs

Amazon Machine Images (AMIs) contain the operating system and a set of pre-configurations that EC2 instances use as a baseline. Any additional configuration on top of the AMI is typically handled by the User Data script.

Amazon offers you a set of bare-bones Linux and Windows AMIs that you can use.

The hosting world relies mostly on Linux software and operating systems, so I go 99% of the time with the Amazon Linux 2 HVM AMI, which is similar to Red Hat Enterprise Linux (RHEL).

Use Windows if you need to work with .NET or proprietary software from Microsoft.

The best thing about AMIs is that you can create your own AMIs. Here is the process for creating a new AMI.

Configure an EC2 instance using a base AMI e.g. Amazon Linux 2.
Start the instance.
Right-click on the instance, “create AMI”

Basically, you create a new AMI based on a running EC2 instance.

AMIs are a regional construct, which means that any AMI that you create from a running EC2 instance will be stored in the region in which the EC2 instance runs. You can copy your AMI to another region which will result in another AMI with another AMI ID.

This becomes very relevant when you use IaC tools like CloudFormation or Terraform.

I recommend you…

create a RegionMap in CloudFormation for your AMIs.
use a data source in Terraform (aws_ami) to filter for a specific AMI in your region by name.
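The lookup by name can be sketched in Python as well. The helper below is hypothetical. It expects image descriptions in the shape that the EC2 DescribeImages call returns for the region you deploy to (for example from a boto3 describe_images call with Owners=["self"]), and the AMI names and IDs in the usage note are made up.

```python
from typing import Iterable, Optional


def latest_image_id(images: Iterable[dict], name_prefix: str) -> Optional[str]:
    """Return the ID of the newest AMI whose name starts with name_prefix."""
    candidates = [
        img for img in images if img.get("Name", "").startswith(name_prefix)
    ]
    if not candidates:
        return None
    # CreationDate is an ISO 8601 string, so string order matches time order.
    newest = max(candidates, key=lambda img: img["CreationDate"])
    return newest["ImageId"]
```

If your golden AMIs are named like my-app-1.0.0 and my-app-1.1.0, latest_image_id(images, "my-app-") picks the most recently created one in that region, so you never have to hard-code a region-specific AMI ID.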

AMIs that contain everything you need to run your application are also called “Golden AMIs”. You should try to create golden AMIs for your EC2 application wherever possible. They reduce the overhead that you incur in the bootstrapping process of your EC2 instances since you use a preconfigured snapshot of an EC2 instance.

There are a couple of ways to build EC2 AMIs automatically…

with the EC2 Image Builder service.
with the HashiCorp Packer tool.

I go with Packer here. It is easier to write a CI/CD process that creates new golden AMIs with Packer than the EC2 Image Builder but that is only my personal impression.

SSH into EC2 via Session Manager

Having SSH access to EC2 instances can be really useful to try out new configurations before you place them in the User Data script, or to debug an existing configuration.

You could SSH into an EC2 instance with the EC2 Keypair, but the instance must be deployed in a public subnet (a subnet with a direct route to an Internet Gateway) and you must open port 22 for ingress through a security group.

This poses a serious security risk for most EC2 instances, and you should always try to restrict access to instances as much as you can.

If not absolutely necessary, you should try to…

put EC2 instances in private subnets.
restrict access through security groups, especially port 22.

AWS provides a tool called Session Manager that lets you open a shell on an EC2 instance directly from your browser. You don’t need to open a port in the security groups for this or deploy the instance in a public subnet. You don’t even need to have access to a specific Keypair.

There are only two requirements for the Session Manager to work…

Install the SSM agent on your EC2 instance. I recommend you do this through the User Data Script.
Provide the EC2 instance with the managed policy "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" to communicate with the Systems Manager service. You can attach this managed AWS policy to your EC2 instance profile role.
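Attaching the managed policy to the role is a single API call. Here is a minimal sketch with boto3. It assumes that iam_client is a boto3 IAM client that you create yourself and that the role already exists; the function name and the role name in the usage note are made up.

```python
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def allow_session_manager(iam_client, role_name: str) -> None:
    """Attach the managed SSM policy to the role of the instance profile."""
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=SSM_CORE_POLICY_ARN,
    )
```

You would call it once per role, for example allow_session_manager(iam_client, "my-app-instance-role"). In an IaC setup you would declare the same attachment in CloudFormation or Terraform instead.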

You can now access the instance from the AWS console. Navigate to the EC2 service, find your instance, right-click, and click on “connect”. You will be redirected to a dialog window where the second tab says “AWS Session Manager”.

You will see a description of how you need to set up the Session Manager on your instance if your instance is not configured properly.

I install the SSM agent on all of my EC2 instances, and I use the Session Manager exclusively for accessing my instances via SSH. As a short recap, the Session Manager…

Offers SSH access to instances that are deployed in private subnets.
Logs SSH instance access in CloudTrail, with user ID.
Requires no SSH Keypair on the instance.
Makes it easy to define who can access the instance based on IAM policies rather than SSH Keypairs.

Launch Templates

In DevOps there is a simple equation:

Code + Configuration = Software Release

In an ideal world, you are able to create an immutable code and configuration version. So, you attach a specific code and configuration combination to a semantic version and make sure that this version corresponds always to the same config and code (immutability).

This is very important and ensures that software releases that you tested on a staging environment operate in the same way on a production system.

This should apply to software running on EC2 instances too!

We can use golden AMIs to fix the software and dependencies running on the EC2 instance but you can still apply variable configuration for…

Instance type & Instance family
Allocated EBS volume
…

Launch Templates solve this issue and create a fixed software release for EC2 instances by assigning a version number to a template with an immutable set of configurations.

You can create a Launch Template from…

An existing EC2 instance - right-click on the instance in the EC2 console and click “Create Launch template”.
Scratch - create a new Launch Template in the EC2 console.
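You can also create a Launch Template through the API. The sketch below only builds the request as a plain Python dictionary. It assumes that you pass it to the create_launch_template call of a boto3 EC2 client afterwards; the function name, the root device name, and all values in the usage note are made up.

```python
def launch_template_request(name: str, version_description: str, ami_id: str,
                            instance_type: str, volume_size_gib: int) -> dict:
    """Build the arguments for creating a launch template."""
    return {
        "LaunchTemplateName": name,
        "VersionDescription": version_description,
        "LaunchTemplateData": {
            # The golden AMI fixes software and dependencies.
            "ImageId": ami_id,
            # The template fixes the remaining, otherwise variable, configuration.
            "InstanceType": instance_type,
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",  # assumption: root device name
                    "Ebs": {
                        "VolumeSize": volume_size_gib,
                        "DeleteOnTermination": True,
                    },
                },
            ],
        },
    }
```

A request built with launch_template_request("my-app", "release 1.2.0", "ami-0123456789abcdef0", "t3.small", 20) ties one AMI to one instance type and one volume size, so the version description maps to exactly one combination of code and configuration.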

Summary

I think that was a lot of information to digest, but I always try to understand the technologies that I am working with and their limitations. We are not done with EC2 yet; there is still a lot you should know if you want to operate EC2 instances successfully and cost-efficiently.

I’ll turn this post into a series and add more content.

But first, let’s have a short recap of the most important things we learned.

EC2 instances have different instance families and instance types. An instance family is always geared to a specific use case such as “general purpose” or “memory-optimized”. The instance type, on the other hand, just defines the scale of the computing resources.
There are 7 different lifecycle states of an EC2 instance. You pay for running instances and for instances that are preparing to hibernate. Stopped instances only incur storage costs.
EC2 is a zonal service. Instances are deployed into a specific availability zone and you can only attach EBS volumes of the same zone.
AMIs contain the operating system and initial configuration to start your EC2 instance. You can create Golden AMIs that contain your application and all dependencies that you need to run the application on EC2.
Launch Templates create an immutable software release on EC2 and consist of a fixed configuration and AMI.
You should use the Session Manager to SSH into instances.
Startup sequences or the installation of dependencies can be automated with the EC2 User Data script.
Some instance families are burstable, which means they offer CPU credits that you can use to temporarily boost your instance's computing performance.
