This article was originally published on my blog here.
I work as a senior software developer at a startup. We mainly provide APIs to integrate voice and SMS services into applications, so naturally our complete infrastructure runs on a cloud provider, which in our case is AWS.
For someone who has not worked on managing cloud infrastructure, it can be very difficult to understand the different components involved in building scalable applications. Personally, I didn't know anything about AWS concepts before I started working here. In the last 2.5 years I've learnt a lot about AWS, and knowing the importance of these concepts, I decided to write a series of articles for people who want to get started with AWS or any other cloud services provider in general. This is the first article in the series.
AWS is one of the market leaders in cloud computing and powers the applications behind companies like Facebook, Netflix, LinkedIn, NASA etc.
As a software developer it's important to know about different cloud computing services which are needed to build distributed, highly scalable applications. If you've not worked on infrastructure yet, you'll definitely get to work on it in your development career at some point. In this article I'll cover some of the important services we use from AWS.
AWS provides numerous services but below are the commonly used ones:
- Elastic Compute Cloud (EC2)
- Relational Database Service (RDS)
- Elastic Container Service (ECS)
- Amazon ElastiCache
- Simple Storage Service (S3)
- Simple Queuing Service (SQS)
- Load Balancer
- Route 53
- AWS Lambda
- Amazon Virtual Private Cloud (VPC)
EC2 instances are basically servers with an operating system which you can use to run your applications on the internet, just like you run them on your laptop during development.
EC2 machines come with different configurations of CPU, memory, storage etc. They are categorised into families based on computing power, memory optimisation, storage optimisation and so on. For example, the memory optimised instances belong to the r family and the compute optimised instances belong to the c family, while the m family offers a general-purpose balance.
You can use these instances to run backend servers, background scripts, database servers, front end applications etc.
Since we have many microservices powering our APIs, we use these instances along with Elastic Container Service (a container orchestration service) to deploy our Docker containers. We even use standalone EC2 instances as jumpbox hosts for running ad-hoc scripts to perform tasks like backfilling data or connecting to a private database, Redis cache etc.
RDS is a distributed relational database service.
Amazon RDS is available on several database instance types - optimised for memory, performance or I/O - and provides six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server.
Amazon RDS allows you to create read replicas in the same region or in a different region. You can create one or more replicas of a given source DB instance and serve high-volume application read traffic from multiple copies of your data, thereby increasing aggregate read throughput. Read replicas can also be promoted, when needed, to become standalone DB instances.
We use PostgreSQL for all of our main databases and Amazon Redshift for the data needed for analytics and reporting. There's one central database where common data is stored, and separate databases, owned by the respective teams, for data related to different products.
AWS ECS is a fully managed container orchestration service.
ECS has been a foundational pillar for key Amazon services and it can natively integrate with other services such as Amazon Route 53, Secrets Manager, AWS Identity and Access Management (IAM), and Amazon CloudWatch providing you a familiar experience to deploy and scale your containers.
You can add auto scaling to your ECS clusters to scale the number of instances and tasks up or down depending on your traffic. When your services are getting high traffic you can increase the number of cluster instances and service tasks, and similarly decrease them when traffic is low. For example, you can keep a desired count of 4 containers for a particular service and set up rules like: if CPU utilisation goes above 80%, add 2 more containers; if it drops below 40%, remove 2 containers.
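The step-scaling rule above can be sketched as a small function. This is a toy simulation of the policy, not the actual ECS API; the thresholds, step sizes and bounds are the example values from the text:

```python
def desired_task_count(current: int, cpu_percent: float,
                       min_tasks: int = 2, max_tasks: int = 10) -> int:
    """Toy step-scaling policy: add 2 tasks above 80% CPU,
    remove 2 below 40%, clamped to [min_tasks, max_tasks]."""
    if cpu_percent > 80:
        current += 2
    elif cpu_percent < 40:
        current -= 2
    return max(min_tasks, min(max_tasks, current))

print(desired_task_count(4, 85.0))  # high CPU: scale out to 6
print(desired_task_count(4, 35.0))  # low CPU: scale in to 2
print(desired_task_count(4, 60.0))  # steady state: stay at 4
```

In the real service, CloudWatch alarms on the CPU metric would trigger the scaling action instead of you calling a function directly.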
Within our organisation each team has its own ECS cluster containing the microservice containers it owns. We scale individual services independently depending on their traffic.
Amazon ElastiCache is a fully managed in-memory data store and cache, offering Redis and Memcached for the most demanding applications that require sub-millisecond response times.
It is a popular choice for real-time use cases like caching, session stores, gaming, geospatial services, real-time analytics, and queuing.
Similar to EC2 instance types, there are multiple instance families and types available like t3, r5, m5 etc. You can use the one which you need based on your computing requirements and budget constraints.
In our organisation, we have a service which gets around 6,000 requests/sec and needs low API response times, so we decided to use ElastiCache as the primary source of data for this microservice. The service has been able to serve requests with single-digit millisecond response times without any issues.
Apart from this, we use Redis in many other critical services as a write-through or write-back cache, and also to store data which needs to be accessed quickly.
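A write-through cache, as mentioned above, updates the cache and the backing store together so that reads can be served from memory without going stale. A minimal in-process sketch, with plain dicts standing in for Redis and the database (all names are illustrative):

```python
class WriteThroughCache:
    def __init__(self):
        self.cache = {}   # stands in for Redis
        self.db = {}      # stands in for the primary database

    def write(self, key, value):
        # Write-through: update the store and the cache in one step,
        # so the cache never serves stale data for written keys.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:          # cache hit: the fast path
            return self.cache[key]
        value = self.db.get(key)       # cache miss: fall back to the DB
        if value is not None:
            self.cache[key] = value    # populate for subsequent reads
        return value
```

A write-back cache would instead acknowledge the write after updating only the cache and flush to the database later, trading durability for lower write latency.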
As the name suggests, S3 provides simple, low-cost object storage with high scalability, data availability, security and performance.
S3 can be used to store files for many use cases like websites, mobile apps, enterprise applications, backup and restore etc.
Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.
We use S3 for various use cases: storing call recordings, invoice PDFs and payment receipts, backing up older service logs, and querying data stored in S3 with Amazon Athena for analytics. We even use S3 events to trigger Lambda functions.
SQS is a fully managed message queuing service that enables you to decouple and scale microservices independently. Using SQS you can send, store and receive messages between different components at any volume. This helps you build highly scalable, distributed applications.
SQS offers two types of message queues:
Standard queues: use these when you need maximum throughput, best-effort ordering is acceptable, and at-least-once delivery (an occasional duplicate) is fine.
FIFO queues: use these when the order of messages is important and each message must be processed exactly once, in the order it was sent.
The two important properties of SQS queues are message retention period and default visibility timeout.
Message Retention Period: the time for which a message pushed into the queue is retained. For example, if this value is 3 days, unprocessed messages are deleted from the queue after 3 days.
Default Visibility Timeout: once a worker picks up a message, the visibility timeout is the period during which that message is hidden from other consumers. If the worker doesn't process and delete the message within this period, the message becomes visible again for other workers to pick up and process.
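The visibility timeout behaviour can be simulated in a few lines. This is a toy model for illustration, not the SQS API; timestamps are plain floats from a monotonic clock:

```python
import time

class ToyQueue:
    """Minimal model of SQS visibility timeout (not the real API)."""
    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self.messages = []          # list of [body, invisible_until]

    def send(self, body):
        self.messages.append([body, 0.0])

    def receive(self):
        now = time.monotonic()
        for msg in self.messages:
            if msg[1] <= now:       # currently visible to consumers
                # Hide the message until the visibility timeout expires;
                # if the consumer never deletes it, it reappears.
                msg[1] = now + self.visibility_timeout
                return msg[0]
        return None

    def delete(self, body):
        self.messages = [m for m in self.messages if m[0] != body]
```

A consumer that crashes mid-processing simply never calls `delete`, so the message resurfaces after the timeout and another worker retries it — which is exactly why standard queues give at-least-once rather than exactly-once delivery.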
A load balancer is a critical component of any distributed system. It sits between clients and servers, accepts incoming requests, and routes them across a cluster of servers to spread the load.
It keeps track of the health status of all connected servers and stops sending incoming requests to any server that becomes unhealthy.
Benefits of a load balancer:
- Faster user experience and improved response times
- Less downtime and higher throughput: if a particular server is down, the LB routes traffic to the ones which are up
- Reduced individual server load, preventing any one application server from becoming a single point of failure
- Improved overall system availability
Routing algorithms used:
- Least Connection Method
- Least Response Time Method
- Least Bandwidth Method
- Round Robin Method
- Weighted Round Robin Method
- IP Hash Method
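Round robin and weighted round robin, two of the algorithms listed above, can be sketched in a few lines. This is a simplified illustration; real load balancers also factor in health checks and connection state:

```python
import itertools

def round_robin(servers):
    """Cycle through servers in order, one request each."""
    return itertools.cycle(servers)

def weighted_round_robin(servers_with_weights):
    """Send each server a share of requests proportional to its weight."""
    expanded = [s for s, w in servers_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(4)])      # ['a', 'b', 'c', 'a']

wrr = weighted_round_robin([("a", 3), ("b", 1)])
print([next(wrr) for _ in range(4)])     # ['a', 'a', 'a', 'b']
```

Weighted round robin is useful when backends have different capacities, e.g. a larger instance type can take a higher weight.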
We use both internet-facing and internal load balancers in our services, depending on whether it's a customer-facing application or an internal microservice.
There are many other concepts that are tightly coupled with Load Balancers like:
- Target Groups
- Listener Rules
These are out of the scope of this article, but I highly recommend reading about them.
Route 53 is a highly available and scalable DNS service from AWS. If you don't know what a DNS service is: it's the service that routes end users to internet applications by translating names like www.example.com into numeric IP addresses like 192.0.2.1, which computers use to connect to each other.
Route 53 lets us route traffic using a variety of routing policies like simple routing, weighted round robin, latency-based routing, failover routing, multivalue answer routing, geolocation etc. With different combinations of these we can build highly available, fault tolerant systems.
There are different types of DNS records available depending on how you want to route based on DNS queries:
- A record type
- AAAA record type
- CAA record type
- CNAME record type
- MX record type
- NAPTR record type
- NS record type
- PTR record type
- SOA record type
- SPF record type
- SRV record type
- TXT record type
We use route53 in various use cases within our organisation to:
- Route traffic from a host endpoint to an internal load balancer using CNAME record
- Route traffic from a host endpoint to a multi region service using Failover A type records
- Route traffic based on weight to different load balancers. For example, in the past we had our microservices on OpsWorks; when we moved to the new container-based architecture, we used weighted routing to slowly shift traffic from the old service to the new one
- Use A type records to create aliases etc.
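The weight-based migration described above can be modeled as a weighted random choice over record sets. This is a toy sketch (the hostnames and weights are made up); Route 53 itself answers DNS queries in proportion to the configured weights:

```python
import random

def resolve(records, rng=random):
    """Pick one target in proportion to its weight,
    like a weighted Route 53 record set (toy model)."""
    targets = [t for t, _ in records]
    weights = [w for _, w in records]
    return rng.choices(targets, weights=weights, k=1)[0]

# Gradual migration: 90% of lookups hit the old stack, 10% the new one.
records = [("old-lb.internal", 90), ("new-lb.internal", 10)]
print(resolve(records))
```

To complete a migration you ratchet the weights over time (90/10, then 50/50, then 0/100), watching error rates at each step before shifting more traffic.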
AWS Lambda allows you to run your services without provisioning or managing servers. You only pay for the compute time used, whereas with EC2 instances that are up 24/7 you pay for the entire time the servers are running.
By providing demand based computing, lambda allows you to run your applications only when needed. You just need to upload your code and Lambda takes care of everything needed to run your application and scale with high availability.
Benefits of using Lambda:
- No servers to manage
- Continuous Scaling
- Subsecond metering
- Consistent Performance
We use Lambda for use cases like generating payment receipts and invoices, data reconciliation, fallback mechanisms, analytics etc. within our organisation.
Lambdas can be invoked in response to triggers such as changes in data, changes in system state, file upload to S3, actions by users etc.
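A Python Lambda handler is just a function that receives the trigger event and a context object. Here's a sketch of a handler for the S3-upload trigger mentioned above; the event shape mirrors the documented S3 notification structure, while the bucket and key values are purely illustrative:

```python
def handler(event, context):
    """Entry point Lambda invokes for each trigger event.
    For S3 triggers, each record carries the bucket and object key."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real code would fetch the object here (e.g. via boto3)
        # and generate a receipt, invoice, etc.
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```

You'd configure this function name as the handler in the Lambda console or your deployment tooling, and attach the S3 bucket notification as its trigger.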
AWS VPC is one of the core AWS components. It lets you provision a logically isolated section of the AWS Cloud where you can launch resources within a virtual network you define, and it works with multiple other components to secure your applications.
You get complete control over your virtual network environment through security groups, subnets, routing tables etc. For example, you can create a public-facing load balancer in public subnets, and the applications behind it can be accessed on the internet. Similarly, you can place private services, databases, caches etc. in private subnets so they can only be accessed from within the VPC.
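Subnets are just CIDR slices of the VPC's address range, and Python's `ipaddress` module is handy for reasoning about them. The CIDR blocks below are illustrative, not our actual network layout:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")            # the VPC's range
public_subnet = ipaddress.ip_network("10.0.1.0/24")
private_subnet = ipaddress.ip_network("10.0.2.0/24")

# Every subnet must fall inside the VPC's CIDR block.
assert public_subnet.subnet_of(vpc)
assert private_subnet.subnet_of(vpc)

# Subnets within one VPC must not overlap each other.
assert not public_subnet.overlaps(private_subnet)

host = ipaddress.ip_address("10.0.2.25")
print(host in private_subnet)   # this host lives in the private subnet
```

Whether a subnet is "public" or "private" isn't in the CIDR itself: it's determined by whether the subnet's route table has a route to an internet gateway.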
In the real world you'll have scenarios where you want to access resources in one VPC from another VPC. In these cases you need to create a VPC peering connection, which allows this access.
As mentioned earlier, there are tons of concepts within VPC itself, and I highly recommend reading about the concepts below if you want to get a better understanding of VPCs:
- Security Groups
- Routing Tables
- VPC Peering
- Internet Gateways
- NAT Gateways
Apart from the components mentioned above, there are many other critical ones, like CloudFront, which are out of the scope of this article.
As mentioned earlier, this article is the first in the series, and the intention was to introduce you to the various commonly used components in AWS. In subsequent articles I'll cover specific use cases, problems we've faced, and how we've solved them using AWS in detail.
Subscribe to my blog to stay updated on these articles!
If you're a beginner to cloud computing and want to learn AWS concepts, there's a great course by Daniel Vassallo, who worked on the AWS team at Amazon for 10+ years.
I highly recommend this course if you find the documentation overwhelming.
Connect with me on Twitter, where I usually share what I learn about AWS, building SaaS products, and becoming a better developer in general.