Scalability on AWS

#scalability #architecture #beginners #designsystem

This post covers what I consider one of the most misunderstood topics in computer science and cloud engineering: scalability. I will start by discussing what scalability is and isn't. I will then look at how it is often conflated with efficiency and availability, and why it is important to untangle them. After that, I will cover when architects need to worry about scalability rather than other aspects of architecture. Finally, I will go through the tools you can use on AWS to deal with scalability.

Scalability - What is it?

Imagine you're launching a new e-commerce website. It's uncertain how much traffic you'll get at this early stage. Your traffic could range from just 10 visitors a day to a staggering million a day if your site goes viral. This uncertainty is common for startups. Essentially, you're just one viral marketing campaign away from massive success.

The challenge lies in designing an application that's flexible enough to handle both low and high-traffic scenarios. Typically, architects design applications with specific business needs in mind. But for startups, these needs can be vague. If you overbuild for high traffic, you risk overspending on infrastructure for a handful of daily visitors. On the other hand, under-provisioning could mean missed opportunities and system overloads if traffic spikes unexpectedly.

Suppose you opt for a middle-ground solution, creating an application that can handle a moderate amount of traffic without banking on virality. What happens if your site goes viral? You'll face more requests than your system can handle. Your options are:

Do nothing, risking dropped requests and potential system inconsistencies.
Optimize your code to increase throughput on existing hardware. This approach has diminishing returns, as there may be only a few inefficiencies left to remove.
Add more hardware. But this isn't always straightforward. Your application must be designed to make use of new hardware. For example, a load balancer is essential for distributing requests across servers, and it must be intelligent enough to account for varying server capacities.
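
To make the point about varying server capacities concrete, here is a minimal sketch of weighted request distribution. The server names and weights are hypothetical, and this is only an illustration of the idea, not how any particular AWS load balancer is implemented.

```python
import itertools
from collections import Counter

def weighted_round_robin(capacities):
    """Return an endless iterator of server names, each appearing
    in proportion to its integer capacity weight."""
    # Expand each server into `weight` slots, then cycle through the slots forever.
    slots = [name for name, weight in capacities.items() for _ in range(weight)]
    return itertools.cycle(slots)

# Hypothetical fleet: one large server and two small ones.
capacities = {"large-1": 4, "small-1": 1, "small-2": 1}
balancer = weighted_round_robin(capacities)

# Route 600 requests and count where they landed.
assignments = Counter(next(balancer) for _ in range(600))
print(assignments)
```

The large server receives four times as many requests as each small one. This naive version sends the large server's share in bursts; a production balancer would interleave requests more smoothly and also react to health checks.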

The last point brings us to scalability, a crucial aspect of software design. Scalability is the ability to increase processing capacity either by upgrading existing hardware (vertical scalability) or by adding new servers (horizontal scalability). Scalable software might not be the most efficient, because it often includes additional components such as load balancers or routing logic that add overhead. The benefit is that your application can keep up with high demand as long as you add the required hardware.

By adopting a flexible approach, you can design your software to meet moderate or even low demand. This way, if your site doesn't take off immediately, you avoid the costs of overprovisioning. But if you do hit high traffic levels, you can easily expand your capacity with new hardware, ensuring that all customers are served without interruption.

Scalability - How to implement it?

Understanding scalability (increasing processing capacity by adding or upgrading hardware) is crucial in application design. However, a common misconception is that merely running in the cloud, for example on AWS, guarantees scalability. Simply adding servers or boosting RAM and CPU power doesn't automatically translate to increased capacity. True scalability must be built into the software from its initial design, incorporating concepts like parallel processing, distributed computing, and asynchronous programming. Cloud platforms like AWS provide tools that aid scalability, but they are tools to be applied, not solutions in themselves.
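
One way to see why extra hardware does not automatically mean extra capacity is Amdahl's law, which is not mentioned above but illustrates the same point. The sketch below assumes, purely for illustration, that 80% of the work can be spread across workers while the remaining 20% stays serial.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when only `parallel_fraction` of the work
    can be spread across `workers`; the rest runs serially."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Assumed workload: 80% parallelisable (a hypothetical figure).
for workers in (1, 2, 8, 64):
    print(workers, round(amdahl_speedup(0.80, workers), 2))
```

Under that assumption the speedup can never reach 5x, no matter how many servers are added. Raising the ceiling means redesigning the software to shrink the serial part, which is why scalability has to be a design concern.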

Distributed Programming:

The first and most straightforward approach to scalability is through distributed programming.

To scale effectively, you need:

A Trigger: This is the event that signals the system is overwhelmed. Instead of relying on manual intervention, you can set alarm thresholds in Amazon CloudWatch to detect such scenarios automatically. Triggers can be based on metrics like CPU usage or on pre-determined high-traffic times, and they can activate other AWS services, for example through Amazon EventBridge.
A Scaling Event: Once triggered, the application needs to scale by adding new resources or upgrading existing ones. For example, adding a server instance for a busy web application or increasing a database’s RAM.
Load Balancer/Distributor: After scaling, it is essential to make full use of the new resources. Tools like Elastic Load Balancing distribute the workload across them.
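
The three pieces above can be sketched as a small simulation. This is plain Python rather than real AWS API calls; the thresholds, the capacity figure, and the function names are all assumptions chosen for illustration.

```python
def next_fleet_size(instances, avg_cpu, high=70.0, low=30.0,
                    min_size=1, max_size=10):
    """Scaling event: add one instance above `high` CPU, remove one below `low`."""
    if avg_cpu > high:
        return min(max_size, instances + 1)
    if avg_cpu < low:
        return max(min_size, instances - 1)
    return instances

def simulate(load_per_tick, capacity_per_instance=100.0):
    """Return the fleet size after each load sample."""
    instances, history = 1, []
    for load in load_per_tick:
        # Distributor: assume the load is spread evenly over all instances.
        # Trigger: the resulting average CPU is the metric being watched.
        avg_cpu = min(100.0, 100.0 * load / (instances * capacity_per_instance))
        instances = next_fleet_size(instances, avg_cpu)
        history.append(instances)
    return history

print(simulate([50, 300, 300, 300, 300, 80, 80]))
# prints [1, 2, 3, 4, 5, 4, 3]
```

The fleet grows one step at a time while the spike lasts and shrinks again once the load drops. Note that the even-spread assumption is doing real work here: without a distributor, the new instances would sit idle and the metric would never improve.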

Asynchronous Programming:

Another key approach to scalability is asynchronous programming. Incoming requests are placed in a pool (typically a queue), and each requester receives a token that it can use later to retrieve the result. Backend workers process these tasks at their own pace. Because the system's processing capacity is decoupled from the rate at which requests arrive, it can absorb varying request volumes without dropping work. AWS services such as Amazon SQS, Amazon MSK (managed Apache Kafka), and Amazon Kinesis support such patterns.
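
As a rough illustration of the token-and-worker pattern, the sketch below uses Python's in-process `queue.Queue` as a stand-in for a managed queue service. The `submit` and `worker` names and the squaring "business logic" are hypothetical.

```python
import queue
import threading
import uuid

tasks = queue.Queue()           # the pool of pending requests
results = {}                    # token -> result, filled in by workers
results_lock = threading.Lock()

def submit(payload):
    """Accept a request immediately and hand back a token for later lookup."""
    token = str(uuid.uuid4())
    tasks.put((token, payload))
    return token

def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        token, payload = item
        outcome = payload * payload   # stand-in for real business logic
        with results_lock:
            results[token] = outcome
        tasks.task_done()

# Requests arrive before any worker exists; they simply wait in the queue.
tokens = [submit(n) for n in range(5)]

# Start two workers; adding capacity later would just mean starting more.
threads = [threading.Thread(target=worker) for _ in range(2)]
for thread in threads:
    thread.start()

tasks.join()                    # block until every queued task is processed
for _ in threads:
    tasks.put(None)             # tell each worker to stop
for thread in threads:
    thread.join()

print([results[token] for token in tokens])
# prints [0, 1, 4, 9, 16]
```

The requesters never wait on a worker, and the number of workers can change without the submitting side noticing.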

Serverless Solutions:

Finally, AWS offers serverless options like AWS Lambda and Amazon Aurora Serverless (the serverless database option within Amazon RDS), where AWS manages scaling for you. Lambda lets you focus on business logic without hardware concerns, though it has limits such as a maximum execution time and restrictions on how libraries are packaged (for example, deployment package size). Aurora Serverless automatically scales database capacity according to demand and charges based on the capacity actually used.

However, serverless architectures have drawbacks:

Fixed Usage Patterns: Serverless solutions like Lambda are designed for specific kinds of workloads, with limits on execution time and on how dependencies are packaged. Designing around these constraints can lead to infrastructure lock-in.
Cost: While serverless offers extreme scalability, it can be more expensive over time than an instance-based architecture when the workload is steady and that elasticity goes unused.
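
The cost trade-off can be worked through with simple arithmetic. The prices below are made-up placeholders, not real AWS pricing, and the model ignores the fact that a fixed fleet also has a capacity ceiling.

```python
def serverless_monthly_cost(requests, price_per_million):
    """Pay-per-use: cost grows linearly with the number of requests."""
    return requests / 1_000_000 * price_per_million

def instance_monthly_cost(instances, price_per_instance):
    """Fixed fleet: cost is the same whether the servers are busy or idle."""
    return instances * price_per_instance

def break_even_requests(instances, price_per_instance, price_per_million):
    """Monthly request count at which both options cost the same."""
    return instances * price_per_instance / price_per_million * 1_000_000

# Hypothetical prices: 5.00 per million requests, 50.00 per instance per month.
fleet, instance_price, request_price = 2, 50.0, 5.0

print(serverless_monthly_cost(1_000_000, request_price))         # low traffic
print(instance_monthly_cost(fleet, instance_price))              # fixed fleet
print(break_even_requests(fleet, instance_price, request_price))
```

With these placeholder numbers, serverless is cheaper below 20 million requests a month and the fixed fleet is cheaper above it, which is the pattern the cost point describes.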

In summary, achieving true scalability involves more than just accessing cloud services. It requires thoughtful design and the strategic use of specific tools and programming paradigms.