Totalcloud.io

Cost Optimization With AWS Serverless Resource Scheduling

Serverless Resource Scheduling

When we realized we could save at least 40% of our server costs simply by switching servers on and off as needed, the potential savings ran into millions of dollars. The concept of turning something ‘on’ & ‘off’ was easy to apply to resources like EC2 & RDS instances, because they have that capability built in.

But why stop at just servers? There are plenty of other resources in your cloud where the potential savings are just as large; the only catch is that they don’t have that built-in switch. So instead of a simple on & off, we find other ways to schedule them that achieve the same result. And since AWS doesn’t provide a direct solution, we have to build one.

Take Redshift clusters, for example: to turn a cluster ‘off’, you can take a snapshot & delete the cluster; when it’s time to turn it back ‘on’, you can create a new cluster & restore the snapshot. The same pattern extends to resources like EKS, ECS, Neptune databases, and almost 80% of your cloud. Imagine the cost & operational benefit of being able to schedule your entire cloud whenever it doesn’t need to be running.

As cloud infrastructure has evolved, many of us have adopted serverless, following an on-demand execution model. This shift alone has saved us huge amounts. But since you’re charged per execution, any accidental executions or services left running longer than needed can cause bill shock. So we wanted to extend the scheduling capability to serverless services with the same goal in mind: save costs. The principle is to block the functions & key entry points that trigger serverless workloads during non-business hours & weekends, so there are no unintended executions & charges.

Here’s a deep dive into how serverless services can be ‘scheduled’ to achieve the best outcomes. We’ll also look at an example of how we, at TotalCloud, use this concept to put our own serverless architecture on a schedule.

DynamoDB

Reduce Read and Write requests

Keep the number of read and write requests as low as possible.

DynamoDB charges you in two ways: for storage past the free tier limit (25 GB) and for read and write request units. By reducing provisioned capacity at certain times, or in response to changes in your architecture, you can save costs. Setting up a workflow that keeps your RCU/WCU from exceeding its partition limit can also save you from accidental requests.

Amazon S3

Switch Between Different Amazon S3 Tiers

Switching between different storage tiers like Glacier, Glacier Deep Archive, and One Zone-IA can bring down storage charges. However, there are certain conditions under which you should opt for such switches. You can’t do this too frequently either, as transitions between tiers incur charges as well.

So, for example, let’s say there’s data stored in your Frequent Access tier that you haven’t, as the name suggests, been accessing frequently. You can set up a workflow from our platform that identifies it and moves it to the much cheaper Infrequent Access tier. All you need are the right permissions, and you can manage your Amazon S3 tiers with a single workflow.

Lambda Functions

Blocking The Function Execution at The Trigger

You can block unwanted Lambda functions from executing by stopping the trigger, whether it’s an API Gateway, an S3 event, or a DynamoDB event.

Reduce Concurrency

Reducing the time Provisioned Concurrency stays active on a Lambda function is also a smart way to save costs. Concurrent Lambda executions are a much-needed capability for many architectures, but since Provisioned Concurrency is billed for as long as it is configured, you can always find ways to shorten the window it is active.

CloudWatch Events

Block Rules to Save Cost on all Resources

You can shut down certain CloudWatch events at the trigger phase by disabling the rule. This saves costs on all the services associated with it. This practice is useful when events are executed accidentally or are scheduled to run when they shouldn’t.

What has always made this hard to achieve is that AWS offers no native way to do it. With TotalCloud, you can create a workflow for each of these services and their schedules, along with complete control over their execution.

How we schedule our serverless architecture at TotalCloud

As TotalCloud itself largely runs on serverless, we’ve applied the same concept to our own architecture. In our case, we block the CloudWatch rules that trigger our serverless functions, using our workflows to shut down scheduled events at specific points in the week. Our workflows are no-code, so you can set this up in minutes as a logical flow of instructions. But we’ve made it even easier: our ‘Scheduling solution’ enables you to set up schedules for any resource in a simple UI, so you don’t even have to create a workflow from scratch.

Here’s a quick run-through of how we create our serverless schedules:

Step 1: Choose the scheduling period

With our Custom Schedules, you have the flexibility to choose how your schedules should function. You can set a specific time, make it a one-time event, or make it recurring.

Step 2: Select the service and the resource associated with it

In this case, the service is CloudWatch Events, and since we’re blocking the rule that triggers the invocation of the function, select rules as your resource.

Step 3: Choose the Key-Value pair to identify the Event

Here, we specify the rule name associated with the events to filter those out specifically.

Filters can be applied based on different metrics or a parameter, based on tags, or even by writing a function that performs the filtering when it is invoked.

In our case, we set a parameter so that all rules whose names start with “ss” are filtered out to be scheduled.

Step 4: Set the Parking action

The parking action dictates what happens when the scheduled time arrives.

Set the action as ‘disable rule’.

Step 5: Set the Unparking action

The unparking action lets you dictate what happens when the scheduled period comes to an end. It need not just cancel your event; it can also invoke a consecutive one.

You’ll need the event to start running again once the schedule period is over, so set the action to ‘enable rule’ as the unparking action.

Step 6: Save and Deploy the schedule.

With this, all our CloudWatch events starting with “ss” are temporarily disabled, which means we save costs on all the services used by those events.
