CAST AI

Posted on Nov 19, 2021 • Originally published at cast.ai

Cloud cost management alone won't fix your cloud spend problem


The pay-per-use model of the public cloud seemed too good to be true, and you probably caught on to its catch quickly: analyzing and predicting your cloud costs is like driving blindfolded and hoping the traffic stays the same.

A solid cloud cost management strategy and tooling solve this problem, but only partially. Knowing what your costs are and where they come from won't magically reduce them.

It’s a good start, but you still need engineering resources to implement the changes. And not just once, but on a regular basis - or whenever you see a savings opportunity, note a peak usage scenario, or too much shadow IT starts creeping in.

Is there a better way to control your cloud spend? Keep on reading to find out.

Read this guide to see why cost management isn’t enough in the cloud world:

Why are companies struggling with cloud costs?
But here’s where cloud cost management falls short
What else is there to help with my cloud costs?
And none of that management guarantees savings, unless you decide to automate optimization

Why are companies struggling with cloud costs?

Jumping on the public cloud bandwagon too fast can make that wagon tip over.

Most teams find controlling cloud costs challenging because they have never had so much freedom to spin up new instances and experiment. Even teams that have only ever used the public cloud struggle to control their cloud spend.

Here are some common reasons why cloud costs spiral out of control:

Companies overlook the risks of the pay-per-use model,
They have no visibility into their costs,
They don’t budget for the cloud and let their bill surprise them each month.

Legacy cost visibility, allocation, and management dashboards helped to solve some of these problems, but not all.

So, what exactly is cloud cost management?

Cloud cost management is an umbrella term for cost monitoring, reporting, visibility, allocation, budgeting, and forecasting.

The goal here is to understand and manage the costs associated with public cloud resources. It means knowing where costs come from, to which teams they can be allocated, and how much you’re likely to spend in the future.

The last point, forecasting, is particularly important for CFOs, who aren't pleased when they have to restate quarterly results because someone left an expensive instance running for too long.
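To make those three questions concrete (where costs come from, which team they belong to, and what you'll likely spend), here's a minimal Python sketch. The LineItem record, the "team" tag, and the straight-line run-rate forecast are hypothetical simplifications for illustration, not any cloud provider's billing export format or forecasting model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LineItem:
    """One hypothetical billing record (not a real provider export format)."""
    resource_id: str
    team: Optional[str]  # value of an assumed "team" tag; None if untagged
    cost_usd: float


def allocate_by_team(items: List[LineItem]) -> Dict[str, float]:
    """Sum costs per team tag; untagged spend is reported as 'unallocated'."""
    totals: Dict[str, float] = defaultdict(float)
    for item in items:
        totals[item.team or "unallocated"] += item.cost_usd
    return dict(totals)


def forecast_month_end(month_to_date_usd: float, days_elapsed: int, days_in_month: int) -> float:
    """Naive run-rate forecast: assumes the average daily spend so far continues."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return month_to_date_usd / days_elapsed * days_in_month


if __name__ == "__main__":
    bill = [
        LineItem("vm-1", "checkout", 120.0),
        LineItem("vm-2", "search", 80.0),
        LineItem("bucket-7", None, 15.5),
    ]
    print(allocate_by_team(bill))  # {'checkout': 120.0, 'search': 80.0, 'unallocated': 15.5}
    print(forecast_month_end(3000.0, 10, 30))  # 9000.0
```

The "unallocated" bucket is where shadow IT tends to show up, and a run-rate forecast like this one breaks down exactly when usage changes mid-month, which is the problem the next section describes.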

Cloud cost management is all about control - or, gaining more granular control over the cloud spend while keeping the same level of performance.

Most cloud providers offer basic cloud cost management tools to help customers achieve that. There are also plenty of specialized third-party tools that offer extra visibility and insight into your cloud expenses.

But here’s where cloud cost management falls short

1. Cloud costs are always changing

Predicting cloud expenses is hard, even if you're a tech giant like Pinterest. During the 2018 holiday season, the company's cloud spend went way beyond the initial estimates due to increased usage. Pinterest had to pay AWS $20 million on top of the $170 million worth of cloud resources it had already reserved.

2. Resource demands never stay the same either

Using the public cloud is all about striking a balance between cost and performance. Traffic spikes can either generate a massive, unforeseen cloud bill if you give your application a blank check, or crash your application if you put rigid limits on its resources. Cloud cost management doesn't get you anywhere near solving this issue.

3. Cost visibility is harder than it sounds

Decision-making about cloud spend is often decentralized in large organizations. This makes visibility more challenging than it seems. Add to that shadow IT projects popping up all over the place and you’ll have to deal with costs that can’t be explained just by taking a look at a dashboard or report.

4. Multi-cloud makes cost management even more challenging

Companies that combine several public cloud providers need to track the costs of all of them at the same time. It's like doubling or tripling the effort you put into a single cloud; there are no shortcuts here.

5. Cloud cost management requires manual work

And lots of it. Analyzing your setup, allocating costs to teams, understanding how much you've spent on what, finding better options, migrating your applications to better resources, and then checking whether it all works: this is what you need to do. And not just once, but on a regular basis.

What else is there to help with my cloud costs?

There’s cloud cost optimization. The best way to understand what optimization is all about is knowing what tactics it offers to teams looking to control their cloud spend.

Here are a few of them:

Instance rightsizing
Automatic scaling (autoscaling)
Resource scheduling
Removing unused resources
Spot instance use

Not only does optimization help you achieve all of these things, but it can make the process automatic - without adding repetitive tasks for engineers. Some things just aren’t supposed to be managed manually.

Optimizing cloud costs isn't a point-in-time exercise. You need to keep an eye on your application demands and the available resources 24/7 to identify savings opportunities.
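Because the check has to be repeated on fresh metrics, it helps to see how small each individual pass is and how often it needs to run. Below is a minimal Python sketch of one pass that flags idle and underused resources from CPU utilization samples. The data shape and both thresholds are assumptions for illustration; a real check would also weigh memory, network, and peak usage.

```python
from statistics import mean
from typing import Dict, List


def find_savings_candidates(
    cpu_samples: Dict[str, List[float]],
    idle_threshold: float = 5.0,
    rightsize_threshold: float = 40.0,
) -> Dict[str, str]:
    """Flag resources whose average CPU utilization (in percent) looks wasteful.

    The thresholds are illustrative assumptions, not recommendations.
    """
    findings: Dict[str, str] = {}
    for resource_id, samples in cpu_samples.items():
        if not samples:
            continue  # no metrics yet, so nothing to conclude
        avg = mean(samples)
        if avg < idle_threshold:
            findings[resource_id] = "idle: consider removing"
        elif avg < rightsize_threshold:
            findings[resource_id] = "underused: consider a smaller size"
    return findings
```

Run once, this produces a report. Run on a schedule and wired to actions (removing the idle resource, moving the workload to a smaller instance), it becomes the kind of automation this section argues for.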

Let's look into some of the named cost optimization points to see why automation brings so much value there.

1. Instance rightsizing and type selection - or, picking the best instance for the job

Selecting the right virtual machine size can drive your bill down by a lot if compute is your biggest expense.

But how can you expect a human engineer to do that when AWS alone offers some 400 different EC2 instances in many sizes?

Similar instance types deliver different performance levels depending on which provider you pick. Even in the same cloud, a more expensive instance doesn't always come with higher performance.

Here's what you usually need to do when picking an instance manually:

1. Establish your minimal requirements

Make sure you do it for all compute dimensions, including CPU (architecture, count, and processor choice), memory, SSD, and network connection.

2. Choose the right instance type

You can choose from a variety of CPU, memory, storage, and networking configurations, bundled into instance types that are each optimized for a certain capability.

3. Define your instance's size

Remember that the instance should have adequate capacity to handle your workload's requirements and, if necessary, incorporate features such as bursting.

4. Examine various pricing models

On-demand (pay-as-you-go), reserved capacity, spot instances, and dedicated hosts are all available from the three major cloud providers. Each of these options has its own pros and cons. This guide covers them all in detail: How to choose the best VM type for the job and save on your cloud bill
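The four steps above can be condensed into a filter-then-minimize search. Here's a minimal Python sketch under heavy assumptions: the InstanceOffer catalog, its names and prices are made up, and only vCPUs and memory are checked, while the real decision also involves CPU architecture, storage, network, and bursting.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class InstanceOffer:
    """A hypothetical catalog entry; names and prices are made up for illustration."""
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float   # price under the given pricing model
    pricing_model: str  # e.g. "on-demand", "reserved", "spot"


def pick_cheapest(
    offers: List[InstanceOffer],
    min_vcpus: int,
    min_memory_gib: float,
    allowed_models: Set[str],
) -> Optional[InstanceOffer]:
    """Drop offers below the minimal requirements (step 1) or outside the
    acceptable pricing models (step 4), then take the cheapest remaining type and size."""
    fitting = [
        o for o in offers
        if o.vcpus >= min_vcpus
        and o.memory_gib >= min_memory_gib
        and o.pricing_model in allowed_models
    ]
    return min(fitting, key=lambda o: o.hourly_usd, default=None)


catalog = [
    InstanceOffer("small", 2, 4, 0.05, "on-demand"),
    InstanceOffer("medium", 4, 16, 0.20, "on-demand"),
    InstanceOffer("medium", 4, 16, 0.06, "spot"),
    InstanceOffer("large", 8, 32, 0.40, "on-demand"),
]
```

With this toy catalog, requiring 4 vCPUs and 8 GiB on on-demand pricing returns the $0.20/h medium offer, while also allowing spot returns the $0.06/h spot offer. Prices and availability change constantly, which is why the search has to be rerun rather than done once.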

Considering that you need to do that on a regular basis, that’s a lot of work!

2. Autoscaling instances as soon as demand changes

If you're running an e-commerce application, you need to prepare for sudden traffic spikes (think of getting mentioned by a Kardashian on Instagram), yet scale things down when the need is gone.

Manually scaling your cloud capacity is difficult and time-consuming. You must keep track of everything that happens in the system, which may leave you with little time to explore cloud cost reductions.

When demand is low, you run the risk of overpaying. And when demand is high, you’ll offer poor service to your customers.

Here’s what you need to take care of when scaling resources manually:

Gracefully handle traffic increases and keep costs at bay when the need for resources drops,
Ensure that changes applied to one workload don’t cause any problems in other workloads or teams,
Configure and manage resource groups on your own, making sure that they all contain resources suitable for your workloads.

When scaling manually, you'd have to scale your resources up or down for each and every virtual machine across every cloud service you use. This is next to impossible. And you have better things to do anyway.

That’s where autoscaling comes into play.

Autoscaling does all the tasks listed above automatically. All you need to do is define your policies related to horizontal and vertical autoscaling, and the autonomous optimization tool will do the job for you.
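On the horizontal side, a policy usually boils down to a target utilization plus lower and upper bounds. The Python sketch below mirrors the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × current metric / target metric)) along with its default 10% tolerance. The function and parameter names are illustrative, and this is not how any particular optimization product implements scaling.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Horizontal scaling rule modeled on the Kubernetes HPA formula.

    desired = ceil(current_replicas * current_metric / target_metric),
    clamped to [min_replicas, max_replicas]. Ratios within `tolerance`
    of 1.0 leave the replica count unchanged to avoid flapping.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas running at 90% CPU against a 60% target scale out to 6, while 10 replicas at 15% scale in to 3. The max_replicas bound is what keeps a traffic spike from turning into an open-ended bill.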

3. Managing spot instance interruptions

Spot instances are a cloud provider's idle capacity sold at up to 90% off on-demand prices, so buying them makes sense.

There's a catch, though: the provider may reclaim these resources at any time. If you're an AI-driven SaaS company, that's fine for background data crunching that can be delayed.

But what if you need the workload to avoid the interruption? You need to make sure your application is ready for that and have a plan in place when your spot instance is interrupted.
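On AWS, such a plan usually starts with detecting the two-minute interruption notice. The Python sketch below reads the EC2 instance metadata path spot/instance-action through an IMDSv2 session token. That path returns 404 while no interruption is scheduled. The sketch also computes how much time is left. What you do in that window (checkpointing, draining the node, rescheduling pods) is up to you and isn't shown, and Azure and GCP signal interruptions through their own, different mechanisms.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw spot/instance-action document, or None if nothing is scheduled.

    Only works from inside an EC2 instance; uses an IMDSv2 session token.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return urllib.request.urlopen(action_req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is scheduled
            return None
        raise


def seconds_until_interruption(document: str, now: datetime) -> float:
    """Parse a notice like {"action": "terminate", "time": "2021-11-19T12:02:00Z"}."""
    notice = json.loads(document)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return (deadline - now).total_seconds()
```

A watcher process would call fetch_instance_action every few seconds and trigger its shutdown plan as soon as it returns a document.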

Here's how you can take advantage of spot instances:

1. Check to see if your workload is ready for a spot instance.

Will you be able to tolerate interruptions? How long will it take to finish the project? Is this a life-or-death situation? These and other questions are useful in determining if a workload is suitable for spot instances.

2. Examine your cloud provider offer.

Examining less popular instances is a good idea because they're less likely to be interrupted and can run for longer periods of time. Before deciding on an instance, look at how often it is interrupted.

3. Make a bid

Set the maximum price you're willing to spend for your preferred spot instance. The rule of thumb is to set the maximum price at the level of on-demand pricing.

4. Manage spot instances in groups

You'll be able to request a variety of instance types at the same time, boosting your chances of securing a spot instance. Learn more about managing spot instances here: Spot instances: How to reduce AWS, Azure, and GCP costs by 90%
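Steps 2 to 4 can be sketched as a small planning function. It assumes step 1 is already settled, meaning the workload tolerates interruptions. It prefers rarely interrupted instance types, then cheaper ones, caps each maximum price at the on-demand price, and returns several types for one diversified request. The interruption_rate field is a stand-in for whatever interruption-frequency data your provider publishes (AWS, for example, has the Spot Instance Advisor); all instance names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpotCandidate:
    """Hypothetical market data for one instance type; the numbers are made up."""
    instance_type: str
    spot_usd: float           # current spot price per hour
    on_demand_usd: float      # on-demand price per hour
    interruption_rate: float  # assumed metric, e.g. 0.05 = interrupted ~5% of the time


def plan_spot_group(candidates: List[SpotCandidate], group_size: int) -> List[Tuple[str, float]]:
    """Return (instance_type, max_price) pairs for a diversified spot request.

    Ranking: lowest interruption rate first (step 2), then lowest spot price.
    Max price: the on-demand price (the rule of thumb from step 3).
    Several types per request (step 4) improve the odds of getting capacity.
    """
    # A type whose spot price already exceeds the cap can't be obtained at that max price.
    eligible = [c for c in candidates if c.spot_usd <= c.on_demand_usd]
    ranked = sorted(eligible, key=lambda c: (c.interruption_rate, c.spot_usd))
    return [(c.instance_type, c.on_demand_usd) for c in ranked[:group_size]]


candidates = [
    SpotCandidate("type-a", 0.030, 0.10, 0.15),
    SpotCandidate("type-b", 0.040, 0.12, 0.05),
    SpotCandidate("type-c", 0.035, 0.11, 0.05),
    SpotCandidate("type-d", 0.200, 0.15, 0.01),
]
```

Here plan_spot_group(candidates, 2) picks type-c and type-b. type-a is cheapest but interrupted most often, and type-d is excluded because its spot price is above its on-demand cap.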

To make all of the above work, you’ll have to dedicate a lot of time and effort to configuration, setup, and maintenance tasks.

And none of that management guarantees savings, unless you decide to automate optimization

Traditional methods like cost tracking and reporting can only get you halfway there - and at a hefty cost in engineer time.

Cloud cost management doesn't guarantee savings; automated optimization does.

Discover the benefits of autonomous cloud cost optimization for your company. Book a demo with CAST AI, the world's leading cloud optimization platform for Kubernetes.

P.S. You can always test out the platform and see what it would automate in your environment. To get started, simply register here.
