At mkdev, we love Kubernetes and use it daily. We architected and built Kubernetes-based platforms for customers in finance, health, compliance and many other spheres, both on-premise and in the cloud, serving millions of users around the world.
One of the first things we notice with almost every customer is confusion about what Kubernetes is and what it is capable of doing. The confusion comes from the misconception that Kubernetes is a container scheduler and orchestrator. We believe that Kubernetes is something very different and much more fabulous than that.
In this series of articles, we are going to give a short tour of how infrastructure evolved and where Kubernetes fits in this evolution. We will also describe and provide advice on capacity and resource management in a Kubernetes cluster, both demystifying this part of Kubernetes and covering everything you need to do to make sure your Kubernetes clusters are cost-effective and ready for scale and multi-tenancy.
At the lower levels of the infrastructure, there is always a physical server. In the past, that’s all you had to work with. Your applications were constrained to the resources of a bare-metal machine.
This hardware was already paid for, so you would normally want to make use of all of those resources. If you have a server with 16 CPU Cores, and your application only uses 8 cores, you are inclined to put more workloads on this server. Unused compute resources are wasted compute resources.
Naturally, servers are just one part of the infrastructure. There are also network devices, storage and dozens of other components. While we are not going to focus on those other parts in this text, you will notice that the general ideas discussed further are applicable there too.
There are many challenges involved in managing physical servers. One of the most important, if not the most important, is how to scale up the physical infrastructure as demand grows. Once you reach the limits of your existing machines, you reach the physical limits of your infrastructure. Increasing those limits means changing the physical configuration of the environment, either by scaling existing hardware vertically (by plugging in more RAM, for example) or horizontally, by adding more servers.
Cost calculation becomes an important part of this process: you want to have some excess capacity for future growth, but you don’t want to buy too much hardware and let it idle. You cannot easily scale down. Once you buy a server, you own a server. Selling this server when it’s no longer needed takes time, assuming you can find a buyer at all.
Not every company ended up maintaining its own physical servers. Hosting providers would take care of it. They would manage a huge number of bare-metal machines and the infrastructure around them, and then sell this capacity to other companies. After all, most companies are not in the data center business; they simply require infrastructure to run their software on.
Even if you purchase infrastructure from a hosting provider, your servers do not become invisible. You still have the same physical constraints of the servers you purchased, and you have the same capacity management challenges - though a bit simplified, as many of the daily tasks are handled by the server infrastructure provider.
Everything changed with the advent of virtualisation technology. Virtualisation, among many other things, made it possible to split a single physical server into many virtual servers, each with its own CPU and RAM configuration. With virtualisation, infrastructure providers could now sell an abstract, software-defined combination of CPU, RAM, storage and networking, called a virtual machine.
Virtual machines, as is obvious from the name, do not exist in our physical world. They are defined by software, and their complete lifecycle and configuration are detached from physical reality. Physical servers did not disappear, but they receded into the background.
In the beginning of virtualisation’s conquest of infrastructure, many hosting providers still exposed the underlying physical infrastructure to their customers. You could be fully immersed in virtualised workloads, but at the same time you still had to take care of purchasing and plugging in ESXi hosts. In this case, virtual machines provided a significant number of new abstractions and approaches, but from the capacity management perspective little progress happened.
The real breakthrough happened once some infrastructure providers completely hid the physical layer from the customer. Today, we call those infrastructure providers the “public cloud”. Once you remove real servers and hardware from the picture, you finally get to the first leap in capacity management and resource allocation.
If your complete infrastructure is just software, then you are no longer constrained by hardware. From this moment, your infrastructure is elastic. You can scale it up and down as you please, any second you want. You can resize your virtual servers to get more CPU and RAM, more storage, more network interfaces - all of this in an instant.
You are no longer buying a complete server. Nor do you pay a monthly rent for a particular machine. Instead, you pay just for the time you use your software-defined infrastructure. Today, this time is most often measured in seconds, though even per-minute billing was an impressive achievement on its own.
There are conflicting definitions of what “software-defined infrastructure” is. In some sources, it is conflated with “infrastructure as code”. In this text, it means infrastructure that exists as a software entity (i.e. a virtual machine) on top of a hardware infrastructure.
Just like developer-friendly programming languages abstract away the complexity of lower-level languages and processes, physical servers were abstracted away by software impressive in both scale and complexity. As a customer of a hosting provider - or, a better word at this point, cloud provider - you no longer operate in hardware terms.
Your job has never been to manage physical servers. Nor should you ever have to think about them, unless you have very specific requirements in terms of performance (and here we are talking about “fast” as in “high performance computing” fast), security (and here it means military-grade security) or similar.
The software that cloud providers wrote on top of their data centres redefined how infrastructure is treated. The premise of the cloud hardly needs to be sold at this point. You are either already in the cloud, planning to move some workloads to the cloud, or you wish you were doing it but cannot due to your specific requirements.
Nevertheless, it is important to reiterate why exactly cloud infrastructure is as important as it is today: for most of us, it replaced the physical world of infrastructure with a world of software-defined abstractions and ideas.
On the physical level, at least to a person not particularly interested in hardware, things are rather boring. There are CPUs, RAM, storage and network devices, chips, silicon, metal, fans, cables, wires. But on the virtual level, we are now dealing with an infinite number of existing and potential ideas about how we can run software.
The first ideas were rather simple. A virtual server is a software representation of a physical server. Just like a physical server, it has a set amount of RAM, CPU and storage. The biggest difference is that, unlike a physical server, a virtual one is “cattle”. You do not - or should not - care about an individual virtual server. Conceptually though, a virtual server does not stray far from a physical one.
Some of the ideas that followed didn’t go much further. For example, instead of just a “virtual server”, cloud providers came up with a “virtual database”. “Virtual databases” - like AWS RDS - are really a layer of automation on top of one or more virtual servers. Your RDS instance is still defined in terms of a concrete CPU and RAM combination, a certain number of IO operations and disk space. If you hit the limits of this instance, you need to scale it up - not unlike you would scale up any other server, or a set of servers. The software nature of this instance makes scaling much easier and faster, but conceptually such a database is on the same level as any other virtual server.
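To make “easier and faster” concrete: scaling up an RDS instance is a single API call rather than a hardware change. A minimal sketch with the AWS CLI, assuming configured credentials and an existing instance named `my-database` (a hypothetical identifier):

```shell
# Resize the (hypothetical) "my-database" instance to a larger
# instance class; AWS swaps the underlying server for you.
aws rds modify-db-instance \
  --db-instance-identifier my-database \
  --db-instance-class db.m5.xlarge \
  --apply-immediately
```

Note that you still pick a concrete CPU and RAM combination (`db.m5.xlarge`), which is exactly the point: such a database remains server-sized.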
Some other ideas were far more exciting, even groundbreaking. We got, as one example, Queue as a Service - like AWS SQS. Such a queue is not defined in terms of server capacity. Instead, it is defined by the number of messages that go through it and by the publishers and subscribers of the queue.
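As a sketch of how little “server” there is in such a queue, here is the AWS CLI creating and using an SQS queue; note that no CPU, RAM or instance size appears anywhere. The queue name, region and account number are hypothetical, and the commands assume configured AWS credentials:

```shell
# Create a queue: no instance type, no capacity, just a name.
aws sqs create-queue --queue-name my-jobs

# Publish a message; you pay per request, not per server-hour.
aws sqs send-message \
  --queue-url https://sqs.eu-central-1.amazonaws.com/123456789012/my-jobs \
  --message-body '{"job_id": 42}'
```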
This is the next conceptual jump in infrastructure: not only is the physical infrastructure invisible, but “infrastructure” as we know it from the physical world does not exist anymore. If everything is software, then we do not have to define infrastructure in the same terms as bare-metal servers. We can have such infrastructure components as Queues, Buckets with Objects, Audio Transcriptions, Tables, Data Streams and many others. Some argue, too many of them.
Most of the new infrastructure abstractions became powerful building blocks for cloud providers’ customers. Modern applications communicate with infrastructure over APIs. Your job processing application no longer connects to a job queue server. Instead, it talks directly to a queue. The queue itself, as far as you are concerned, is serverless.
This is when serverless technology truly first appeared. We will return to the word “serverless” at the end of this series.
There is just one last piece of your software stack that is still defined in the old-fashioned terminology of “compute resources”, the last piece that still requires “capacity management” in the most basic of its meanings - figuring out how much CPU and RAM you need, across how many virtual servers and at which time. This piece is your applications.
Your software is inevitably defined by the amount of compute resources that it needs. This leads to the following problem: how do you make sure that the (virtual) server where your application is running fits the compute needs of your application as closely as possible? Just like with a physical server, you do not want too many resources left unused, nor do you want to underprovision resources and hurt performance.
You end up right-sizing each of your workloads with the correct combination of resources per application instance, and configuring the infrastructure to ensure horizontal and automatic scaling of those application instances. The standard implementation of an “application instance” these days is a container, predominantly a Linux container.
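The arithmetic behind right-sizing can be sketched in a few lines of shell; the numbers are invented for illustration (a 4-core virtual server, an application instance that peaks at 300 millicores, and an 80% target utilisation to keep some headroom):

```shell
APP_CPU=300        # observed peak usage per application instance, in millicores
NODE_CPU=4000      # CPU capacity of one virtual server (4 cores)
TARGET_UTIL=80     # target utilisation in percent, leaving 20% headroom

# The share of one node's capacity we actually want to use
USABLE=$(( NODE_CPU * TARGET_UTIL / 100 ))

# How many application instances fit on a single node
PER_NODE=$(( USABLE / APP_CPU ))

echo "${PER_NODE} instances per node"   # prints "10 instances per node"
```

From there, horizontal scaling is the same calculation in reverse: given the total demand, how many such nodes do you need right now?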
A container itself is yet another powerful software abstraction: it contains everything required for your application to run, including the application itself. Ideally, it does not contain anything beyond that.
Containers are, conceptually, the perfect, smallest possible deployment unit for software. They do not always achieve this goal, quite often containing a bit more than your application needs. Still, containers are a step forward from virtual machines. Where a virtual machine defines a server, a container defines an application.
On their own, in isolation, containers are free from any compute resource specifications and, thus, free from capacity management challenges. Start a new container with Docker, Podman or any other container manager, and it will happily consume as many resources as it needs, up to the physical limit of your laptop - that is, unless you set a resource boundary for the container.
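As a minimal illustration (assuming Docker is installed and using the public `nginx` image as an arbitrary workload), the difference between the two situations is a pair of flags:

```shell
# Unbounded: the container may consume as much CPU and RAM as the host allows.
docker run --detach --name unbounded nginx

# Bounded: the same workload, capped at half a CPU core and 256 MiB of RAM.
docker run --detach --name bounded --cpus="0.5" --memory="256m" nginx
```

Podman accepts the same flags. Under the hood, both limits translate into Linux cgroup constraints on the container’s processes.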
Containers thus have two sides. On the conceptual level, containers are as free from the physical world as possible - they are not servers, they are “instances of your application”. On the implementation level, they are Linux processes, constrained by the resources of the machine where they run, be it a virtual or a physical server.
Implementing containers poses many questions. How do you place containers on servers? How do you increase the number of containers your application needs? How do you group multiple containers to represent a single “service”? How do containers - or services - talk to each other? These and many other questions are answered by container orchestration technology. The most popular, de facto standard container orchestrator right now is Kubernetes.
In the next article of this series, I will explore what Kubernetes gives us in terms of new ideas about how to treat our infrastructure and applications. We will also learn how it inevitably fails to bring progress to one of the most important aspects of infrastructure evolution: abstracting away the physical world with the help of software.
This article was written by Kirill Shirinkin for mkdev.me.