
Roman Belshevitz for Otomato


How to Stop Rampant Kubernetes Cluster Growth

A lyrical digression as an introduction

Edvard Munch's famous painting "The Scream" was first presented to the public at a Berlin exhibition in December 1893. It was conceived as part of "The Frieze of Life", a cycle of paintings about the spiritual life of a person. Munch wrote about it:

“The Frieze of Life” is conceived as a series of paintings connected with each other, which together should give a description of a whole life. A winding line of the coast passes through the picture, behind it is the sea, it is always in motion, and under the crowns of trees there is a diverse life with its sorrows and joys. Frieze is conceived as a poem about life, love and death.

This brief, of course, is not about spiritual life. It is about practical approaches that keep thoughts of the terrible and otherworldly at bay and spare engineers' nerves.

The essence of the Ops problem

Kubernetes was originally designed to consolidate workloads on a single cluster. However, many scenarios call for a multi-cluster approach instead. These include running workloads across regions, limiting the blast radius of failures, compliance requirements, hard multi-tenancy, security, and custom software solutions.

Unfortunately, the multi-cluster approach poses management challenges, because operational complexity grows with both the size and the number of clusters. The end result is a phenomenon called cluster sprawl, which occurs when the number of clusters and workloads grows without being managed coherently.

The solution lies in identifying and adopting sound management practices early, in order to avoid costly rework later.

What is Kubernetes governance?

Governance is a well-defined collection of rules, policies, and procedures that ensures accountability, transparency, and responsibility.

Governance is also about keeping clusters in sync and providing centralized policy management. Kubernetes governance is a set of rules, expressed as policies, that must be enforced across all clusters. This is a critical component for large enterprises running Kubernetes.

Typically, this means applying the same rules across multiple Kubernetes clusters, as well as to the applications running in those clusters. And while governing Kubernetes may seem insignificant, it pays off in the long run, especially in a large organization.

Assume that an enterprise keeps increasing the number of clusters in use and does not apply governance. These clusters will operate under different rules, which will soon create a huge amount of extra work for the teams.

Fortunately, successful Kubernetes governance rests on only a few key components.

Creating successful Kubernetes governance

The first component of a successful Kubernetes governance strategy is good multi-cluster management and monitoring. You must maintain control over how and where clusters are created and configured, as well as which software versions can be used.


🔧 Well-built observability

Application development and operations teams should be able to centrally view and manage clusters to better optimize resources and troubleshoot. Solutions in this area are developed, for example, by Red Hat, Platform9, Fairwinds and even Rancher Labs. Improved management practices and greater transparency can also save a company from the headaches of a range of security risks and performance issues down the road.

🔧 RBAC strategies

Next, enterprises must have an authentication and access control system in place. Centralized authentication and authorization helps an organization streamline the login process and keep track of user activity. It lets application development and operations teams verify, in real time, that only the right people perform sensitive tasks.

🔧 Policy management

Finally, to govern Kubernetes, enterprises must optimize policy management. Companies need to think about how Kubernetes will impact their development culture and work on finding the right balance of business agility and development. Ultimately, governance (with the appropriate level of flexibility) ensures that businesses can meet customer needs and deploy mission-critical services in a consistent manner.

In Kubernetes, admission controllers enforce policies on objects during create, update, and delete operations. They intercept requests to the API server after authentication and authorization, but before the object is persisted, which makes admission control fundamental to policy enforcement in Kubernetes.

Admission controllers allow you to enforce adherence to practices such as having good labels, annotations, resource limits, or other settings.

Open Policy Agent (OPA), a CNCF project, is a great tool for developing and enforcing such policies at scale throughout an organization. When OPA is deployed as an admission controller, every request is evaluated against the policies established for the Kubernetes cluster. A request that complies with the policies is carried out. A request that violates them is rejected.

By deploying OPA as an admission controller, you can, for example:

  • Require specific labels on all resources.
  • Require that container images come from the corporate image registry.
  • Require that all pods specify resource requests and limits.
  • Prevent conflicting Ingress objects from being created.
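The first rule in the list above could be sketched with OPA Gatekeeper, the OPA project's admission controller integration for Kubernetes. The article does not name Gatekeeper, so treat its use here as an assumption. The constraint name `ns-must-have-owner` and the `owner` label are hypothetical. The template follows the required-labels pattern from the Gatekeeper documentation.

```yaml
# ConstraintTemplate: defines a reusable policy type with its Rego logic.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the `parameters` field of constraints of this kind.
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # Report a violation when any required label is missing.
        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
# Constraint: applies the template to Namespaces (hypothetical example).
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```

In practice the ConstraintTemplate has to be accepted by the cluster before a constraint of its kind can be created, so the two documents are usually applied in two steps.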


Goals to achieve

But what should be the goals of governance? Where should it be enforced and tested? The four most effective governance objectives are security policy, network management, access control, and image management. Let's look at each of these goals one by one:

🎯 Security policy

In security policies for governing Kubernetes, it is important to restrict user access to pods in clusters. Cluster users should have well-defined access based on their role.

To do this, enterprises must implement a security policy with rules and conditions related to access and privileges. In this policy, they should specify, for example, that containers have read-only access to the root file system and that containers and their child processes cannot escalate privileges.
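At the workload level, these two restrictions map onto fields of the container `securityContext`. The following is a minimal sketch, not a complete hardening profile. The pod name, namespace, and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app            # hypothetical name
  namespace: test               # hypothetical namespace
spec:
  containers:
    - name: app
      image: registry.example.com/team/app:1.0.0   # hypothetical image
      securityContext:
        # The container cannot write to its root file system.
        readOnlyRootFilesystem: true
        # The container and its child processes cannot gain more
        # privileges than the parent process has.
        allowPrivilegeEscalation: false
      volumeMounts:
        # Applications that need scratch space can write to an
        # explicitly mounted volume instead.
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```

Setting these fields on individual pods is only half of the picture. An admission policy is what makes them mandatory across the cluster.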

🎯 Network management

Network policy plays a very important role in determining which services can communicate with each other. Here, companies must determine which pods and services can interact with each other and which should be isolated. This also applies to pod security in Kubernetes governance.

The right approach is aimed at controlling traffic within Kubernetes clusters. This approach can be based on pods, namespaces, or IP blocks, depending on governance requirements.
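The three selection mechanisms mentioned above correspond to the `podSelector`, `namespaceSelector`, and `ipBlock` fields of the standard NetworkPolicy resource. The sketch below is illustrative only. All names, labels, the CIDR, and the port are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-ingress   # hypothetical name
  namespace: test               # hypothetical namespace
spec:
  # The policy applies to pods labeled app=backend in this namespace.
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Pods in the same namespace labeled app=frontend.
        - podSelector:
            matchLabels:
              app: frontend
        # Any pod in the namespace named "monitoring".
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
        # Clients from a specific IP range.
        - ipBlock:
            cidr: 10.0.0.0/24
      ports:
        - protocol: TCP
          port: 8080
```

Once a pod is selected by an ingress policy, all inbound traffic not explicitly allowed is dropped. The policy only takes effect if the cluster's CNI plugin implements NetworkPolicy.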

Each popular CNI plugin uses a different type of configuration for the network setup. For example, Calico uses layer 3 networking paired with the BGP routing protocol to connect pods.

Cilium uses eBPF and can configure an overlay network, with traffic filtering on layers 3 to 7. Both Calico and Cilium support network policies to restrict traffic.
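To illustrate what filtering up to layer 7 can look like, here is a sketch of a CiliumNetworkPolicy that only admits HTTP GET requests to a single path. It assumes Cilium is the cluster's CNI plugin. The policy name, labels, port, and path are hypothetical.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-public        # hypothetical name
spec:
  # The policy applies to endpoints (pods) labeled app=service.
  endpointSelector:
    matchLabels:
      app: service
  ingress:
    - fromEndpoints:
        # Only pods labeled env=prod may connect (layer 3).
        - matchLabels:
            env: prod
      toPorts:
        - ports:
            # Only TCP port 80 is reachable (layer 4).
            - port: "80"
              protocol: TCP
          rules:
            http:
              # Only GET /public is allowed (layer 7).
              - method: "GET"
                path: "/public"
```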

🎯 Administration and access control

When configuring role-based access control (RBAC) policy, administrators need to restrict access to cluster resources. They do so by fine-tuning permissions with Kubernetes objects such as Role, ClusterRole, RoleBinding, and ClusterRoleBinding.

Because permissions granted by a ClusterRole apply across the entire cluster, you can use ClusterRoles to control access to different kinds of resources than you can with Roles. These include:

  • Cluster-scoped resources such as nodes
  • Non-resource REST Endpoints such as /healthz
  • Namespaced resources across all Namespaces (for example, all Pods across the entire cluster, regardless of Namespace).
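A single ClusterRole can cover all three cases from the list above. The sketch below grants read-only access. The role name is hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-and-pod-reader     # hypothetical name
rules:
  # Nodes are cluster-scoped. Pods are namespaced, but when this role
  # is bound with a ClusterRoleBinding the rule covers all namespaces.
  - apiGroups: [""]
    resources: ["nodes", "pods"]
    verbs: ["get", "list", "watch"]
  # Non-resource endpoints can only be granted through a ClusterRole.
  - nonResourceURLs: ["/healthz", "/healthz/*"]
    verbs: ["get"]
```

A ClusterRole is only a definition. It grants nothing until it is bound to a subject.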

After creating a Role or ClusterRole, you have to assign it to a user or group of users by creating a RoleBinding or ClusterRoleBinding:

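# Binds the built-in cluster-admin ClusterRole to a ServiceAccount.
# Note: cluster-admin grants full control over the whole cluster, so use it sparingly.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: testadminclusterbinding
subjects:
  - kind: ServiceAccount   # the identity that receives the permissions
    name: myaccount
    namespace: test
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io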

🎯 Image management

Using public Docker images can increase the speed and flexibility of application development, but there are many vulnerable Docker images, and using them in a production cluster can be very risky.

Image management is also part of Kubernetes governance. All images that will be used in the cluster must be pre-scanned for vulnerabilities. There are several approaches to finding vulnerabilities. How and where an organization checks for vulnerabilities depends on its preferred workflows. However, it is recommended that you test your images before deploying them to a cluster.
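A pre-deployment scan could be wired into a CI pipeline roughly as follows. This sketch assumes GitHub Actions and the `aquasecurity/trivy-action` action, neither of which the article prescribes. The workflow name, branch, and image reference are hypothetical.

```yaml
name: image-scan                # hypothetical workflow name
on:
  push:
    branches: [main]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        # Hypothetical image reference, tagged with the commit SHA.
        run: docker build -t registry.example.com/team/app:${{ github.sha }} .
      - name: Scan image with Trivy
        # Consider pinning to a release tag instead of master.
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: registry.example.com/team/app:${{ github.sha }}
          format: table
          # A non-zero exit code fails the job when findings match.
          exit-code: "1"
          severity: CRITICAL,HIGH
          # Skip vulnerabilities that have no fix available yet.
          ignore-unfixed: true
```

Failing the job on critical or high findings keeps a vulnerable image from reaching the push and deploy steps that would normally follow.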

Attacker activity has increased sharply in recent years, and new loopholes in systems continue to be discovered. It is therefore very important for companies to stay vigilant and ensure that they only use official, clean, and verified Docker images on a cluster.

Threat actors can mount sophisticated attacks that use previously trusted third-party artifacts as an attack vector, concealing malicious scripts or malware in a container image. Static, pattern-based, or signature-based scanners are not effective against this kind of attack, because the malicious behavior only appears at runtime.

Several security solutions reduce this risk by running images in a secure, hosted sandbox environment and evaluating the attack kill chain.


Such sandbox tools examine images in a running state both before and after the image is checked into a registry and, like static scanners such as Trivy by Aqua Security, are frequently incorporated into CI/CD pipelines. Malicious behavior or unmet policy requirements can mark an image for deletion from the registry or prevent check-in entirely.

Instead of a conclusion

In brief, these are the directions to follow in order to govern Kubernetes better, secure important enterprise systems and data, and limit cluster growth and the disorder that comes with it. Stay strong and focused!

The author is thankful to Arthur Chiao, Oleg Chunikhin (CNCF), Tomas Fernandez (Rendered Text / Semaphore), Mike Jordan (Coredge), Kristijan Mitevski and Steven Zimmerman (Aqua Security) for their contributions to the community.
