CAST AI

Posted on Nov 29, 2022 • Originally published at cast.ai

Kubernetes Labels: Expert Guide with 10 Best Practices


With Kubernetes labels, DevOps teams can troubleshoot faster, apply configuration changes in bulk, and respond quickly to incidents. Labels also give you insight into your costs, which improves monitoring, allocation, and management. Following best practices when using labels helps you get the full benefit of infrastructure visibility and efficient operations.

Here's everything you need to know about Kubernetes labels: what they are, how they work, when to use them, and the 10 best practices to follow to build a solid labeling strategy.

What are Kubernetes labels?

Kubernetes labels are key-value string pairs that link identifying metadata to Kubernetes objects. Kubernetes provides teams with integrated support for using labels to retrieve and filter the data from the Kubernetes API and carry out bulk operations on the selected objects.

Many teams use Kubernetes labels to provide DevOps with information about the ownership of a node, a pod, or other Kubernetes objects for easier tracking and operational decision-making.

When creating a new label, you must comply with the restrictions Kubernetes places on the length and allowed values. A label value must:

contain 63 characters or less (a label's value can also be empty),
start and end with an alphanumeric character (unless it’s empty),
only include dashes (-), underscores (_), dots (.), and alphanumerics.
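
To make these rules concrete, here is a minimal Python sketch that checks a candidate label value against them. The function name is_valid_label_value is made up for this example and is not part of any Kubernetes client library; the regular expression is simply one way to write down the three rules listed above.

```python
import re

# Empty, or 1-63 characters that start and end with an alphanumeric and
# contain only dashes, underscores, dots, and alphanumerics in between.
LABEL_VALUE_RE = re.compile(r"([A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?)?")


def is_valid_label_value(value: str) -> bool:
    """Return True if value satisfies the label value rules listed above."""
    return LABEL_VALUE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # A few candidates: valid, empty, bad first character, too long, bad character.
    for candidate in ["dev", "", "-dev", "a" * 64, "team a"]:
        print(repr(candidate), is_valid_label_value(candidate))
```

A check like this can run before you apply a manifest, so an invalid value is caught on your machine rather than rejected by the API server.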

You can find the labels a Kubernetes object has by using kubectl. For example, to get all labels for a pod named pod1, you can run:

> kubectl get pod pod1 -o json | jq .metadata.labels

To create labels, specify them in the metadata.labels field of your configuration file. Let's consider the pod.yaml file that describes a single pod:

apiVersion: v1
kind: Pod
metadata:
 name: nginx
 labels:
   environment: dev
   critical: "true"
spec:
 containers:
   - image: nginx
     name: nginx
     resources:
       requests:
         cpu: 500m

Note that the value of the critical label is “true” (in quotes) and not true. That is because label keys and values must be strings, and YAML would parse an unquoted true as a boolean.

Let’s apply the configuration file:

> kubectl apply -f pod.yaml
pod/nginx created

You can now apply or overwrite a label directly on an already existing Kubernetes object using kubectl. First, get all the labels that the pod has:

> kubectl get pod nginx -o json | jq .metadata.labels

{
  "critical": "true",
  "environment": "dev"
}

Now, to change the environment label’s value and add a new key-value label pair deprecated=true, we execute the following command:

> kubectl label pod nginx environment=prod --overwrite
pod/nginx labeled

> kubectl label pod nginx deprecated=true
pod/nginx labeled

Keep in mind that updating a label’s value is not allowed unless you explicitly overwrite it with the --overwrite flag. The resulting labels are as follows:

> kubectl get pod nginx -o json | jq .metadata.labels

{
  "deprecated": "true",
  "critical": "true",
  "environment": "prod"
}
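
The overwrite rule is easy to model in a few lines of Python. This is only a simplified sketch of the behavior described above, not kubectl's actual implementation; the apply_label function and its overwrite parameter are hypothetical names, and the sketch assumes that setting a key to the value it already has is harmless.

```python
def apply_label(labels: dict, key: str, value: str, overwrite: bool = False) -> dict:
    """Return a copy of labels with key=value set.

    Changing the value of an existing key is refused unless overwrite is True,
    mirroring the rule described above.
    """
    if key in labels and labels[key] != value and not overwrite:
        raise ValueError(f"{key!r} already has a value ({labels[key]}), and overwrite is False")
    updated = dict(labels)
    updated[key] = value
    return updated


if __name__ == "__main__":
    labels = {"critical": "true", "environment": "dev"}
    labels = apply_label(labels, "environment", "prod", overwrite=True)
    labels = apply_label(labels, "deprecated", "true")
    print(labels)
```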

Kubernetes labels vs. annotations

Kubernetes offers two ways of attaching metadata to objects: labels and annotations.

Annotations are key-value pairs that connect non-identifying metadata with objects. For instance, an annotation could contain logging or monitoring information for a given resource.

The main difference between labels and annotations is that annotations are not used to filter, group, or operate over the Kubernetes resource. Instead, you can use them to access additional information about it.

For example, the annotations for the node where the previously deployed pod has been scheduled are as follows:

> kubectl get node demo-node -o json | jq .metadata.annotations

{
  "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///var/run/cri-dockerd.sock",
  "node.alpha.kubernetes.io/ttl": "0",
  "volumes.kubernetes.io/controller-managed-attach-detach": "true"
}

Those annotations do not provide any information about the node’s characteristics. Instead, they offer some data on how the node works.

When to use Kubernetes labels?

Group resources for object queries

If you add the same label key-value pair to multiple resources, other people can easily query for all of them. For example, a DevOps engineer discovers that a development environment is unavailable. At this point, they can quickly check the status of all pods including the label environment:dev.

Here’s an example command:

> kubectl get pods -l 'environment=dev'

NAME    READY   STATUS              RESTARTS    AGE
nginx   0/1     CrashLoopBackOff    1           5m

This lets the team instantly see the affected pods and resolve the issue much faster than going through all the resources and picking just the ones in the dev environment.

In a complex case with many different deployments, finding the right dev pods would take the DevOps engineer ages if the engineering team didn’t add the environment:dev label to the resources. The DevOps engineer would have to use a generic kubectl get pods command and then comb through the output using a tool like grep.

Perform bulk operations

Another use case of Kubernetes labeling is to carry out bulk operations based on the resource labels.

Suppose that an engineer removes all non-production environments every night to reduce cloud costs. By using Kubernetes labels, they can easily automate this task.

For instance, here’s a command that deletes all deployments, services, and stateful sets labeled environment:local, environment:dev or environment:staging:

> kubectl delete deployment,services,statefulsets -l 'environment in (local,dev,staging)'

Schedule pods based on node labels

The hidden gem of Kubernetes labels is that they are heavily used in Kubernetes itself for scheduling pods to appropriate nodes. By using labels, you can have more control over the resources you create by making Kubernetes schedule specific deployments onto specific nodes.

Let’s see how this works in practice:

> kubectl get nodes

NAME                STATUS  ROLES   AGE VERSION
gke-node-1fe68171   Ready   <none>    1d  v1.22.12-gke.2300
gke-node-3cdf3d2b   Ready   <none>    3d  v1.22.12-gke.2300
gke-node-5f7b4cf1   Ready   <none>    5d  v1.22.12-gke.500

> kubectl get nodes -l 'critical=true'
No resources found

Currently, no nodes with the label critical:true exist.

Let’s try to create a pod that has to be scheduled on a node with the label critical:true using a node selector. Here is a pod.yaml configuration file for that:

apiVersion: v1
kind: Pod
metadata:
 name: nginx
 labels:
   environment: prod
spec:
 nodeSelector:
   critical: "true"
 containers:
   - image: nginx
     name: nginx
     resources:
       requests:
         cpu: 500m

Now let's apply it and check what happens:

> kubectl apply -f pod.yaml
pod/nginx created

> kubectl get pod nginx

NAME    READY       STATUS  RESTARTS    AGE
nginx   0/1         Pending 0           1m

> kubectl get events --field-selector involvedObject.name=nginx

LAST SEEN   TYPE    REASON              OBJECT      MESSAGE
46s         Warning FailedScheduling    pod/nginx   0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Note that the pod cannot get scheduled on any of the nodes since none of them has the required label. Now, let’s label one of the nodes with the required label:

> kubectl label node gke-node-5f7b4cf1 critical=true
node/gke-node-5f7b4cf1 labeled

> kubectl get nodes -l 'critical=true'

NAME                STATUS  ROLES   AGE VERSION
gke-node-5f7b4cf1   Ready   <none>    5d  v1.22.12-gke.500

And now, let’s check the pod:

> kubectl get pod nginx

NAME    READY   STATUS  RESTARTS    AGE
nginx   1/1     Running 0           3m31s

The pod has been successfully scheduled to the node.

Keep in mind that if multiple labels are specified in the node selector, they all must be satisfied by a node in order for the pod to get scheduled on it.
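
The "all labels must match" rule can be sketched in Python as follows. This is an illustration of the matching logic only, not the scheduler's code; the node names and the disktype label are hypothetical and were not part of the cluster shown above.

```python
def node_matches_selector(node_labels: dict, node_selector: dict) -> bool:
    """True only if every key-value pair in the selector is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def schedulable_nodes(nodes: dict, node_selector: dict) -> list:
    """Return the sorted names of nodes whose labels satisfy the whole selector."""
    return sorted(
        name for name, labels in nodes.items()
        if node_matches_selector(labels, node_selector)
    )


if __name__ == "__main__":
    # Hypothetical nodes and labels, for illustration only.
    nodes = {
        "node-a": {"critical": "true"},
        "node-b": {"critical": "true", "disktype": "ssd"},
        "node-c": {"disktype": "ssd"},
    }
    print(schedulable_nodes(nodes, {"critical": "true"}))
    print(schedulable_nodes(nodes, {"critical": "true", "disktype": "ssd"}))
```

Adding a second pair to the selector narrows the candidate list, because a node that satisfies only one of the two pairs no longer qualifies.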

10 best practices for Kubernetes labels

1. Make use of the labels recommended by Kubernetes

Kubernetes provides a list of recommended labels for grouping objects. For example, Kubernetes recommends using app.kubernetes.io/name and app.kubernetes.io/instance to represent the application’s name and instance, respectively. For your own custom labels, replace the “app.kubernetes.io” prefix with your company’s subdomain.

2. Pay attention to correct syntax

A label key uses the following syntax: <prefix>/<name>. Let’s dive into the details:

<prefix>

The prefix is optional; if you choose to use it, it needs to be a valid DNS subdomain (such as "cast.ai") and have no more than 253 characters in total. A key without a prefix is presumed to be private to the user, so tools and automated components that add labels to users' objects should use a prefix. Prefixes also prevent collisions between labels that would otherwise conflict (think of the ones in third-party packages).

Note that the kubernetes.io/ and k8s.io/ prefixes are reserved for Kubernetes core components.

<name>

This part refers to the arbitrary property name of the label. Teams can use the name “environment” with label values such as “production” or “testing” for clarity.

A name must meet the same requirements as the label value but it can’t be empty. Hence, the name needs to have 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics in between.
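
Putting the prefix and name rules together, a key check could look like the Python sketch below. The helper is_valid_label_key is a made-up name, and the sketch assumes that a DNS subdomain means lowercase alphanumeric segments (dashes allowed inside a segment) separated by dots.

```python
import re

# Name part: 1-63 characters, alphanumeric at both ends, with dashes,
# underscores, dots, and alphanumerics in between.
NAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?")

# Prefix part (assumption: lowercase DNS subdomain such as "cast.ai").
DNS_SEGMENT = r"[a-z0-9]([a-z0-9-]*[a-z0-9])?"
PREFIX_RE = re.compile(rf"{DNS_SEGMENT}(\.{DNS_SEGMENT})*")


def is_valid_label_key(key: str) -> bool:
    """Check a label key of the form <name> or <prefix>/<name>."""
    prefix, slash, name = key.rpartition("/")
    if slash:
        # A slash is present, so the prefix must be a valid DNS subdomain
        # of at most 253 characters.
        if len(prefix) > 253 or PREFIX_RE.fullmatch(prefix) is None:
            return False
    return NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for key in ["environment", "cast.ai/team", "app.kubernetes.io/name", "/team", "cast.ai/"]:
        print(key, is_valid_label_key(key))
```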

3. Standardize label naming conventions

Multiple teams using Kubernetes need to follow the same labeling conventions. Otherwise, all the labeling effort will bring you no value.

It’s a good practice to have your development pipeline run static analysis on resource configuration files to ensure that all the required labels are there. If labels are missing or applied inconsistently, automated processes may break, and any monitoring solutions you use may send you false-positive alerts.
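
As a rough idea of what such a check might look like, here is a small Python sketch that reports manifests missing required labels. It assumes the manifests have already been parsed into dictionaries (for example, by a YAML library), and the REQUIRED_LABELS set is a hypothetical convention; substitute whatever your teams agree on.

```python
# Hypothetical team convention: every resource must carry these label keys.
REQUIRED_LABELS = {"environment", "owner"}


def missing_labels(manifest: dict, required=REQUIRED_LABELS) -> set:
    """Return the required label keys that the manifest does not define."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return set(required) - set(labels)


def check_manifests(manifests: list) -> list:
    """Return one human-readable problem line per manifest with missing labels."""
    problems = []
    for manifest in manifests:
        missing = missing_labels(manifest)
        if missing:
            kind = manifest.get("kind", "<unknown kind>")
            name = (manifest.get("metadata") or {}).get("name", "<unnamed>")
            problems.append(f"{kind}/{name}: missing labels {sorted(missing)}")
    return problems


if __name__ == "__main__":
    pod = {"kind": "Pod", "metadata": {"name": "nginx", "labels": {"environment": "dev"}}}
    for line in check_manifests([pod]):
        print(line)
```

A pipeline step can fail the build whenever check_manifests returns a non-empty list.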

4. Avoid unnecessary changes to labels

Labels in Kubernetes are used to identify and select resources for scheduling, deployment, and administration purposes. As a result, modifying a resource's label can have far-reaching and unforeseen implications.

For instance, if you switch a group of pods' "app" label from "frontend" to "backend," any Service that selects app=frontend stops routing traffic to those pods, and the controller that manages them no longer recognizes them as its own and starts replacement pods. The result can be an outage, or orphaned pods that keep running unnoticed.

To avoid these kinds of issues, modify labels only when it is essential, and carefully evaluate the ramifications of any change before making it.

5. Use label-selection options

Teams can select labeled objects based on equality and sets.

Selections based on equality allow you to retrieve objects with labels equal or not equal to the specified value (or values). Diving down into syntax, = and == both represent equality, while != represents inequality. It’s possible to add multiple labels separated by commas (all conditions need to match here). For example, if you execute the following command:

> kubectl get pods -l 'environment=dev,release=daily'

it will return all the pods that have labels environment:dev AND release:daily.

On the other hand, selections based on sets allow finding resources with multiple values at once. Sets are similar to the IN keyword in SQL. For example, the following command:

> kubectl get pods -l 'environment in (prod,dev)'

will find all the pods that contain the label environment=prod OR environment=dev.
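
To show how the two selector styles combine, here is a simplified Python sketch of selector matching. It is an illustration, not the Kubernetes implementation: it handles only =, ==, !=, in, and notin, skips existence checks such as a bare key, and all function names are made up. It follows the Kubernetes rule that != and notin also match objects that lack the key entirely.

```python
import re

# key in (a,b)  /  key notin (a,b)
SET_RE = re.compile(r"(\S+)\s+(in|notin)\s+\(([^)]*)\)")
# key=value, key==value, key!=value
EQ_RE = re.compile(r"([^=!\s]+)\s*(==|!=|=)\s*(\S+)")


def split_requirements(selector: str) -> list:
    """Split a selector on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in selector:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def requirement_matches(labels: dict, requirement: str) -> bool:
    """Evaluate a single requirement against an object's labels."""
    m = SET_RE.fullmatch(requirement)
    if m:
        key, op, raw = m.groups()
        values = {v.strip() for v in raw.split(",")}
        if op == "in":
            return key in labels and labels[key] in values
        return key not in labels or labels[key] not in values
    m = EQ_RE.fullmatch(requirement)
    if m:
        key, op, value = m.groups()
        if op == "!=":
            return labels.get(key) != value
        return labels.get(key) == value
    raise ValueError(f"unsupported requirement: {requirement!r}")


def matches(labels: dict, selector: str) -> bool:
    """All comma-separated requirements must hold (logical AND)."""
    return all(requirement_matches(labels, r) for r in split_requirements(selector))


if __name__ == "__main__":
    pod_labels = {"environment": "dev", "release": "daily"}
    print(matches(pod_labels, "environment=dev,release=daily"))
    print(matches(pod_labels, "environment in (prod,dev)"))
```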

6. Don’t store application-level semantics in labels

Kubernetes labels are stored with an object’s metadata, but they’re not supposed to serve as a data store for applications. Given that Kubernetes resources are often short-lived and aren’t tightly coupled to applications, data kept in labels quickly falls out of sync and becomes useless.

7. Don’t store sensitive information in labels

If you store passwords, API credentials, or other sensitive data in labels, anyone who can read objects in your Kubernetes cluster can see it in plain text. This is a significant security risk and can lead to identity theft or data breaches.

Store sensitive information in Secrets rather than labels. Access to Secrets can be restricted with RBAC, and they can be encrypted at rest. Keep in mind that, by default, Secret values are only base64-encoded and stored unencrypted in the cluster's data store, so enable encryption at rest and limit who can read them. That way, someone who gains limited access to your cluster can't simply read the private data from object metadata.

8. Add labels to pod templates

Add essential labels to pod templates that are part of workload resources. That way, Kubernetes controllers can consistently create pods with the states you’ve specified.

The goal should not be to create as many labels as possible but to create labels that bring value to your team. Start small and create a list of labels to be part of the template. For example, you can start by identifying the resource owners, the environment the resource is running in, and the release.
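
One way to picture this is a small Python helper that merges a starter set of labels into both a workload's own metadata and its pod template. It is a sketch that assumes the manifest is already parsed into a dictionary; the with_labels name and the sample label values are hypothetical. The sketch deliberately leaves spec.selector untouched, since changing a selector is a separate decision.

```python
import copy


def with_labels(workload: dict, labels: dict) -> dict:
    """Return a copy of workload with labels merged into the object's metadata
    and into the pod template's metadata. The selector is not modified."""
    result = copy.deepcopy(workload)
    result.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
    template = result.setdefault("spec", {}).setdefault("template", {})
    template.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
    return result


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": "web"},
        "spec": {
            "selector": {"matchLabels": {"app": "web"}},
            "template": {"metadata": {"labels": {"app": "web"}}},
        },
    }
    # Hypothetical starter set: owner, environment, release.
    starter = {"owner": "team-payments", "environment": "prod", "release": "daily"}
    labeled = with_labels(deployment, starter)
    print(labeled["spec"]["template"]["metadata"]["labels"])
```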

9. Automate your labeling practice

Automation can save you plenty of time, and labeling is no exception to that. If you have a continuous integration/continuous delivery (CI/CD) pipeline set up, you can easily automate some labels for cross-cutting concerns.

It’s smart to automatically attach labels with CD tooling since it guarantees consistency and makes engineers more productive. It’s also a good practice to have CI jobs enforce proper labeling by making a build fail and sending a notification to the responsible team if a label is missing.

10. Use labels for cost monitoring

Labels are very helpful for gaining a better understanding of your Kubernetes cloud costs. Cost monitoring, allocation, and management all rely on a proper labeling strategy.

If multiple tenants share resources in a single cluster, you need to use relevant labels to create a cost allocation report. This is how you can determine which team, service, or application generated specific costs, which helps greatly when investigating an unexpected cost spike.
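
The core of such a report is a group-by on a label key. The Python sketch below illustrates the idea with made-up workloads and cost figures; the cost_by_label function and the input shape are assumptions for this example and do not reflect how any particular monitoring tool works.

```python
from collections import defaultdict


def cost_by_label(workloads: list, label_key: str) -> dict:
    """Sum workload costs per value of label_key.

    Workloads without the label are grouped under "<unlabeled>", which makes
    gaps in the labeling strategy visible in the report.
    """
    totals = defaultdict(float)
    for workload in workloads:
        group = workload.get("labels", {}).get(label_key, "<unlabeled>")
        totals[group] += workload["cost"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical workloads and daily costs, for illustration only.
    workloads = [
        {"name": "checkout", "labels": {"team": "payments"}, "cost": 12.5},
        {"name": "ledger", "labels": {"team": "payments"}, "cost": 7.5},
        {"name": "search", "labels": {"team": "discovery"}, "cost": 30.0},
        {"name": "legacy-cron", "labels": {}, "cost": 4.25},
    ]
    for team, cost in sorted(cost_by_label(workloads, "team").items()):
        print(team, cost)
```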

Use this free monitoring tool to track your costs by labels

CAST AI provides a cost monitoring tool that allows you to stay updated on the costs of any of your workloads. The costs can be filtered by any label that exists on any of your workloads, making it easy to track cloud costs per team, service, or any other label that you use. The option to group workloads by label is coming soon.

See the difference good labeling and cost monitoring can make by connecting your cluster to CAST AI’s free cost monitoring solution.
