Kubernetes' Native Metrics and States

#kubernetes #monitoring #devops

In the previously published article, the author reviewed an auxiliary tool for monitoring cluster resources. This one describes the more low-level, "native" apparatus. Put simply: how does kubectl top work? Let's see.

Before you can query the Kubernetes Metrics API or run kubectl top commands to retrieve metrics from the command line, you’ll need to ensure that Metrics Server is deployed to your cluster.

A brief intro

Metrics Server is a cluster add-on that collects resource usage data from each node and provides aggregated metrics through the Metrics API. Metrics Server makes resource metrics such as CPU and memory available for users to query, as well as for the Kubernetes Horizontal Pod Autoscaler to use for auto-scaling workloads.

Depending on how you run Kubernetes, Metrics Server may already be deployed to your cluster. For example, Google Kubernetes Engine (☁️ GKE) and Azure Kubernetes Service (☁️ AKS) clusters include a Metrics Server deployment by default, whereas Amazon Elastic Kubernetes Service (☁️ EKS) clusters do not.

Run the following command using the kubectl command line utility to see if metrics-server is running in your cluster:



kubectl get pods --all-namespaces | grep metrics-server

If you see running metrics-server pods in the kube-system namespace, Metrics Server is already installed in your cluster. If not, proceed with the installation; the Helm chart is a common option.
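Grepping the pod list only shows that a pod exists. A possible complementary check, sketched below, asks the API server whether the Metrics API itself is registered and available. The function name metrics_api_available is made up for this example; v1beta1.metrics.k8s.io is the APIService name that corresponds to the API path queried later in this article.

```shell
#!/bin/sh
# Sketch: report whether the Metrics API is registered and available.
# metrics_api_available is a hypothetical helper name, not part of kubectl.
metrics_api_available() {
  # The APIService object carries an "Available" condition; read its status.
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}

# Usage (needs access to a cluster):
#   if metrics_api_available; then kubectl top nodes; else echo "Metrics Server is not ready"; fi
```

If the check fails and you prefer plain manifests over Helm, the Metrics Server README suggests applying the released manifest, for example kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml.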

The Kubernetes Metrics Server is a cluster-wide aggregator of resource usage data. Its job is to collect metrics from the Summary API, exposed by the kubelet (the primary node agent) on each node. Resource usage metrics, such as container CPU and memory usage, are helpful when troubleshooting unexpected resource utilization. All these metrics are available in Kubernetes through the Metrics API.

The Metrics API reports the amount of resources currently used by a given node or a given pod. Since the API server itself doesn’t store the metric values, Metrics Server is used for this purpose. The deployment YAML files for installation are provided in the Metrics Server project source code.
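To look at the data at its source, the kubelet's Summary API can be reached through the API server's node proxy. The sketch below assumes that jq is installed and that your user is allowed to proxy to nodes; node_usage is a hypothetical helper name, and the three fields it prints are only a small part of the full summary.

```shell
#!/bin/sh
# Sketch: print CPU and memory usage of one node as reported by the kubelet Summary API.
# node_usage is a hypothetical helper; it needs jq and permission to proxy to nodes.
node_usage() {
  node="$1"
  # Output columns: node name, CPU usage in nanocores, memory working set in bytes.
  kubectl get --raw "/api/v1/nodes/${node}/proxy/stats/summary" |
    jq -r '"\(.node.nodeName)\t\(.node.cpu.usageNanoCores)\t\(.node.memory.workingSetBytes)"'
}

# Usage (needs access to a cluster):
#   node_usage ip-192-168-41-36.ec2.internal
```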

What's under the hood? Flow of data in Metrics Server

Metrics Server periodically fetches metrics from the kubelets running on the nodes. For now, those metrics contain the memory and CPU utilization of the pods and the nodes. Other entities can request data from Metrics Server through the API server, which exposes the Metrics API.

An example of such an entity is the Horizontal Pod Autoscaler, which, once Metrics Server is installed, uses its data to make scaling decisions. Below is an image of the basic flow of data; arrows show the direction of data flow:

What metrics can be retrieved?

Now we can explore one of the ways we can retrieve the metrics.

Nodes' consumption



$ kubectl top nodes
NAME                            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ip-192-168-41-36.ec2.internal   282m         7%     2807Mi          19%       
ip-192-168-18-10.ec2.internal   445m         11%    3892Mi          26%
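Since the output is a plain table, it is easy to post-process. The following is a minimal sketch that lists the nodes whose CPU% is above a given threshold; nodes_over_cpu is a hypothetical helper name, and the column positions are assumed to match the table above.

```shell
#!/bin/sh
# Sketch: list nodes whose CPU% in `kubectl top nodes` output exceeds a threshold.
# nodes_over_cpu is a hypothetical helper; it reads the table from stdin.
nodes_over_cpu() {
  threshold="$1"
  # Skip the header row; column 3 is CPU%, e.g. "11%". Adding 0 drops the "%" sign.
  awk -v t="$threshold" 'NR > 1 && ($3 + 0) > t { print $1, $3 }'
}

# Usage (needs access to a cluster):
#   kubectl top nodes | nodes_over_cpu 10
```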

Application Pods' consumption



$ kubectl top pod
NAME                                CPU(cores)   MEMORY(bytes)   
app1-64dcf4cd5c-ln2zm               3m           73Mi            
app2-7dcbbbd648-ngn2f               1m           426Mi           
app3-5b69f77db-xc7zs                3m           67Mi            
app4-7dc794458d-zfbcg               4m           71Mi            
app5-6c799956f-lw7qj                1m           46Mi            
app6-9686b9486-z5kj8                2m           81Mi            
app7-bccbd4bc5-8nwfm                2m           62Mi            
...
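To get a single number for a namespace, the memory column can be added up. The sketch below assumes every value is printed in Mi, as in the listing above; total_pod_memory_mi is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: add up the MEMORY(bytes) column of `kubectl top pod` output.
# total_pod_memory_mi is a hypothetical helper; it assumes every value is printed in Mi.
total_pod_memory_mi() {
  # Skip the header; only rows whose third column ends in "Mi" are counted.
  awk 'NR > 1 && $3 ~ /Mi$/ { sum += $3 + 0 } END { printf "%dMi\n", sum }'
}

# Usage (needs access to a cluster):
#   kubectl top pod | total_pod_memory_mi
```

If you only need to find the heaviest pods, kubectl top pod --sort-by=memory (or --sort-by=cpu) orders the table for you.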

K8s system pods' consumption

We can also get the CPU and memory utilization of the Kubernetes system pods. To view the resource utilization in the kube-system namespace, execute the same command accompanied by the --namespace flag:



kubectl top pod --namespace=kube-system

You should see your cloud provider's system apps in the list among the k8s system components. The author's playground is hosted on AWS EKS; in that case, the aws-node agent (for pod networking) and the application load balancer ingress controller may also be in the list.

You may use the --all-namespaces flag, too.

Metrics snapshot of pods in the cluster

You can query the metrics of all pods or a specific pod by sending a GET request to the /apis/metrics.k8s.io/v1beta1/pods endpoint and the /apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod name> endpoint, respectively. Pods are namespaced, so the path for a single pod has to include its namespace.



kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods | jq '.'
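The raw JSON is verbose, so a narrower jq filter can help. The sketch below flattens the response into one tab-separated line per container; container_usage is a hypothetical helper name, and the field names follow the PodMetrics objects returned by the endpoint above.

```shell
#!/bin/sh
# Sketch: flatten a PodMetricsList (read from stdin) into one line per container.
# container_usage is a hypothetical helper; it needs jq.
container_usage() {
  # Output columns: namespace, pod, container, CPU usage, memory usage.
  jq -r '.items[]
         | .metadata.namespace as $ns
         | .metadata.name as $pod
         | .containers[]
         | [$ns, $pod, .name, .usage.cpu, .usage.memory]
         | @tsv'
}

# Usage (needs access to a cluster):
#   kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods | container_usage
```

The raw API typically reports quantities in finer units than kubectl top does (for example CPU in nanocores and memory in Ki), so the values will not look identical to the tables above.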

Monitoring containers

To inspect the resources consumed by the containers in a pod, add the flag --containers to the top command.



kubectl top pods --all-namespaces --containers

To peek inside your containers and monitor the processes running in them, we can use the popular Linux command: top. The top command lets you monitor processes and their resource usage on Linux. It is available in most Linux distributions, although minimal container images may not include it.

Peeking inside the containers of a pod is straightforward. We will execute the top command in a running container in batch (non-interactive) mode.



kubectl exec <pod name> -- top -bn1
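For pods with more than one container, kubectl exec accepts the -c flag to pick the container. A small wrapper, sketched below, also prints a readable hint when the command fails, for instance because the image does not ship top. The name top_in_container is hypothetical.

```shell
#!/bin/sh
# Sketch: run top once in a chosen container, with a readable message when it fails.
# top_in_container is a hypothetical helper name.
top_in_container() {
  pod="$1"
  container="$2"
  # -c selects the container; -b (batch) and -n1 (one iteration) make top print once and exit.
  if ! kubectl exec "$pod" -c "$container" -- top -bn1; then
    echo "could not run top in ${pod}/${container} (is it installed in the image?)" >&2
    return 1
  fi
}

# Usage (needs access to a cluster; replace the placeholders with real names):
#   top_in_container <pod name> <container name>
```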

It is also possible to construct the following command that runs the top command for each pod of the application within the default namespace.



$ kubectl get pods -n default -o custom-columns=name:metadata.name --no-headers | xargs -I{} sh -c 'echo {}; kubectl exec {} -- top -bn1'
app1-64dcf4cd5c-ln2zm
Mem: 15862360K used, 230844K free, 4332K shrd, 6520K buff, 9989144K cached
CPU:   0% usr   0% sys   0% nic 100% idle   0% io   0% irq   0% sirq
Load average: 0.08 0.10 0.15 2/1832 46
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   29    18 root     S     351m   2%   2   0% node dist/start.js
   18     1 root     S     297m   2%   3   0% npm
    1     0 root     S     297m   2%   1   0% npm
   40     0 root     R     1600   0%   2   0% top -bn1

app2-7dcbbbd648-ngn2f
...

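What the output displays depends on the top implementation shipped in the image. The BusyBox variant shown above displays:

Memory usage: used, free, shared, buffered, and cached.
CPU usage broken down by user, system, nice, idle, I/O wait, and interrupt time.
Average load over one, five, and fifteen minutes.
Processes running in the container, with: Process ID, Parent process ID, User who started the process, State of the process, Virtual memory size, CPU the process last ran on, CPU usage, and the command.

The procps version of top found in full distributions additionally shows system time, uptime, user sessions, swap usage, priority, and nice value. Keep in mind that the Mem and Load average lines describe the whole node rather than the container alone.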

Consumption Numbers vs Statistics: States

There is also kube-state-metrics, developed by the k8s team as well: a service that listens to the Kubernetes API server and generates metrics about the state of objects such as deployments, nodes, and pods.

The kube-state-metrics service does not persist data and only has a metrics endpoint that serves the latest data points of the requested object. You can use tools such as Prometheus to scrape the service endpoint and persist the data in permanent storage.

⚠️ It is important to clarify that kube-state-metrics is not a replacement for the Metrics Server! The Metrics Server helps you monitor the CPU and memory usage of cluster nodes and pods.

The kube-state-metrics service, by contrast, lets you monitor the state of your cluster by serving information about the count, health, and availability of pods, nodes, and other Kubernetes objects. A full list of metrics can be found in the kube-state-metrics documentation.
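As an illustration of state metrics, the sketch below counts pods per phase from the text served by the kube-state-metrics endpoint. It relies on the kube_pod_status_phase metric; pods_by_phase is a hypothetical helper name, and the service name, namespace, and port in the usage comment are assumptions that depend on how kube-state-metrics was installed.

```shell
#!/bin/sh
# Sketch: count pods per phase from kube-state-metrics text output read on stdin.
# pods_by_phase is a hypothetical helper name.
pods_by_phase() {
  # kube_pod_status_phase has one series per pod and phase; value 1 marks the current phase.
  awk '/^kube_pod_status_phase\{/ && $NF == 1 {
         if (match($0, /phase="[^"]*"/)) {
           # Cut the value out of phase="..." (7 leading characters, 1 trailing quote).
           phase = substr($0, RSTART + 7, RLENGTH - 8)
           count[phase]++
         }
       }
       END { for (p in count) print p, count[p] }' | sort
}

# Usage (assumed service name, namespace, and port; adjust to your installation):
#   kubectl port-forward -n kube-system svc/kube-state-metrics 8080:8080 &
#   curl -s localhost:8080/metrics | pods_by_phase
```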

As you can see, even "native" monitoring tools that are developed in parallel with Kubernetes give you some wiggle room in monitoring.

Fast and robust clusters to you!