Michael Levan

Posted on Oct 12, 2022

Monitoring AKS With Prometheus and Grafana

#kubernetes #cloud #programming #git

When you deploy any type of Kubernetes environment, or rather before you even deploy it, you should be thinking about a monitoring and observability plan. That way, you can tell when something goes wrong in your environment and respond to it, ideally in an automated and repeatable way, and manually when you need to.

In this blog post, you’ll learn how to set up two of the most popular monitoring and observability tools, Prometheus and Grafana, on Azure Kubernetes Service (AKS).

Why Monitoring and Observability?

First, let’s talk about the distinctions between monitoring and observability. Monitoring is how you see what’s happening in an environment, either in real time or as far back as the tools/platform allow. You can also typically create alerts based on a monitor threshold, for example when CPU usage stays above 90% for five minutes.
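To make the CPU example concrete, here is a minimal sketch of how that threshold could be written as a Prometheus alerting rule once the stack from later in this post is installed. The rule name HighNodeCPU, the file name high-cpu-rule.yaml, and the release: prometheus label are assumptions for illustration. The label is what kube-prometheus-stack typically uses to discover rules when the Helm release is named prometheus, but check your own values.

```shell
# Write a hypothetical PrometheusRule manifest to disk (all names are illustrative).
cat > high-cpu-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-node-cpu
  namespace: monitoring
  labels:
    release: prometheus   # assumed to match the Helm release name
spec:
  groups:
  - name: node-cpu
    rules:
    - alert: HighNodeCPU
      # Non-idle CPU percentage per node, based on a 5-minute rate
      expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
      for: 5m               # must stay above the threshold for five minutes
      labels:
        severity: warning
EOF

# Apply it after Prometheus is running:
# kubectl apply -f high-cpu-rule.yaml
```

The block only writes the file. The apply step is left commented out because it needs the monitoring namespace and the Prometheus Operator CRDs to exist first.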

Observability is all about taking action with the data being ingested from your environment. Observability looks at logs, metrics, and traces. For example, you can ingest the /metrics endpoint if it’s enabled on your Kubernetes cluster. Then, if something occurs in the logs, metrics, or traces, an observability tool can take action. That action could be an alert, like what monitoring can do, or it could be an automated response. For example, if three worker nodes are almost at full capacity, an observability tool can see that and scale out to four nodes automatically, without any manual intervention.
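If you want to see what that raw metrics data looks like, the sketch below prints the first few lines of the API server’s /metrics output. It assumes you already have kubectl access to a cluster and permission to read that path. The function name sample_apiserver_metrics is made up for this example.

```shell
# Print the first N lines (default 20) of the API server's
# Prometheus-format metrics. Requires kubectl access to a cluster.
sample_apiserver_metrics() {
  kubectl get --raw /metrics | head -n "${1:-20}"
}
```

Running sample_apiserver_metrics 5 would show five lines of metric names, help text, and values, which is the same format Prometheus scrapes.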

In short, monitoring is about looking at data in real-time and observability is about taking action on the data while not having to look at it in real-time.

Metrics Server

One thing to keep in mind when using an observability tool to collect metrics from Kubernetes is that the Metrics Server needs to be running. That isn’t always the case by default. For example, on-prem Kubernetes clusters may require you to install or enable it yourself. In AKS, it’s on by default, so you don’t have to worry about it.
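If you’d rather verify than assume, a quick check along these lines can help once you have kubectl access. It assumes the Metrics Server runs as a Deployment named metrics-server in the kube-system namespace, which is the common default but may differ on your cluster. The function name is hypothetical.

```shell
# Succeeds only if the metrics-server Deployment exists in kube-system
# (assumed name and namespace) and the Metrics API answers `kubectl top`.
metrics_server_ready() {
  kubectl get deployment metrics-server -n kube-system >/dev/null 2>&1 &&
    kubectl top nodes >/dev/null 2>&1
}
```

For example, metrics_server_ready && echo "Metrics Server is up" prints the message only when both checks pass.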

Setting Up AKS

To deploy an AKS cluster, you can use a few methods including:

Via the portal (manual)
An Infrastructure-as-Code tool (automated)

If you don’t have an AKS cluster, you can use the code found here: https://github.com/AdminTurnedDevOps/Kubernetes-Quickstart-Environments/tree/main/azure/aks

The code in the above GitHub repo is a Terraform configuration to create an AKS cluster.

If you haven’t used Terraform or aren’t comfortable with it, you can use the Azure portal or another automated method to create an AKS cluster under the AKS service.

Once you have your AKS environment up and running, connect to it with the following command, which downloads the cluster’s kubeconfig so you can access the AKS cluster from your terminal.

az aks get-credentials -n name_of_k8s_cluster -g resource_group_name
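To confirm the connection actually works, you could wrap the command above together with a node listing, as in this rough sketch. The function name connect_aks is an assumption, and the two arguments are placeholders for your own cluster and resource group names.

```shell
# Fetch credentials for a cluster, then confirm the API server responds.
# Arguments: <cluster_name> <resource_group> (placeholders, supply your own).
connect_aks() {
  az aks get-credentials --name "$1" --resource-group "$2" &&
    kubectl get nodes
}
```

If the node list comes back with nodes in the Ready state, kubectl is pointed at the right cluster.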

Setting Up Grafana and Prometheus

Now that you have a cluster up and running, it’s time to install Grafana and Prometheus. There are several ways to do this, ranging from installing Grafana and Prometheus separately (even on separate servers) to using the Prometheus Operator for Kubernetes.

In this case, you’ll use a popular method, which is the kube-prometheus-stack Helm Chart from the prometheus-community repo.

If you don’t have Helm, you can install it here: https://helm.sh/docs/intro/install/

💡 I have a live training course on Helm if you want to check it out: https://www.oreilly.com/live-events/helm-charts-with-kubernetes/0636920074683/0636920074682/

First, add the prometheus-community Helm repo, which hosts the chart for Prometheus and Grafana.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Next, ensure that the repo is up to date.

helm repo update

The last step is to install the Helm Chart as a release named prometheus in a new namespace called monitoring.

helm install prometheus \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
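If you expect to re-run the install, for example from a script, one possible variation is helm upgrade --install, which installs the release when it’s missing and upgrades it otherwise. This is a sketch rather than the method used in this post. The function name and the ten-minute timeout are assumptions.

```shell
# Install the chart, or upgrade it if the release already exists,
# and block until resources report ready or the timeout expires.
install_monitoring_stack() {
  helm upgrade --install prometheus \
    prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --create-namespace \
    --wait --timeout 10m
}
```

The --wait flag trades a longer-running command for a clearer signal: when it returns successfully, the workloads should already be up.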

When the install finishes, Helm prints a summary that shows the release status as deployed.

You can check that the stack was deployed by running the following command.

kubectl get all -n monitoring

You should see output similar to the following. Pod name suffixes, cluster IPs, and ages will differ in your cluster.

NAME                                                         READY   STATUS    RESTARTS      AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   1 (39s ago)   71s
pod/prometheus-grafana-5fcf8745df-lnnn7                      3/3     Running   0             98s
pod/prometheus-kube-prometheus-operator-78fdf6678c-xwxm9     1/1     Running   0             98s
pod/prometheus-kube-state-metrics-65997fd766-tbkdm           1/1     Running   0             98s
pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0             70s
pod/prometheus-prometheus-node-exporter-tbmm6                1/1     Running   0             98s

NAME                                              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   71s
service/prometheus-grafana                        ClusterIP   10.0.44.237    <none>        80/TCP                       98s
service/prometheus-kube-prometheus-alertmanager   ClusterIP   10.0.242.19    <none>        9093/TCP                     98s
service/prometheus-kube-prometheus-operator       ClusterIP   10.0.84.154    <none>        443/TCP                      98s
service/prometheus-kube-prometheus-prometheus     ClusterIP   10.0.237.68    <none>        9090/TCP                     98s
service/prometheus-kube-state-metrics             ClusterIP   10.0.120.173   <none>        8080/TCP                     98s
service/prometheus-operated                       ClusterIP   None           <none>        9090/TCP                     70s
service/prometheus-prometheus-node-exporter       ClusterIP   10.0.182.255   <none>        9100/TCP                     98s

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   1         1         1       1            1           <none>          98s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    1/1     1            1           98s
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           98s
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           98s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-5fcf8745df                    1         1         1       98s
replicaset.apps/prometheus-kube-prometheus-operator-78fdf6678c   1         1         1       98s
replicaset.apps/prometheus-kube-state-metrics-65997fd766         1         1         1       98s

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     71s
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       1/1     70s
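Instead of reading the listing by eye, you could let kubectl block until every Pod reports Ready. This is a minimal sketch. The function name and the default timeout are assumptions, and it presumes only long-running Pods exist in the namespace, as in the output above, since a completed Job Pod would never become Ready.

```shell
# Wait until all Pods in the monitoring namespace are Ready.
# Optional argument: timeout (default 300s).
monitoring_pods_ready() {
  kubectl wait --for=condition=Ready pods --all \
    --namespace monitoring --timeout="${1:-300s}"
}
```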

Logging Into Grafana and Prometheus

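One of the things that the Helm Chart does is create Kubernetes Services for Grafana and Prometheus. As the output above shows, they are of type ClusterIP, so they aren’t reachable from outside the cluster. To access them, run the following port-forward commands in two separate terminals.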

First, expose Grafana.

kubectl port-forward svc/prometheus-grafana -n monitoring 4000:80

Grafana is now available at http://localhost:4000. The default username/password for Grafana is admin/prom-operator.
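If the default password doesn’t work, for instance because the chart values were overridden, you can read the current one from the Secret the Grafana chart creates. The sketch below assumes the Secret is named prometheus-grafana, which follows from the release name used in this post, and that it stores the password under the admin-password key. The function name is made up.

```shell
# Print the Grafana admin password stored in the chart-managed Secret.
# Secret name assumes the Helm release is called "prometheus".
grafana_admin_password() {
  kubectl get secret prometheus-grafana --namespace monitoring \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}
```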

Next, expose Prometheus, which will be available at http://localhost:4001.

kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 4001:9090
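With both port-forwards running, a third terminal can confirm that each UI is answering before you open a browser. This sketch relies on the health endpoints that Grafana (/api/health) and Prometheus (/-/healthy) expose, and on the local ports 4000 and 4001 chosen above. The function name is hypothetical.

```shell
# Succeeds only if both forwarded endpoints return a successful HTTP status.
check_monitoring_uis() {
  curl -fsS http://localhost:4000/api/health >/dev/null &&
    curl -fsS http://localhost:4001/-/healthy >/dev/null
}
```

Something like check_monitoring_uis && echo "both UIs reachable" gives a quick yes or no.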

Once you can reach both UIs, open the list of dashboards in Grafana by clicking on the icon that looks like four squares.

Next, click on the dashboard for Pods.

Change the namespace to kube-system and you’ll see all of the Pods running in the namespace.

Deploying an App

When you looked at Grafana above, you could see that metrics for the Pods running out of the box were being ingested, but what about Pods that you deploy in the future? Let’s check out how to confirm that this works.

Apply the following Kubernetes Manifest, which deploys a stateless Nginx web app with two replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginxdeployment
  replicas: 2
  template:
    metadata:
      labels:
        app: nginxdeployment
    spec:
      containers:
      - name: nginxdeployment
        image: nginx:latest
        ports:
        - containerPort: 80
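One way to apply the manifest is to save it to a file and hand it to kubectl, then wait for the rollout to finish. In this sketch the file name nginx-deployment.yaml and the function name are assumptions, and the two-minute timeout is arbitrary.

```shell
# Apply the manifest (default file name is an assumption) and wait
# for the Deployment's replicas to become available.
deploy_nginx() {
  kubectl apply -f "${1:-nginx-deployment.yaml}" &&
    kubectl rollout status deployment/nginx-deployment --timeout=120s
}
```

Because the manifest doesn’t set a namespace, the Deployment lands in whichever namespace your current context uses, which is normally default.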

Next, go back to the same Pods dashboard that you were at in the previous section, but now change the namespace to default.

After a few seconds, you’ll see the two Nginx Pods show up.
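You can also check from the Prometheus side, without Grafana, by querying its HTTP API through the port-forward on 4001. The sketch below asks for the kube_pod_info series, which kube-state-metrics exposes, filtered by namespace. The function name is made up, and the response is raw JSON.

```shell
# Query Prometheus for Pods it knows about in a namespace (default: "default").
# Assumes the Prometheus port-forward on localhost:4001 is still running.
pods_seen_by_prometheus() {
  curl -fsS -G "http://localhost:4001/api/v1/query" \
    --data-urlencode "query=kube_pod_info{namespace=\"${1:-default}\"}"
}
```

Piping the result through a filter such as grep -o 'nginx-deployment-[^"]*' should surface the two Nginx Pod names if they have been scraped.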

Congrats! You have successfully set up Prometheus and Grafana on AKS. You also confirmed that metrics from newly deployed apps are ingested by checking that the Nginx Pods showed up in the dashboard.