Monitoring and Observability in Azure Kubernetes Service With Prometheus

Regardless of where your Kubernetes cluster runs, whether in the cloud or on-prem, monitoring and observability are crucial to the success of any production environment. It’s the difference between troubleshooting an issue efficiently and waking up at 2:00 AM not knowing what’s going on.

In this blog post, you’ll learn how to implement Prometheus, a popular open-source monitoring and alerting tool, on Azure Kubernetes Service (AKS).

Prerequisites

To follow along with this blog post, you’ll need:

An Azure account. If you don’t have one, you can sign up here.
An AKS cluster. If you don’t have one created, you can use Terraform code found here to create it.

Monitoring and Observability

Before diving into the hands-on content, let’s talk about what monitoring and observability are. Although the same tools and platforms often cover both, they are two different things.

Observability is typically built on three kinds of data:

Metrics
Logs
Traces

You combine all three to make decisions about your system(s): what needs to be fixed, updated, or reworked, and how to resolve whatever is going wrong.

Monitoring is the act of you and your team having the ability to watch what’s happening in a system. For example, if CPU or memory spikes, you’ll know about it because you can see it on some graph.

The biggest difference between the two is that monitoring lets teams watch what’s happening in a system, while observability makes the system easier to debug. For example, monitoring will tell you that CPU usage is high, and observability will help you find out why.

The Architecture

You have two options for where to install Prometheus and Grafana, and you can choose separately for each tool:

On a separate server, such as an Ubuntu machine
As a Kubernetes app on the AKS cluster

Both options are viable, and neither is better than the other from a production perspective. If you want Prometheus and Grafana to monitor anything beyond the Kubernetes environment you’re deploying to, it may make sense to run them standalone (on an Ubuntu server, for example). Ultimately, it depends on what you need to monitor.

For the purposes of this blog post, you’ll install Prometheus and Grafana as a Kubernetes app via a Helm chart.

Several Kubernetes resources will be created.

First, there are the Pods that Prometheus and Grafana need to run. These include the Pods that serve the Grafana dashboard, the Pods that serve the Prometheus dashboard, and the Pods that collect data from the Kubernetes cluster.

Next, there are the Services. Most of them are internal and not public-facing, because they exist so that Prometheus and Grafana can communicate with each other and with the internal Kubernetes resources. One Service is public-facing: the one that exposes the Grafana login portal.

After the Services, there’s one DaemonSet, which ensures that a metrics-collecting Pod (the node exporter) runs on every Kubernetes worker node.

There are three Deployments: one for Grafana, one for the Prometheus Operator, and one for kube-state-metrics, a lightweight tool that reports the state of the Kubernetes resources/objects in the cluster.

ReplicaSets ensure that the desired number of Pods is running at any given time. In this case, that number is 1, but for production, you’d want more than one replica.

Finally, there are the StatefulSets, which give their Pods stable, unique identities that are kept even if Pods are rescheduled or other k8s objects/resources go down. They are used for:

Alertmanager, which handles alerts sent by client applications such as the Prometheus server.
Prometheus Query Interface, which is the Prometheus environment that you’d use to query metrics and other data if Grafana wasn’t installed.
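
Once the chart from the next section is installed, the resource kinds described above can be listed in one go. The following is a minimal sketch rather than anything the chart provides: list_monitoring_resources is a hypothetical helper name, and the monitoring default assumes the namespace created later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or the Helm chart).
# Lists the resource kinds discussed above in a single namespace.
# Defaults to "monitoring", the namespace used in this post.
list_monitoring_resources() {
  kubectl get pods,services,daemonsets,deployments,replicasets,statefulsets \
    --namespace "${1:-monitoring}"
}
```

Calling list_monitoring_resources with no arguments should show the Pods, Services, DaemonSet, Deployments, ReplicaSets, and StatefulSets in the monitoring namespace, assuming kubectl is pointed at the AKS cluster.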

Prometheus Setup

For this setup, you’ll use Prometheus to collect and query metrics and Grafana to display them as graphs. To keep things as straightforward as possible, you’ll use the Prometheus Operator (prometheus-operator), a popular Kubernetes operator that comes packaged in the kube-prometheus-stack Helm chart, so installing it is as simple as installing that chart.

First, create a new namespace where Prometheus and Grafana will live:

kubectl create namespace monitoring
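
kubectl create namespace returns an error if the namespace already exists, which can be inconvenient when this step is re-run from a script. One common workaround, shown here as a sketch, is to render the namespace with a client-side dry run and pipe the result to kubectl apply. ensure_namespace is a hypothetical helper name, not something from the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the namespace if it is missing,
# and leave it alone if it already exists.
ensure_namespace() {
  # --dry-run=client renders the Namespace manifest without sending it;
  # kubectl apply then creates or updates it from stdin.
  kubectl create namespace "$1" --dry-run=client -o yaml \
    | kubectl apply -f -
}
```

With kubectl pointed at the AKS cluster, ensure_namespace monitoring can be run repeatedly without failing on the second run.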

Next, add the Helm repo for prometheus-community and update it:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
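
Before installing, it can be worth confirming that the chart is visible in the repo you just added. The sketch below wraps helm search repo in a hypothetical helper named chart_available, which succeeds only if the chart name appears in the search output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit code 0) if the given chart name
# shows up in the locally cached Helm repo indexes.
chart_available() {
  # helm search repo prints a table of matching charts;
  # grep -qF checks for the literal chart name in that output.
  helm search repo "$1" | grep -qF "$1"
}
```

For example, chart_available prometheus-community/kube-prometheus-stack && echo "chart found" prints the message only when the chart used in the next step can be found.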

After the Helm repo is added, install the actual Helm chart on the AKS cluster.

Notice the flag --set grafana.service.type="LoadBalancer". By default, the chart does not expose Grafana outside the cluster. This flag puts a public-facing load balancer in front of Grafana so that you can log in and check out the graphs. Another way to reach Grafana is to set up port forwarding from Kubernetes to your localhost, but that’s not the best option for production.

helm install monitoring prometheus-community/kube-prometheus-stack --set grafana.service.type="LoadBalancer" --namespace monitoring
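
For a quick look without a public load balancer, the port-forwarding approach mentioned above can be sketched as follows. forward_grafana is a hypothetical helper. The Service name monitoring-grafana follows from the release name monitoring, and the Service port 80 is an assumption based on the Grafana chart’s defaults, so confirm it with kubectl get service before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: forward a local port to the Grafana Service.
# Assumes the Service is named monitoring-grafana and listens on port 80
# in the monitoring namespace (check with: kubectl get service -n monitoring).
forward_grafana() {
  kubectl port-forward service/monitoring-grafana "${1:-3000}:80" \
    --namespace monitoring
}
```

Running forward_grafana keeps the forward open in the foreground; while it runs, Grafana should be reachable at http://localhost:3000. Pass a different local port as the first argument if 3000 is taken.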

Now that Prometheus and Grafana are installed, run the following command and copy/paste the public IP address from the monitoring-grafana Service into a web browser