DEV Community 👩‍💻👨‍💻

Roy Ra for AWS Community Builders

Posted on

Monitoring EKS cluster with Prometheus and Grafana

In this post, I will go through the minimum steps to configure Prometheus and Grafana in the existing EKS cluster, which is most likely used for monitoring and alerts.

Configuring Node Groups

Creating new node group

Assuming that there are no pre-existing node groups, let's create one.

First, let's attach IAM role to Kubernetes service account using eksctl.

eksctl create iamserviceaccount \
  --name aws-node \
  --namespace kube-system \
  --cluster $EKS_CLUSTER_NAME \
  --role-name "AmazonEKSVPCCNIRole" \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
  --override-existing-serviceaccounts \
Enter fullscreen mode Exit fullscreen mode

Now let's head over to AWS Management Console -> EKS -> Your cluster -> Compute -> Add node group.

We HAVE to use EC2 for Prometheus and Grafana, since they will both need volumes mounted to them.

When creating node group, we have to attach an IAM role to EC2 worker nodes. For easy demonstration, I created a new IAM role and attached policies as below.

New IAM role

When filling all the information required to create a node group, just make sure to apply values as below.

  • EC2 instance type: m5.large
  • Subnets: Your private subnets within the VPC that EKS cluster exists.

You can leave all the other values as it is.

To confirm that your EC2 worker nodes are properly running, run the command below. The result should say that 2 pods are running.

$ kubectl delete pods -n kube-system -l k8s-app=aws-node
aws-node-6szct   1/1     Running   0          4s
aws-node-nz5cd   1/1     Running   0          8s
Enter fullscreen mode Exit fullscreen mode


Prometheus and Grafana needs persistent storage attached to them, which is also called PV(Persistent Volume) in terms of Kubernetes.

For stateful workloads to use Amazon EBS volumes as PV, we have to add aws-ebs-csi-driver into the cluster.

Associating IAM role to Service account

Before we add aws-ebs-csi-driver, we first need to create an IAM role, and associate it with Kubernetes service account.

Let's use an example policy file, which you can download using the command below.

curl -sSL -o ebs-csi-policy.json
Enter fullscreen mode Exit fullscreen mode

Now let's create a new IAM policy with that file.

aws iam create-policy \
--region $AWS_REGION \
--policy-name $EBS_CSI_POLICY_NAME \
--policy-document file://ebs-csi-policy.json
export EBS_CSI_POLICY_ARN=$(aws --region ap-northeast-2 iam list-policies --query 'Policies[?PolicyName==`'$EBS_CSI_POLICY_NAME'`].Arn' --output text)
# arn:aws:iam::12345678:policy/AmazonEBSCSIPolicy
Enter fullscreen mode Exit fullscreen mode

After that, let's attach the new policy to Kubernetes service account.

eksctl create iamserviceaccount \
  --cluster $EKS_CLUSTER_NAME \
  --name ebs-csi-controller-irsa \
  --namespace kube-system \
  --attach-policy-arn $EBS_CSI_POLICY_ARN \
  --override-existing-serviceaccounts --approve
Enter fullscreen mode Exit fullscreen mode

And now, we're ready to install aws-ebs-csi-driver!

Installing aws-ebs-csi-driver

Assuming that helm is installed, let's add new helm repository as below.

helm repo add aws-ebs-csi-driver
helm repo update
Enter fullscreen mode Exit fullscreen mode

After adding new helm repository, let's install aws-ebs-csi-driver with below command using helm.

helm upgrade --install aws-ebs-csi-driver \
  --version=1.2.4 \
  --namespace kube-system \
  --set serviceAccount.controller.create=false \
  --set serviceAccount.snapshot.create=false \
  --set enableVolumeScheduling=true \
  --set enableVolumeResizing=true \
  --set enableVolumeSnapshot=true \
  --set \
  --set \
Enter fullscreen mode Exit fullscreen mode

Configuring Prometheus

First, let's create a new Kubernetes namespace where all Prometheus-related resources will reside in.

kubectl create ns prometheus
Enter fullscreen mode Exit fullscreen mode

We will install Prometheus using helm, so let's add the Prometheus repository to helm and install it running the command below.

# Add helm repositories for Prometheus
helm repo add kube-state-metrics
helm repo add prometheus-community

# Install Prometheus
helm upgrade -i prometheus prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2"
Enter fullscreen mode Exit fullscreen mode

Few minutes after installing Prometheus, we can see all the resources created as below.

Prometheus resources

To test if they are working, let's use kubectl.

kubectl port-forward -n prometheus deploy/prometheus-server 8081:9090 &
Enter fullscreen mode Exit fullscreen mode

Now let's launch web browser, and go to localhost:8081.

The below web page should be seen if Prometheus has been successfully installed.

Prometheus UI

Configuring Grafana

As the same as Prometheus, we will create a dedicated Kubernetes namespace for Grafana, and install it using helm.

First let's create a new namespace.

kubectl create ns grafana
Enter fullscreen mode Exit fullscreen mode

We also need a manifest file, which will be used to configure Grafana. Below is an example of the file, called grafana.yaml.

# grafana.yaml
    apiVersion: 1
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.prometheus.svc.cluster.local
        access: proxy
        isDefault: true
Enter fullscreen mode Exit fullscreen mode

Now let's install Grafana using helm.

# Add helm repository for Grafana
helm repo add grafana

# Install Grafana
helm install grafana grafana/grafana \
    --namespace grafana \
    --set persistence.storageClass="gp2" \
    --set persistence.enabled=true \
    --set adminPassword='YOUR_PASSWORD' \
    --values grafana.yaml \
    --set service.type=NodePort
Enter fullscreen mode Exit fullscreen mode

Before exposing Grafana to the world, let's see how the Kubernetes service running Grafana is defined.

Grafana service

We can see that the target port is 3000, which is the port used by pods running Grafana.

So we have to attach a new Security group to EC2 worker nodes, allowing inbound requests for port 3000.

We can do this by creating a new Security group, and attaching it to EC2 worker nodes in the EC2 console of AWS Management Console.

After applying new security group to EC2 worker nodes, let's define a new Kubernetes ingress, which will provision an ALB.

To make Kubernetes ingress to create an ALB, we have to install aws-load-balancer-controller first.

Let's see how ingress looks like.

kind: Ingress
  name: grafana-ingress
  namespace: grafana
  annotations: grafana-alb internet-facing ip ${PUBLIC_SUBNET_IDs} '[{"HTTPS": 443}]' ${ACM_CERT_ARN} ${ALB_SECURITY_GROUP_ID} "3000" /api/health
  ingressClassName: alb
    - host: ${YOUR_ROUTE53_DOMAIN}
          - path: /
            pathType: Prefix
                name: grafana
                  number: 80
Enter fullscreen mode Exit fullscreen mode

After applying new ingress and having new ALB ready, we can head over to ${YOUR_ROUTE53_DOMAIN} and see that Grafana is ready as below.

Grafana UI

That's it! In this post, we created a new node group which will be used by Prometheus and Grafana, and installed and configured both Prometheus and Grafana.

Hope this post helps you!

After importing Grafana dashboard using 3119, I got a new dashboard as below.

Grafana UI2

Top comments (0)

Take a look at this:


Go to your customization settings to nudge your home feed to show content more relevant to your developer experience level. 🛠