Nowsath for AWS Community Builders

Setup Prometheus and Grafana with existing EKS Fargate cluster - Monitoring

In this article, I will walk through the essential steps for setting up Prometheus and Grafana on an existing EKS Fargate cluster, along with the configuration of custom metrics. Both tools are commonly used for monitoring and alerting.

Steps to follow:

  1. Configure Node Groups
  2. Install AWS EBS CSI driver
  3. Install Prometheus
  4. Install Grafana

1. Configure Node Groups

Given the absence of any pre-existing node groups, let's proceed to create a new one.

i. Create an IAM Role for EC2 worker nodes

Go to the AWS IAM console and create a role named 'AmazonEKSWorkerNodeRole' with the following three AWS managed policies:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy
  • AmazonEKSWorkerNodePolicy

AWS IAM console

ii. Create Node groups

Navigate to the AWS EKS console and add a node group to the cluster. EC2 instances are required for Prometheus and Grafana because both applications need EBS volumes mounted on them, which Fargate pods do not support.

When configuring the node group for the cluster, take the following into consideration:

Node IAM role: Select the role created in the previous step (AmazonEKSWorkerNodeRole)
Instance type: Select based on your requirements (t3.small in my case)
Subnets: The private subnets within the VPC where the EKS cluster is located. If you want to enable remote access to the nodes, use public subnets instead.

In my configuration, I opted for a t3.small instance with a desired size of 1. The setup worked without any issues.
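If you prefer the CLI over the console, a roughly equivalent node group can be created with eksctl. This is only a sketch: the cluster and node group names below are hypothetical, and note that eksctl creates its own node IAM role unless you pass one explicitly.

```shell
# 'my-cluster' and 'monitoring-ng' are placeholder names
eksctl create nodegroup \
  --cluster my-cluster \
  --name monitoring-ng \
  --node-type t3.small \
  --nodes 1 \
  --node-private-networking
```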

To verify the proper functioning of the EC2 worker nodes, execute the following command. The output should indicate that a pod is currently running.

$ k get po -l k8s-app=aws-node -n kube-system
aws-node-hbvz2   1/1     Running   0          58m

2. Install AWS EBS CSI driver

The Amazon EBS CSI driver manages the lifecycle of Amazon EBS volumes as storage for the Kubernetes Volumes that you create.

The Amazon EBS CSI driver makes Amazon EBS volumes for these types of Kubernetes volumes: generic ephemeral volumes and persistent volumes.

Prometheus and Grafana require persistent storage, commonly referred to as PV (Persistent Volume) in Kubernetes terminology, to be attached to them.

i. Create AWS EBS CSI driver IAM role and associate to service account

Create a service account named 'ebs-csi-controller-sa' and attach the AWS managed 'AmazonEBSCSIDriverPolicy' to its IAM role. This service account will be used during the installation of the AWS EBS CSI driver.

Replace 'my-cluster' with the name of your cluster.

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --override-existing-serviceaccounts --approve

If you don't specify a role name, eksctl assigns one automatically.
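To confirm the service account was created and annotated with the new role's ARN, you can query the annotation directly (the annotation key is the standard one eksctl sets):

```shell
kubectl -n kube-system get sa ebs-csi-controller-sa \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```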

ii. Add Helm repositories

We will use Helm to install the components required to run Prometheus and Grafana.

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

iii. Installing aws-ebs-csi-driver

Following the addition of the new Helm repository, proceed to install the AWS EBS CSI driver using the below helm command.

Replace the region 'eu-north-1' with your cluster region.

helm upgrade --install aws-ebs-csi-driver \
  aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa
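Once the release is installed, you can verify that the driver's controller and node pods are running; the label below is the one the chart applies to its workloads:

```shell
kubectl get pods -n kube-system \
  -l app.kubernetes.io/name=aws-ebs-csi-driver
```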

3. Install Prometheus

For persistent storage of scraped metrics and configurations, Prometheus leverages two EBS volumes: one dedicated to the prometheus-server pod and another for the prometheus-alertmanager pod.

i. Create a namespace for Prometheus

Create a namespace called 'prometheus'.
kubectl create namespace prometheus

ii. Set Availability Zone and create storage class

There are two options for storage class:

  • Create a storage class in your worker node's Availability Zone (AZ).

  • Use the default storage class (proceed to step iii).

Get the Availability Zone of one of the worker nodes:

EBS_AZ=$(kubectl get nodes \
  -o=jsonpath="{.items[0].metadata.labels.topology\.kubernetes\.io/zone}")

Create a storage class:

echo "
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
reclaimPolicy: Retain
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - $EBS_AZ
" | kubectl apply -f -

iii. Installing Prometheus

First, download the Helm values file for Prometheus and save it as prometheus_values.yml; it is passed to the install commands below.

Moreover, if you wish to configure custom metrics endpoints, include those details under the 'extraScrapeConfigs:' section in the prometheus_values.yml file, as demonstrated below.

extraScrapeConfigs: |
  - job_name: 'api-svc'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['api-svc.api-dev.svc.cluster.local:5557']
  - job_name: 'apps-svc'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['apps-svc.api-dev.svc.cluster.local:5559']

Run Helm command to install Prometheus (for worker node's AZ storage class option):

helm upgrade -i prometheus -f prometheus_values.yml prometheus-community/prometheus \
  --namespace prometheus --version 15

Run Helm command to install Prometheus (for default storage class option):

helm upgrade -i prometheus -f prometheus_values.yml prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2" \
  --version 15

Important Note: The default storage class has a reclaim policy of "Delete". Consequently, any EBS volumes used by Prometheus will be deleted automatically when you remove Prometheus itself.

Once Helm installation is completed, let's verify the resources.

$ k get all -n prometheus
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/prometheus-alertmanager-c7644896-7kfjq           2/2     Running   0          103m
pod/prometheus-kube-state-metrics-8476bdcc64-wng4m   1/1     Running   0          103m
pod/prometheus-node-exporter-8hf57                   1/1     Running   0          103m
pod/prometheus-pushgateway-665779d98f-v8q5d          1/1     Running   0          103m
pod/prometheus-server-6fd8bc8576-wwmvw               2/2     Running   0          103m

NAME                                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
service/prometheus-alertmanager         ClusterIP                <none>        80/TCP         103m
service/prometheus-kube-state-metrics   ClusterIP                <none>        8080/TCP       103m
service/prometheus-node-exporter        ClusterIP   None         <none>        9100/TCP       103m
service/prometheus-pushgateway          ClusterIP                <none>        9091/TCP       103m
service/prometheus-server               NodePort                 <none>        80:30900/TCP   103m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   1         1         1       1            1           <none>          103m

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-alertmanager         1/1     1            1           103m
deployment.apps/prometheus-kube-state-metrics   1/1     1            1           103m
deployment.apps/prometheus-pushgateway          1/1     1            1           103m
deployment.apps/prometheus-server               1/1     1            1           103m

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-alertmanager-c7644896           1         1         1       103m
replicaset.apps/prometheus-kube-state-metrics-8476bdcc64   1         1         1       103m
replicaset.apps/prometheus-pushgateway-665779d98f          1         1         1       103m
replicaset.apps/prometheus-server-6fd8bc8576               1         1         1       103m


The chart creates two persistent volume claims: an 8Gi volume for the prometheus-server pod and a 2Gi volume for prometheus-alertmanager.
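You can confirm that both claims were created and bound to EBS-backed volumes:

```shell
kubectl get pvc -n prometheus
```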

iv. Check metrics from Prometheus

To inspect metrics from Prometheus in the browser, you must initiate port forwarding.

kubectl port-forward -n prometheus deploy/prometheus-server 8081:9090 &

Now, open a web browser and navigate to http://localhost:8081/targets

From the page you can see all configured metrics, alerts, rules & other configurations.
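From the Graph tab of the same UI you can also run ad-hoc queries. A couple of illustrative PromQL queries (the exact metric names available depend on what your targets expose):

```
# Scrape targets currently up, per job
up

# Per-container CPU usage rate over the last 5 minutes (cAdvisor metric)
rate(container_cpu_usage_seconds_total[5m])
```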

Prometheus dashboard

4. Install Grafana

In this step, we will create a dedicated Kubernetes namespace for Grafana, create the Grafana manifest file, setup security groups, setup an Ingress, and finally configure the dashboard.

i. Create a namespace for Grafana

Create a namespace called 'grafana'.
kubectl create namespace grafana

ii. Create a Grafana manifest file

We also require a manifest file to configure Grafana. Below is an example of the file named grafana.yaml.

# grafana.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.prometheus.svc.cluster.local
        access: proxy
        isDefault: true

iii. Installing Grafana

Now, proceed to install Grafana using Helm. Replace 'my-password' with your password.

helm install grafana grafana/grafana \
    --namespace grafana \
    --set persistence.storageClass='gp2' \
    --set persistence.enabled=true \
    --set adminPassword='my-password' \
    --values grafana.yaml \
    --set service.type=NodePort

iv. Create a security group

Create a security group (grafana-alb-sg) for the ingress ALB, with an inbound rule allowing HTTPS from anywhere.

v. Allow inbound request to EC2 worker node security group

Before exposing Grafana to the external world, let's examine the definition of the Kubernetes service responsible for running Grafana.

$ k -n grafana get svc grafana -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: grafana
  creationTimestamp: "2023-12-29T05:59:40Z"
  labels:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 10.2.2
    helm.sh/chart: grafana-7.0.14
  name: grafana
  namespace: grafana
  resourceVersion: "179748053"
  uid: 7da370e2-63a4-4ca6-8ad4-14e624a51c4f
spec:
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: service
    nodePort: 31059
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

The target port is set to 3000, which corresponds to the port utilized by pods running Grafana.

To enable inbound requests on port 3000, add a rule to the EC2 worker nodes' security group permitting traffic on that port from the ALB security group created in the previous step.
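That rule can be sketched with the AWS CLI as follows; both security group IDs below are hypothetical placeholders for your worker node security group and grafana-alb-sg:

```shell
# --group-id: worker node security group (placeholder ID)
# --source-group: grafana-alb-sg (placeholder ID)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 3000 \
  --source-group sg-0fedcba9876543210
```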

vi. Setup Ingress

Define a new Kubernetes Ingress to facilitate the provisioning of an ALB.

This assumes you have already installed the AWS Load Balancer Controller, which is required for a Kubernetes Ingress to provision an ALB.

Let's define the Ingress definition file for Grafana.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: grafana
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/load-balancer-name: grafana-alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/subnets: ${PUBLIC_SUBNET_IDs}
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/security-groups: ${ALB_SECURITY_GROUP_ID}
    alb.ingress.kubernetes.io/healthcheck-port: "3000"
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/certificate-arn: ${ACM_CERT_ARN}
spec:
  rules:
    - host: ${YOUR_ROUTE53_DOMAIN}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80

Replace the values for subnets, security-groups, certificate-arn, and host with your own.

After applying the new Ingress, and once the new ALB is ready, navigate to ${YOUR_ROUTE53_DOMAIN} to confirm that Grafana is now accessible.
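To check that the Ingress was admitted and the ALB's DNS name has been assigned, you can run:

```shell
kubectl -n grafana get ingress grafana-ingress
```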

After logging into your Grafana account, proceed to import the necessary dashboards.

Grafana import dashboard

You can download dashboards from the official Grafana Dashboards site.

I utilized these two dashboards, which proved to be valuable for monitoring the overall EKS Fargate cluster.

That concludes our walkthrough! In this guide, we established a new node group essential for Prometheus and Grafana, and successfully installed and configured both tools.

I trust this post proves valuable to you! 😊

I've authored another post detailing additional issues and their solutions encountered during the setup of Prometheus & Grafana.

You can find the post here.


Top comments (1)


Hi Readers,
I've revamped the Prometheus installation section in this article, offering improved methods compared to the previous approach.