Nowsath for AWS Community Builders

Setup Prometheus and Grafana with existing EKS Fargate cluster - Monitoring

In this article, I will walk through the essential steps for setting up Prometheus and Grafana on an existing EKS Fargate cluster, along with the configuration of custom metrics. Both tools are commonly used for monitoring and alerting.

Steps to follow:

  1. Configure Node Groups
  2. Install AWS EBS CSI driver
  3. Install Prometheus
  4. Install Grafana

1. Configure Node Groups

Given the absence of any pre-existing node groups, let's proceed to create a new one.

i. Create an IAM Role for EC2 worker nodes

Go to the AWS IAM console and create a role named 'AmazonEKSWorkerNodeRole' with the following three AWS managed policies:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy
  • AmazonEKSWorkerNodePolicy

AWS IAM console

ii. Create Node groups

Navigate to the AWS EKS console and add a node group to the cluster. EC2 instances are required for Prometheus and Grafana because both applications need EBS volumes mounted on them, which Fargate pods do not support.

When configuring the node group for the cluster, take the following into consideration:

Node IAM role: Select the role created in the previous step (AmazonEKSWorkerNodeRole)
Instance type: Select based on your requirements (t3.small in my case)
Subnets: The private subnets within the VPC where the EKS cluster is located. If you want to enable remote access to the nodes, use public subnets instead.

In my configuration, I opted for a t3.small instance with a desired size of 1. The setup worked without any issues.
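If you prefer the CLI over the console, a roughly equivalent node group can be created with eksctl. This is only a sketch: the cluster and node group names below are hypothetical, and note that eksctl creates its own node IAM role unless you pass one explicitly.

```shell
# 'my-cluster' and 'monitoring-ng' are placeholder names
eksctl create nodegroup \
  --cluster my-cluster \
  --name monitoring-ng \
  --node-type t3.small \
  --nodes 1 \
  --node-private-networking
```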

To verify the proper functioning of the EC2 worker nodes, execute the following command. The output should indicate that a pod is currently running.

$ k get po -l k8s-app=aws-node -n kube-system
aws-node-hbvz2   1/1     Running   0          58m

2. Install AWS EBS CSI driver

The Amazon EBS CSI driver manages the lifecycle of Amazon EBS volumes as storage for the Kubernetes Volumes that you create.

The Amazon EBS CSI driver makes Amazon EBS volumes for these types of Kubernetes volumes: generic ephemeral volumes and persistent volumes.

Prometheus and Grafana require persistent storage, commonly referred to as PV (Persistent Volume) in Kubernetes terminology, to be attached to them.

i. Create AWS EBS CSI driver IAM role and associate to service account

Create a service account named 'ebs-csi-controller-sa' and attach the AWS managed 'AmazonEBSCSIDriverPolicy' to its IAM role. This service account will be used during the installation of the AWS EBS CSI driver.

Replace 'my-cluster' with the name of your cluster.

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --override-existing-serviceaccounts --approve

If you don't specify a role name, eksctl assigns one automatically.
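To confirm the service account was created and annotated with the new role's ARN, you can query the annotation directly (the annotation key is the standard one eksctl sets):

```shell
kubectl -n kube-system get sa ebs-csi-controller-sa \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```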

ii. Add Helm repositories

We will use Helm to install the components required to run Prometheus and Grafana.

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

iii. Installing aws-ebs-csi-driver

Following the addition of the new Helm repository, proceed to install the AWS EBS CSI driver using the below helm command.

Replace the region 'eu-north-1' with your cluster region.

helm upgrade --install aws-ebs-csi-driver \
  aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa
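Once the release is installed, you can verify that the driver's controller and node pods are running; the label below is the one the chart applies to its workloads:

```shell
kubectl get pods -n kube-system \
  -l app.kubernetes.io/name=aws-ebs-csi-driver
```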

3. Install Prometheus

For persistent storage of scraped metrics and configurations, Prometheus leverages two EBS volumes: one dedicated to the prometheus-server pod and another for the prometheus-alertmanager pod.

i. Create a namespace for Prometheus

Create a namespace called 'prometheus'.
kubectl create namespace prometheus

ii. Set Availability Zone and create storage class

There are two options for storage class:

  • Create a storage class in your worker node's Availability Zone (AZ).

  • Use the default storage class (proceed to step iii).

Get the Availability Zone of one of the worker nodes:

EBS_AZ=$(kubectl get nodes \
  -o=jsonpath="{.items[0].metadata.labels.topology\.kubernetes\.io/zone}")

Create a storage class:

echo "
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
reclaimPolicy: Retain
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - $EBS_AZ
" | kubectl apply -f -

iii. Installing Prometheus

First, download the Helm values file for Prometheus and save it as prometheus_values.yml; it is passed to the install commands below.

Moreover, if you wish to configure custom metrics endpoints, include those details under the 'extraScrapeConfigs:' section in the prometheus_values.yml file, as demonstrated below.

extraScrapeConfigs: |
  - job_name: 'api-svc'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['api-svc.api-dev.svc.cluster.local:5557']
  - job_name: 'apps-svc'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['apps-svc.api-dev.svc.cluster.local:5559']

Run Helm command to install Prometheus (for worker node's AZ storage class option):

helm upgrade -i prometheus -f prometheus_values.yml prometheus-community/prometheus \
  --namespace prometheus --version 15

Run Helm command to install Prometheus (for default storage class option):

helm upgrade -i prometheus -f prometheus_values.yml prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2" \
  --version 15

Important Note: The default storage class has a reclaim policy of "Delete". Consequently, any EBS volumes used by Prometheus will be deleted automatically when you remove Prometheus itself.

Once Helm installation is completed, let's verify the resources.

$ k get all -n prometheus
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/prometheus-alertmanager-c7644896-7kfjq           2/2     Running   0          103m
pod/prometheus-kube-state-metrics-8476bdcc64-wng4m   1/1     Running   0          103m
pod/prometheus-node-exporter-8hf57                   1/1     Running   0          103m
pod/prometheus-pushgateway-665779d98f-v8q5d          1/1     Running   0          103m
pod/prometheus-server-6fd8bc8576-wwmvw               2/2     Running   0          103m

NAME                                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
service/prometheus-alertmanager         ClusterIP                <none>        80/TCP         103m
service/prometheus-kube-state-metrics   ClusterIP                <none>        8080/TCP       103m
service/prometheus-node-exporter        ClusterIP   None         <none>        9100/TCP       103m
service/prometheus-pushgateway          ClusterIP                <none>        9091/TCP       103m
service/prometheus-server               NodePort                 <none>        80:30900/TCP   103m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   1         1         1       1            1           <none>          103m

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-alertmanager         1/1     1            1           103m
deployment.apps/prometheus-kube-state-metrics   1/1     1            1           103m
deployment.apps/prometheus-pushgateway          1/1     1            1           103m
deployment.apps/prometheus-server               1/1     1            1           103m

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-alertmanager-c7644896           1         1         1       103m
replicaset.apps/prometheus-kube-state-metrics-8476bdcc64   1         1         1       103m
replicaset.apps/prometheus-pushgateway-665779d98f          1         1         1       103m
replicaset.apps/prometheus-server-6fd8bc8576               1         1         1       103m


The chart creates two persistent volume claims: an 8Gi volume for the prometheus-server pod and a 2Gi volume for prometheus-alertmanager.
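You can confirm that both claims were created and bound to EBS-backed volumes:

```shell
kubectl get pvc -n prometheus
```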

iv. Check metrics from Prometheus

To inspect metrics from Prometheus in the browser, you must initiate port forwarding.

kubectl port-forward -n prometheus deploy/prometheus-server 8081:9090 &

Now, open a web browser and navigate to http://localhost:8081/targets

From the page you can see all configured metrics, alerts, rules & other configurations.
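From the Graph tab of the same UI you can also run ad-hoc queries. A couple of illustrative PromQL queries (the exact metric names available depend on what your targets expose):

```
# Scrape targets currently up, per job
up

# Per-container CPU usage rate over the last 5 minutes (cAdvisor metric)
rate(container_cpu_usage_seconds_total[5m])
```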

Prometheus dashboard

4. Install Grafana

In this step, we will create a dedicated Kubernetes namespace for Grafana, create the Grafana manifest file, setup security groups, setup an Ingress, and finally configure the dashboard.

i. Create a namespace for Grafana

Create a namespace called 'grafana'.
kubectl create namespace grafana

ii. Create a Grafana manifest file

We also require a manifest file to configure Grafana. Below is an example of the file named grafana.yaml.

# grafana.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.prometheus.svc.cluster.local
        access: proxy
        isDefault: true

iii. Installing Grafana

Now, proceed to install Grafana using Helm. Replace 'my-password' with your password.

helm install grafana grafana/grafana \
    --namespace grafana \
    --set persistence.storageClass='gp2' \
    --set persistence.enabled=true \
    --set adminPassword='my-password' \
    --values grafana.yaml \
    --set service.type=NodePort

iv. Create a security group

Create a security group (grafana-alb-sg) for the ingress ALB, with an inbound rule allowing HTTPS from anywhere.

v. Allow inbound request to EC2 worker node security group

Before exposing Grafana to the external world, let's examine the definition of the Kubernetes service responsible for running Grafana.

$ k -n grafana get svc grafana -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: grafana
  creationTimestamp: "2023-12-29T05:59:40Z"
  labels:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 10.2.2
    helm.sh/chart: grafana-7.0.14
  name: grafana
  namespace: grafana
  resourceVersion: "179748053"
  uid: 7da370e2-63a4-4ca6-8ad4-14e624a51c4f
spec:
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: service
    nodePort: 31059
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

The target port is set to 3000, which corresponds to the port utilized by pods running Grafana.

To enable inbound requests on port 3000, add a rule to the EC2 worker nodes' security group permitting traffic on that port from the ALB security group created in the previous step.
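That rule can be sketched with the AWS CLI as follows; both security group IDs below are hypothetical placeholders for your worker node security group and grafana-alb-sg:

```shell
# --group-id: worker node security group (placeholder ID)
# --source-group: grafana-alb-sg (placeholder ID)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 3000 \
  --source-group sg-0fedcba9876543210
```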

vi. Setup Ingress

Define a new Kubernetes Ingress to facilitate the provisioning of an ALB.

This assumes you have already installed the AWS Load Balancer Controller, which is required for a Kubernetes Ingress to provision an ALB.

Let's define the Ingress definition file for Grafana.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: grafana
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/load-balancer-name: grafana-alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/subnets: ${PUBLIC_SUBNET_IDs}
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/security-groups: ${ALB_SECURITY_GROUP_ID}
    alb.ingress.kubernetes.io/healthcheck-port: "3000"
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/certificate-arn: ${ACM_CERT_ARN}
spec:
  rules:
    - host: ${YOUR_ROUTE53_DOMAIN}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80

Replace the values for subnets, security-groups, certificate-arn, and host with your own.

After applying the new Ingress, and once the new ALB is ready, navigate to ${YOUR_ROUTE53_DOMAIN} to confirm that Grafana is now accessible.
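To check that the Ingress was admitted and the ALB's DNS name has been assigned, you can run:

```shell
kubectl -n grafana get ingress grafana-ingress
```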

After logging into your Grafana account, proceed to import the necessary dashboards.

Grafana import dashboard

You can download dashboards from the official Grafana Dashboards site.

I utilized these two dashboards, which proved to be valuable for monitoring the overall EKS Fargate cluster.

That concludes our walkthrough! In this guide, we established a new node group essential for Prometheus and Grafana, and successfully installed and configured both tools.

I trust this post proves valuable to you! 😊

I've authored another post detailing additional issues and their solutions encountered during the setup of Prometheus & Grafana.

You can find the post here.


Top comments (1)


Hi Readers,
I've revamped the Prometheus installation section in this article, offering improved methods compared to the previous approach.