In this post, I've compiled a list of issues I encountered while configuring Prometheus and Grafana with Helm in an existing EKS Fargate cluster setup, along with their solutions.
Error 1: Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable.
panic: did not find aws instance ID in node providerID string
$ k logs ebs-csi-controller-7f5c959c75-j92jf -n kube-system -c ebs-plugin
I1228 04:31:45.536047 1 driver.go:78] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.25.0"
I1228 04:31:45.536144 1 metadata.go:85] "retrieving instance data from ec2 metadata"
I1228 04:31:58.152468 1 metadata.go:88] "ec2 metadata is not available"
I1228 04:31:58.152491 1 metadata.go:96] "retrieving instance data from kubernetes api"
I1228 04:31:58.153081 1 metadata.go:101] "kubernetes api is available"
E1228 04:31:58.175387 1 controller.go:86] "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." err="did not find aws instance ID in node providerID string"
panic: did not find aws instance ID in node providerID string
$ kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-ebs-csi-driver,app.kubernetes.io/instance=aws-ebs-csi-driver"
NAME READY STATUS RESTARTS AGE
ebs-csi-controller-7f5c959c75-j92jf 0/6 CrashLoopBackOff 36 (9s ago) 10m
ebs-csi-controller-7f5c959c75-xpv9x 0/6 CrashLoopBackOff 36 (23s ago) 10m
ebs-csi-node-969qs 3/3 Running 0 10m
Solution:
If you don't specify your cluster's region when installing aws-ebs-csi-driver, the ebs-csi-controller pods will crash, as the region defaults to 'us-east-1'.
helm upgrade --install aws-ebs-csi-driver \
--namespace kube-system \
--set controller.region=eu-north-1 \
--set controller.serviceAccount.create=false \
--set controller.serviceAccount.name=ebs-csi-controller-sa \
aws-ebs-csi-driver/aws-ebs-csi-driver
This is necessary because the ebs-plugin container cannot determine the region on its own, as shown by the log message "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable."
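As a side note, the panic "did not find aws instance ID in node providerID string" means the controller could not extract an EC2 instance ID from a node's providerID; on Fargate-backed clusters, Fargate nodes typically don't carry one. If you want to double-check, an optional inspection of the providerIDs looks like this:

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'

Since the region can't be derived from a providerID without an instance ID, passing controller.region explicitly, as above, resolves the crash.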
Error 2: Values don't meet the specifications of the schema(s) in the following chart(s)
Error: values don't meet the specifications of the schema(s) in the following chart(s):
prometheus:
- server.remoteRead: Invalid type. Expected: array, given: object
alertmanager:
- extraEnv: Invalid type. Expected: array, given: object
Solution:
These errors are the result of a mismatch between the Prometheus chart version and the values file used in the Helm installation. If you are using a customized prometheus_values.yml file, pin the chart to the version the file was written for. If you are not using a customized file, use the latest chart version with its default values.
helm upgrade -i prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2" \
--version 15
I have used version 15 of the Prometheus chart here.
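In case it helps, here are a couple of optional Helm commands for pinning the right version: the first lists the chart versions available in the repo, and the second dumps the default values for a given version so you can base a customized prometheus_values.yml on it (assuming the prometheus-community repo is already added):

$ helm search repo prometheus-community/prometheus --versions | head
$ helm show values prometheus-community/prometheus --version 15 > prometheus_values.yml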
Error 3: 0/17 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/17 nodes are available: 17 Preemption is not helpful for scheduling..
$ k get events -n prometheus
LAST SEEN TYPE REASON OBJECT MESSAGE
2m13s Warning FailedScheduling pod/prometheus-alertmanager-c7644896-td8xv 0/17 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/17 nodes are available: 17 Preemption is not helpful for scheduling..
47m Normal SuccessfulCreate replicaset/prometheus-alertmanager-c7644896 Created pod: prometheus-alertmanager-c7644896-td8xv
2m30s Warning ProvisioningFailed persistentvolumeclaim/prometheus-alertmanager storageclass.storage.k8s.io "prometheus" not found
The prometheus-alertmanager and prometheus-server pods will remain in Pending status.
$ k get po -n prometheus
NAME READY STATUS RESTARTS AGE
prometheus-alertmanager-c7644896-q2nzm 0/2 Pending 0 74s
prometheus-kube-state-metrics-8476bdcc64-f984p 1/1 Running 0 75s
prometheus-node-exporter-r82k7 1/1 Running 0 74s
prometheus-pushgateway-665779d98f-zh2pf 1/1 Running 0 75s
prometheus-server-6fd8bc8576-csqt8 0/2 Pending 0 75s
Solution:
This is due to the missing 'prometheus' storage class, as clearly shown in the events log. Go ahead and create the storage class as shown below.
EBS_AZ=$(kubectl get nodes \
  -o=jsonpath="{.items[0].metadata.labels['topology\.kubernetes\.io\/zone']}")
echo "
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus
  namespace: prometheus
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
reclaimPolicy: Retain
allowedTopologies:
- matchLabelExpressions:
  - key: topology.ebs.csi.aws.com/zone
    values:
    - $EBS_AZ
" | kubectl apply -f -
Error 4: Failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-48b7c3d8-d46a-47be-90e7-3d59eb3f5844": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
$ kubectl get events --sort-by=.metadata.creationTimestamp -n prometheus
LAST SEEN TYPE REASON OBJECT MESSAGE
30s Normal Provisioning persistentvolumeclaim/prometheus-alertmanager External provisioner is provisioning volume for claim "prometheus/prometheus-alertmanager"
30s Normal Provisioning persistentvolumeclaim/prometheus-server External provisioner is provisioning volume for claim "prometheus/prometheus-server"
5s Warning ProvisioningFailed persistentvolumeclaim/prometheus-alertmanager failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-b7373f3b-3da9-47ac-8bfb-ad396816ce88": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
17s Warning ProvisioningFailed persistentvolumeclaim/prometheus-server failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-48b7c3d8-d46a-47be-90e7-3d59eb3f5844": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
Solution:
This issue arises from insufficient permissions assigned to the service account in the cluster, preventing it from provisioning the required persistent volumes.
You need to set the service account details (with the required IAM policies attached to its role) while installing aws-ebs-csi-driver with Helm, as shown here.
helm upgrade --install aws-ebs-csi-driver \
--namespace kube-system \
--set controller.region=eu-north-1 \
--set controller.serviceAccount.create=false \
--set controller.serviceAccount.name=ebs-csi-controller-sa \
aws-ebs-csi-driver/aws-ebs-csi-driver
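As an optional check that the service account is actually wired up for IRSA, describe it and look for the eks.amazonaws.com/role-arn annotation pointing at a role with the AmazonEBSCSIDriverPolicy attached (created in Error 5 below):

$ kubectl describe sa ebs-csi-controller-sa -n kube-system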
Error 5: The service account is absent in the EKS cluster setup, yet it shows up in the output of the eksctl get iamserviceaccount command.
Solution:
Check whether you added the '--role-only' option when creating the service account with eksctl.
If so, delete the service account and recreate it without the '--role-only' option, as shown below.
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster api-dev \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--override-existing-serviceaccounts --approve
Here, 'api-dev' is the cluster name. Replace it with your cluster name before running the command.
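To verify the fix, you can optionally confirm that both the IAM mapping and the Kubernetes service account now exist, using the same cluster and account names as above:

$ eksctl get iamserviceaccount --cluster api-dev --namespace kube-system
$ kubectl get sa ebs-csi-controller-sa -n kube-system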
Thank you for taking the time to read 👏😊! I will continue to update this post as I encounter new issues. Feel free to mention any unlisted issues in the comment section. 🤝❤️
Check out my post on setting up Prometheus and Grafana with an existing EKS Fargate cluster - Monitoring