DEV Community

Arseny Zinchenko

Originally published at rtfm.co.ua

Prometheus: Kubernetes endpoints monitoring with blackbox-exporter

The blackbox-exporter is an exporter that can monitor various endpoints — URLs on the Internet, your LoadBalancers in AWS, or Services in a Kubernetes cluster, such as MySQL or PostgreSQL databases.

Blackbox Exporter can give you HTTP response time statistics, response codes, information on SSL certificates, etc.

What are we going to do in this post:

  • deploy the kube-prometheus-stack in Minikube with Helm
  • deploy the Blackbox Exporter itself
  • configure endpoint monitoring with Kubernetes ServiceMonitors, which are created from the blackbox-exporter chart values
  • take a brief look at the Blackbox probers that are used to poll endpoints

Let’s go.

Running the Kube Prometheus Stack

We will do this setup in Minikube, where we will install the Prometheus Operator from the Helm repository.

Launch Minikube itself:

$ minikube start

Add the Prometheus chart repository:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update

Create a namespace:

$ kubectl create ns monitoring

Install the kube-prometheus-stack chart:

$ helm -n monitoring install prometheus prometheus-community/kube-prometheus-stack

Wait a few minutes until all pods become Running:

$ kubectl -n monitoring get pod
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0 1/2 Running 1 (25s ago) 44s
prometheus-grafana-599dbccb79-zlklx 2/3 Running 0 57s
prometheus-kube-prometheus-operator-689dd6679c-s66vp 1/1 Running 0 57s
prometheus-kube-state-metrics-6cfd96f4c8-84j26 1/1 Running 0 57s
prometheus-prometheus-kube-prometheus-prometheus-0 0/2 PodInitializing 0 44s
prometheus-prometheus-node-exporter-2h542 1/1 Running 0 57s

Find the Prometheus Service:

$ kubectl -n monitoring get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7s
prometheus-grafana ClusterIP 10.97.79.182 <none> 80/TCP 20s
prometheus-kube-prometheus-alertmanager ClusterIP 10.106.147.39 <none> 9093/TCP 20s
prometheus-kube-prometheus-operator ClusterIP 10.98.222.45 <none> 443/TCP 20s
prometheus-kube-prometheus-prometheus ClusterIP 10.107.26.113 <none> 9090/TCP 20s
…

Open access to the Service by using the port-forward:

$ kubectl -n monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090

Open http://localhost:9090, and check if everything is working:

Running blackbox-exporter

Its chart is in the same repository, so just install the exporter:

$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter

Check the Pod:

$ kubectl -n monitoring get pod
NAME READY STATUS RESTARTS AGE
prometheus-blackbox-prometheus-blackbox-exporter-6865d9b44h546j 1/1 Running 0 27s
…

Blackbox keeps its config in a ConfigMap, which is mounted into the Pod and provides the default parameters:

$ kubectl -n monitoring get cm prometheus-blackbox-prometheus-blackbox-exporter -o yaml
apiVersion: v1
data:
  blackbox.yaml: |
    modules:
      http_2xx:
        http:
          follow_redirects: true
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2.0
        prober: http
        timeout: 5s

Here we can see the modules. There is just one so far, http_2xx, which uses the http prober to make HTTP requests to the targets. The targets themselves still need to be added.
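
A module can also be exercised by hand through the exporter's /probe endpoint, which takes the module and target as query parameters. The sketch below only builds that URL. The helper name probe_url is hypothetical, and the address localhost:9115 assumes a port-forward to the exporter Service on the chart's default port 9115 (check the Service name and port in your release):

```shell
# Build the URL for a one-off probe against the exporter's /probe endpoint.
# Assumption: the exporter is reachable on localhost:9115, for example after
#   kubectl -n monitoring port-forward svc/prometheus-blackbox-prometheus-blackbox-exporter 9115:9115
# probe_url is a hypothetical helper, not part of any tool.
probe_url() {
  local module="$1" target="$2" base="${3:-http://localhost:9115}"
  printf '%s/probe?module=%s&target=%s\n' "$base" "$module" "$target"
}

# Usage (needs the port-forward above to be running):
#   curl -s "$(probe_url http_2xx https://google.com)" | grep '^probe_success'
```

The target is passed as-is, without URL encoding, which is enough for simple URLs like the ones in this post.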

Blackbox and ServiceMonitor

To add the endpoints that we want to monitor, we can use the serviceMonitor section of the chart values.

For some reason, this is not really described in the guides I could find, although it is very useful and simple: we add a list of targets to the Blackbox chart values, the chart creates a ServiceMonitor for each of them, and Prometheus starts monitoring them.

Create a file blackbox-exporter-values.yaml with only one endpoint for now - just to check if it's working at all:

serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
  targets:
    - name: google.com
      url: https://google.com

If not specified otherwise, Blackbox uses the default values from the values.yaml of the chart. In this case, it will be the http_2xx module, which executes a GET request and checks the response code: by default, any 2xx response passes the check, and anything else fails it.

Update the Helm release with the new config:

$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f blackbox-exporter-values.yaml

Check if the ServiceMonitor has been created:

$ kubectl -n monitoring get servicemonitor
NAME AGE
prometheus-blackbox-prometheus-blackbox-exporter-google.com 4m43s

Check the Prometheus Targets:

For each target that we specify in the Blackbox configuration, a separate scrape job is added to Prometheus:

And check the Blackbox metrics:

The main metric that I personally use is probe_success, which tells whether the check has passed:

Here, metricRelabelings sets the target label to the value of the name field of the target from the Blackbox config, and the instance label holds the URL.
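
The same metric can be read without the web UI through the Prometheus HTTP API. This is a minimal sketch: probe_success_query is a hypothetical helper that only builds the PromQL selector, and the usage line assumes the port-forward to localhost:9090 from earlier is still running:

```shell
# Build a PromQL selector for probe_success filtered by the target label.
# probe_success_query is a hypothetical helper name.
probe_success_query() {
  printf 'probe_success{target="%s"}\n' "$1"
}

# Usage (assumes Prometheus is port-forwarded to localhost:9090):
#   curl -s -G http://localhost:9090/api/v1/query \
#     --data-urlencode "query=$(probe_success_query google.com)"
```

A value of 1 in the response means the last probe passed, 0 means it failed.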

Internal endpoints monitoring

Great — we went to Google, and it even works.

What about checking endpoints within a cluster?

Let’s take the nginx example from the Kubernetes documentation, but deploy its Pod and Service to our own namespace instead of the default one.

Create a namespace:

$ kubectl create ns test-ns
namespace/test-ns created

Create a manifest testpod-with-svc.yaml with the Pod and Service, and set your namespace in it:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test-ns
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
      - containerPort: 80
        name: http-web-svc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: test-ns
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc

Deploy it:

$ kubectl apply -f testpod-with-svc.yaml
pod/nginx created
service/nginx-service created

Check the resources:

$ kubectl -n test-ns get all
NAME READY STATUS RESTARTS AGE
pod/nginx 1/1 Running 0 23s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx-service ClusterIP 10.106.58.247 <none> 80/TCP 23s

Update the Blackbox config:

serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
  targets:
    - name: google.com
      url: https://google.com
    - name: nginx-test
      url: nginx-service.test-ns.svc.cluster.local:80

Update the Helm release:

$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f blackbox-exporter-values.yaml

Check ServiceMonitors again:

$ kubectl -n monitoring get servicemonitor
NAME AGE
prometheus-blackbox-prometheus-blackbox-exporter-google.com 12m
prometheus-blackbox-prometheus-blackbox-exporter-nginx-test 5s

And in a minute we can check the probe_success:

In general, it is not necessary to specify the full name in the form of nginx-service.test-ns.svc.cluster.local. It is enough to set it as servicename.namespace, that is, nginx-service.test-ns. In my opinion, though, the full name is easier to read in labels and alerts.

Blackbox Exporter modules

Everything works well as long as we poll an ordinary HTTP endpoint that returns a 200 code.

But how can we check for other HTTP codes?

Let’s create our own module using Blackbox probes:

config:
  modules:
    http_4xx:
      prober: http
      timeout: 5s
      http:
        method: GET
        valid_status_codes: [404, 405]
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
  targets:
    - name: google.com
      url: https://google.com
    - name: nginx-test
      url: nginx-service.test-ns.svc.cluster.local:80
    - name: nginx-test-404
      url: nginx-service.test-ns.svc.cluster.local:80/404
      module: http_4xx

Here, under modules, we specify the name of the new module, http_4xx, the prober it should use, http, and the parameters for this prober: which request method to use and which response codes we consider valid.

Next, in the targets, we explicitly tell nginx-test-404 to use the http_4xx module.

Modules testing

Let’s see how we can check whether the module will work as we expect.

It is simple: run a test Pod and use curl with the -I option to check the response of the endpoint.

For a TCP connection, you can use telnet.

So, create a Pod with Ubuntu, and connect to it by running bash:

$ kubectl -n monitoring run pod --rm -i --tty --image ubuntu -- bash

Install the curl and telnet:

root@pod:/# apt update && apt -y install curl telnet

And check if the nginx-service.test-ns.svc.cluster.local:80/404 is working and which response code it will return:

root@pod:/# curl -I nginx-service.test-ns.svc.cluster.local:80/404
HTTP/1.1 404 Not Found

404, as we expected.
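
The comparison that valid_status_codes performs can be imitated in the same test Pod. This is only an illustration of the idea, not how the exporter is implemented, and status_is_valid is a hypothetical helper:

```shell
# Return 0 if the given status code is in the list of accepted codes,
# mirroring the idea of valid_status_codes: [404, 405] from the module.
# status_is_valid is a hypothetical helper name.
status_is_valid() {
  local code="$1"
  shift
  local accepted
  for accepted in "$@"; do
    [ "$code" = "$accepted" ] && return 0
  done
  return 1
}

# Usage inside the test Pod:
#   code=$(curl -s -o /dev/null -w '%{http_code}' nginx-service.test-ns.svc.cluster.local:80/404)
#   status_is_valid "$code" 404 405 && echo "the http_4xx check would pass"
```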

Update the Blackbox with a new configuration:

$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f blackbox-exporter-values.yaml

Let’s check its ConfigMap to see whether the http_4xx module that we specified in our config file has been added:

$ kubectl -n monitoring get cm prometheus-blackbox-prometheus-blackbox-exporter -o yaml
apiVersion: v1
data:
  blackbox.yaml: |
    modules:
      http_2xx:
        http:
          follow_redirects: true
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2.0
        prober: http
        timeout: 5s
      http_4xx:
        http:
          follow_redirects: true
          method: GET
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2.0
          valid_status_codes:
          - 404
          - 405
        prober: http
        timeout: 5s

And check the result in the Prometheus:

probe_success{target="nginx-test-404"} == 1 - "It works!" (c)

TCP Connect and a database server monitoring

Another prober that we use very often is tcp, which simply tries to open a TCP connection to the specified host and port. It is suitable for checking databases and any other non-HTTP resources.

Let’s start a MySQL server:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install mysql bitnami/mysql

Find its Service:

$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20h
mysql ClusterIP 10.99.71.124 <none> 3306/TCP 40s
mysql-headless ClusterIP None <none> 3306/TCP 40s

Update the Blackbox config:

config:
  modules:
    ...
    tcp_connect:
      prober: tcp
serviceMonitor:
  ...
  targets:
    ...
    - name: mysql
      url: mysql.default.svc.cluster.local:3306
      module: tcp_connect
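
What the tcp prober does can be approximated by hand from the test Pod, as an alternative to telnet. The sketch below assumes bash (for its /dev/tcp feature) and the coreutils timeout command, both present in the ubuntu image. The helper names are hypothetical:

```shell
# Split a "host:port" target into its parts.
# tcp_target_host, tcp_target_port and tcp_check are hypothetical helper names.
tcp_target_host() { printf '%s\n' "${1%:*}"; }
tcp_target_port() { printf '%s\n' "${1##*:}"; }

# Try to open a TCP connection to "host:port" within 5 seconds.
# Returns 0 if the connection was opened, non-zero otherwise.
tcp_check() {
  local host port
  host=$(tcp_target_host "$1")
  port=$(tcp_target_port "$1")
  # /dev/tcp/<host>/<port> is a bash feature, so run the attempt in bash
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port"
}

# Usage inside the test Pod:
#   tcp_check mysql.default.svc.cluster.local:3306 && echo "port is open"
```

Like the prober with no extra options, this only confirms that the port accepts connections. It says nothing about whether MySQL itself is healthy.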

Deploy and check:

Prometheus alerting

There is nothing special to write about alerting: it works the same way as any other Prometheus alert.

For example, we monitor Apache Druid Services with the following alert (a screenshot from a Terraform configuration with some variables):

Just check that probe_success != 1.
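
With the kube-prometheus-stack, such an alert could be delivered as a PrometheusRule resource. The sketch below is not the Druid alert mentioned above. The resource name, alert name, 5m duration, and severity are all made up for illustration, and the release: prometheus label assumes the operator selects rules by the same label that the ServiceMonitors use in this post:

```shell
# Print a PrometheusRule manifest with a single probe_success alert.
# render_probe_alert, the resource name, alert name, duration and severity
# are hypothetical; adjust them to your setup.
render_probe_alert() {
  # The quoted 'EOF' keeps $labels from being expanded by the shell
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-probe-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: blackbox
      rules:
        - alert: EndpointProbeFailed
          expr: probe_success != 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Probe failed for {{ $labels.target }}"
EOF
}

# Usage:
#   render_probe_alert | kubectl apply -f -
```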


Originally published at RTFM: Linux, DevOps, and system administration.

