Hamdi KHELIL

🛡️ Effective Vulnerability Monitoring in Kubernetes

Introduction

Hey there, Kubernetes explorer! 🌟 As your Kubernetes environment grows, keeping it secure becomes more challenging, especially when dealing with multiple clusters. Imagine managing several clusters (spokes) and needing a single source of truth for all your security metrics—sounds like a big task, right? 🤔 That's where Trivy, Trivy Operator, OpenTelemetry, Prometheus, and Grafana come to the rescue.

In this guide, I’ll show you how to set up Trivy and Trivy Operator in a federated Kubernetes environment, collect vulnerability data using OpenTelemetry, and centralize it using either an in-cluster Prometheus setup or managed services like Azure Monitor (for Prometheus) and Grafana Cloud or Azure Managed Grafana. By the end of this, you’ll have a system that monitors vulnerabilities across all your clusters from one place. Let’s dive in! 🏊‍♂️

Prerequisites 🛠️

Before we get started, make sure you have:

  • Multiple Kubernetes clusters (managed via Kubernetes Federation or another method), including a central (hub) cluster and multiple spoke clusters.
  • Admin access to each cluster.
  • OpenTelemetry Collector set up in each spoke cluster.
  • Basic knowledge of Kubernetes, OpenTelemetry, Prometheus, and Grafana.

Step 1: Deploy Trivy Operator in Each Spoke Kubernetes Cluster 🚢

The first step is to deploy Trivy Operator in each of your spoke clusters. Trivy Operator integrates directly with Kubernetes, automatically scanning workloads for vulnerabilities and exposing the results as Prometheus-compatible metrics.

1. Install Trivy Operator in Spoke Clusters

Let’s start by installing Trivy Operator in each of your spoke clusters using Helm. Open up your terminal and run:

helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm repo update
helm install trivy-operator aqua/trivy-operator --namespace trivy-system --create-namespace

This installs the Trivy Operator in the trivy-system namespace. Now, each of your spoke clusters is ready to start scanning for vulnerabilities!

For more details, check out the Trivy Operator documentation and Helm chart repository.

For additional configuration options, refer to the Trivy Operator Metrics documentation.
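For example, if your spoke clusters also run the Prometheus Operator, the Helm chart can create a ServiceMonitor for the operator's metrics endpoint. A minimal sketch, assuming the chart's serviceMonitor values (key names may differ between chart versions):

serviceMonitor:
  enabled: true     # let the chart create a ServiceMonitor for the metrics Service
  interval: 30s     # how often Prometheus scrapes the operator's /metrics endpoint

In this guide we scrape the metrics with the OpenTelemetry Collector instead, so this is optional.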

2. Customize Scanning

You can configure Trivy Operator to scan based on your specific needs. For instance, you might want to report only high and critical vulnerabilities, skip vulnerabilities without a published fix, or ignore certain files:

trivy:
  severity: HIGH,CRITICAL
  ignoreUnfixed: true
  skipFiles: ["/usr/local/bin"]

To apply these settings, update the values.yaml file before installation or use the --set flag in Helm:

helm upgrade trivy-operator aqua/trivy-operator --namespace trivy-system --set trivy.severity=HIGH,CRITICAL --set trivy.ignoreUnfixed=true
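To double-check which overrides are actually in effect after the upgrade, ask Helm for the values it applied:

helm get values trivy-operator -n trivy-system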

3. Verify the Setup

After deployment, verify that the Trivy Operator is up and running:

kubectl get pods -n trivy-system

You should see the Trivy Operator pods running. To check if it’s scanning correctly, look at the generated VulnerabilityReport resources:

kubectl get vulnerabilityreports -A

This command will show you the vulnerability reports across all namespaces, confirming that Trivy Operator is working as expected.
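If you want a quick severity breakdown per report rather than the full objects, a custom-columns view helps. A small sketch, assuming the .report.summary field names of the VulnerabilityReport CRD (they may change between Trivy Operator versions):

kubectl get vulnerabilityreports -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CRITICAL:.report.summary.criticalCount,HIGH:.report.summary.highCount'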

4. Verify Trivy Operator Metrics

Ensure that the Trivy Operator is exposing the metrics correctly. You can check this by port-forwarding the Trivy Operator service and curling the metrics endpoint:

kubectl port-forward service/trivy-operator -n trivy-system 5000:80
# in a second terminal:
curl http://localhost:5000/metrics

This should return a list of Prometheus-compatible metrics related to Trivy vulnerabilities, confirming that Trivy Operator metrics are functioning properly.
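To zoom in on the vulnerability counters specifically, filter the output for the trivy_image_vulnerabilities metric that the dashboards later in this guide rely on:

curl -s http://localhost:5000/metrics | grep trivy_image_vulnerabilities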

Step 2: Deploy OpenTelemetry Collectors in Each Spoke Cluster 🌐

With Trivy Operator running in each spoke cluster, the next step is to deploy OpenTelemetry Collectors to gather and forward the metrics to your central Prometheus instance or Azure Monitor for Prometheus.

1. Install OpenTelemetry Collector in Each Spoke Cluster

Deploy the OpenTelemetry Collector in each spoke cluster. The collector will scrape metrics from the Trivy Operator and send them to your centralized Prometheus solution, whether that's a self-hosted instance, Azure Monitor, or another service.

You can deploy the OpenTelemetry Collector using Helm or by applying Kubernetes manifests. Here’s an example using Helm:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install otel-collector open-telemetry/opentelemetry-collector --namespace monitoring --create-namespace

For more detailed configuration options, refer to the OpenTelemetry Collector Helm chart documentation.

2. Configure OpenTelemetry Collector

You’ll need to configure the OpenTelemetry Collector to scrape metrics from the Trivy Operator and forward them to your central Prometheus instance. Depending on your setup, you might be using:

  • Self-Hosted Prometheus (External Access): Prometheus hosted in your central hub cluster.
  • Azure Monitor for Prometheus: Managed Prometheus service offered by Azure.

Example Configuration for Self-Hosted Prometheus:

Here’s an example configuration for the OpenTelemetry Collector if you’re using a self-hosted Prometheus instance:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'trivy-operator'
          scrape_interval: 15s
          static_configs:
            - targets: ['trivy-operator.trivy-system.svc.cluster.local:80']

exporters:
  prometheusremotewrite:
    endpoint: "http://<EXTERNAL_PROMETHEUS_HUB_URL>:9090/api/v1/write"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]

Example Configuration for Azure Monitor for Prometheus:

If you’re using Azure Monitor for Prometheus, your configuration would look like this:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'trivy-operator'
          scrape_interval: 15s
          static_configs:
            - targets: ['trivy-operator.trivy-system.svc.cluster.local:80']

exporters:
  prometheusremotewrite:
    endpoint: "https://<YOUR_AZURE_PROMETHEUS_URL>/api/v1/write"
    headers:
      Authorization: "Bearer <YOUR_AZURE_API_TOKEN>"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]

Replace <EXTERNAL_PROMETHEUS_HUB_URL> or <YOUR_AZURE_PROMETHEUS_URL> and <YOUR_AZURE_API_TOKEN> with your actual Prometheus endpoint and API token. This configuration tells the OpenTelemetry Collector to scrape metrics from the Trivy Operator and then send them to the specified Prometheus instance.

3. Apply the Configuration

Once you’ve configured the OpenTelemetry Collector, apply the configuration to your cluster:

kubectl apply -f otel-collector-config.yaml -n monitoring
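Alternatively, if you deployed the collector with the Helm chart from step 1, you can feed the same configuration through the chart's values instead of a raw manifest. A sketch, assuming the opentelemetry-collector chart's mode and config keys (verify them against your chart version):

# otel-values.yaml -- assumed layout for the opentelemetry-collector Helm chart
mode: deployment            # run the collector as a Deployment rather than a DaemonSet
config:                     # merged into the chart's default collector configuration
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: 'trivy-operator'
            scrape_interval: 15s
            static_configs:
              - targets: ['trivy-operator.trivy-system.svc.cluster.local:80']
  exporters:
    prometheusremotewrite:
      endpoint: "http://<EXTERNAL_PROMETHEUS_HUB_URL>:9090/api/v1/write"
  service:
    pipelines:
      metrics:
        receivers: [prometheus]
        exporters: [prometheusremotewrite]

Then roll it out with helm upgrade otel-collector open-telemetry/opentelemetry-collector -n monitoring -f otel-values.yaml.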

Ensure that the OpenTelemetry Collector is running correctly and scraping metrics by checking the logs:

kubectl logs -l app=otel-collector -n monitoring

Step 3: Set Up the Centralized Prometheus and Grafana 📊

Now that you’ve set up OpenTelemetry Collectors in each spoke cluster, the next step is to configure your centralized monitoring solution. You have two primary options: using a self-hosted Prometheus and Grafana setup or leveraging managed services like Azure Monitor for Prometheus and Grafana Cloud or Azure Managed Grafana.

Option 1: Self-Hosted Prometheus and Grafana

1. Set Up Prometheus in the Hub Cluster

If you prefer to manage Prometheus yourself, deploy it in the hub cluster with external access:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-hub prometheus-community/prometheus --namespace monitoring --create-namespace

Expose Prometheus externally using an Ingress, LoadBalancer, or NodePort service, depending on your environment. For more information on setting up Prometheus with external access, refer to the Prometheus documentation.
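One thing to keep in mind: the prometheusremotewrite exporter in the spokes can only push into this Prometheus if its remote-write receiver endpoint is enabled. With the prometheus-community chart this is typically done via an extra server flag; a hedged example, since the server.extraFlags key and the flag name can vary with your chart and Prometheus versions:

# prometheus-values.yaml (assumed keys for the prometheus-community/prometheus chart)
server:
  extraFlags:
    - web.enable-lifecycle              # chart default, kept
    - web.enable-remote-write-receiver  # accept remote_write pushes from the spoke collectors

Apply it with helm upgrade prometheus-hub prometheus-community/prometheus -n monitoring -f prometheus-values.yaml.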

2. Set Up Grafana in the Hub Cluster

Deploy Grafana in your hub cluster:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana --namespace monitoring --create-namespace

Access Grafana by forwarding the port or exposing the service:

kubectl port-forward svc/grafana 3000:80 -n monitoring

Add your self-hosted Prometheus as a data source in Grafana, using the external URL of Prometheus (e.g., http://<EXTERNAL_PROMETHEUS_HUB_URL>:9090).
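If you prefer to provision that data source declaratively rather than through the UI, Grafana's standard data source provisioning format looks roughly like this (the data source name is illustrative):

# prometheus-datasource.yaml -- Grafana data source provisioning file
apiVersion: 1
datasources:
  - name: Prometheus-Hub               # illustrative name
    type: prometheus
    access: proxy
    url: http://<EXTERNAL_PROMETHEUS_HUB_URL>:9090
    isDefault: true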

Option 2: Azure Monitor for Prometheus and Managed Grafana

If you prefer a managed solution, Azure offers managed services for both Prometheus and Grafana, which can simplify operations and scalability.

1. Set Up Azure Monitor for Prometheus

To set up Azure Monitor for Prometheus, follow the instructions in the Azure documentation. You’ll configure OpenTelemetry Collectors in your spoke clusters to send metrics to Azure Monitor using the prometheusremotewrite exporter.

2. Set Up Grafana Cloud or Azure Managed Grafana

Azure also offers a managed Grafana service that integrates seamlessly with Azure Monitor for Prometheus. You can set up Azure Managed Grafana by following the official documentation.

After setting up, add Azure Monitor as a data source in Grafana, and start creating your dashboards using the metrics collected from your clusters.

3. Create Dashboards in Grafana

Regardless of whether you’re using a self-hosted or managed solution, you’ll want to visualize the data in Grafana. Here are a few panel ideas:

  • Vulnerabilities by Severity: Show the count of vulnerabilities categorized by severity (e.g., Critical, High, Medium).

PromQL Query:

  count(trivy_image_vulnerabilities{severity="CRITICAL"})
  • Vulnerabilities Over Time: Display how the number of vulnerabilities changes over time.

PromQL Query:

  sum(trivy_image_vulnerabilities)
  • Vulnerabilities by Cluster: Compare the vulnerabilities across different spoke clusters.

PromQL Query:

  count(trivy_image_vulnerabilities{severity="HIGH"} by (cluster))
  • Top Vulnerable Images: List the most vulnerable container images across all clusters.

PromQL Query:

  topk(5, sum by (image_repository, image_tag) (trivy_image_vulnerabilities))

4. Build a Unified Dashboard

Organize these panels into a comprehensive dashboard:

  • Top Row: Summary stats (e.g., total vulnerabilities, total critical; see the example queries after this list).
  • Middle Row: Time series charts showing vulnerabilities over time.
  • Bottom Row: Breakdown by cluster and by container image.
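For the summary row, plain totals over the same metric work well, for example:

  sum(trivy_image_vulnerabilities)
  sum(trivy_image_vulnerabilities{severity="CRITICAL"})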

Wrapping Up 🎁

And there you have it! You’ve successfully set up a federated vulnerability scanning solution across multiple Kubernetes clusters using Trivy, Trivy Operator, OpenTelemetry, and either a self-hosted or managed Prometheus and Grafana setup. By leveraging OpenTelemetry and centralized monitoring solutions like Azure Monitor for Prometheus and Managed Grafana, you can monitor your security posture from a single location, simplifying operations and ensuring scalability. 🚀

This setup not only makes vulnerability management easier but also ensures that no matter how many clusters you manage, you always have a clear view of their security status.

Happy scanning, and stay secure! 🔐

For further reading and a deeper understanding, check out the official documentation for Trivy, Trivy Operator, the OpenTelemetry Collector, Prometheus, and Grafana.
