
Pratik Nalawade


DevOps Monitoring Project

Learning DevOps Monitoring with DevOps Shack: A Hands-On Journey



Monitoring is critical to keeping infrastructure healthy and performant. I recently worked through a hands-on project by following tutorials from the YouTube channel DevOps Shack, and implemented a complete monitoring setup using Prometheus, Node Exporter, Alertmanager, and Blackbox Exporter. This blog post shares my experience and the key takeaways from the project.

Project Overview

The project is designed to provide an end-to-end monitoring solution for your infrastructure. By following the guidance of DevOps Shack, I was able to set up a robust system that not only monitors the health of virtual machines but also sends alerts for critical issues and probes service availability.

Tools and Technologies

  • Prometheus: Used for collecting and storing metrics.
  • Node Exporter: Used to expose hardware and OS metrics to Prometheus.
  • Alertmanager: Manages alerts generated by Prometheus.
  • Blackbox Exporter: Probes endpoints to check their availability.

Prerequisites

Before getting started, I ensured that:

  • Two virtual machines (VMs) were prepared.
  • wget and tar were installed on both VMs.
  • I had the necessary permissions to download, extract, and run the binaries.

Step-by-Step Setup

VM-1: Setting Up Node Exporter

1. Download Node Exporter:

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz

2. Extract Node Exporter:

tar xvfz node_exporter-1.8.1.linux-amd64.tar.gz

3. Start Node Exporter:

cd node_exporter-1.8.1.linux-amd64
./node_exporter &

VM-2: Setting Up Prometheus, Alertmanager, and Blackbox Exporter

Prometheus Setup

1. Download Prometheus:

wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz

2. Extract Prometheus:

tar xvfz prometheus-2.52.0.linux-amd64.tar.gz

3. Start Prometheus:

cd prometheus-2.52.0.linux-amd64
./prometheus --config.file=prometheus.yml &

Alertmanager Setup

1. Download Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

2. Extract Alertmanager:

tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz

3. Start Alertmanager:

cd alertmanager-0.27.0.linux-amd64
./alertmanager --config.file=alertmanager.yml &

Blackbox Exporter Setup

1. Download Blackbox Exporter:

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz

2. Extract Blackbox Exporter:

tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz

3. Start Blackbox Exporter:

cd blackbox_exporter-0.25.0.linux-amd64
./blackbox_exporter &

Configuration Details


Prometheus Configuration (prometheus.yml)

  • Global Configuration:

    • Scrape interval: 15s
    • Evaluation interval: 15s
  • Scrape Configurations:

    • Prometheus itself:
      • Job name: prometheus
      • Target: localhost:9090
    • Node Exporter:
      • Job name: node_exporter
      • Target: 3.110.195.114:9100
    • Blackbox Exporter:
      • Job name: blackbox
      • Targets:
        • http://prometheus.io
        • https://prometheus.io
        • http://3.110.195.114:8080/
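
The settings listed above could be written out roughly as follows. This is a minimal sketch, not the exact file from the tutorial: the `alerting` and `rule_files` sections, the `http_2xx` probe module, and the `localhost:9115` Blackbox Exporter address are assumptions, based on the default ports and on Alertmanager and Blackbox Exporter running on the same VM as Prometheus. The `relabel_configs` block is the usual pattern for the Blackbox Exporter: it passes each listed URL as the `target` query parameter and points the actual scrape at the exporter itself.

```shell
# Run from the Prometheus directory; this overwrites the bundled prometheus.yml.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Assumption: Alertmanager runs on the same VM on its default port.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

# Assumption: the rules file sits next to prometheus.yml.
rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      - targets: ["3.110.195.114:9100"]

  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]   # assumption: default HTTP module
    static_configs:
      - targets:
          - http://prometheus.io
          - https://prometheus.io
          - http://3.110.195.114:8080/
    relabel_configs:
      # Pass the listed URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the Blackbox Exporter instead of the URL itself
      - target_label: __address__
        replacement: localhost:9115
EOF
```

Before restarting Prometheus, running `./promtool check config prometheus.yml` (promtool ships in the same tarball) validates the file and the rule files it references.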

Alertmanager Configuration (alertmanager.yml)

  • Routing Configuration:

    • Group alerts by: alertname
    • Group wait: 30s
    • Group interval: 5m
    • Repeat interval: 1h
    • Default receiver: email-notifications
  • Receiver Configuration:

    • Receiver name: email-notifications
    • Email recipient: email@gmail.com
    • SMTP server: smtp.gmail.com:587
    • Auth username and password (to be configured).
  • Inhibition Rules:

    • Source match: severity: critical
    • Target match: severity: warning
    • Equal fields: alertname, dev, instance
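
As a rough illustration, the routing, receiver, and inhibition settings above might look like this in `alertmanager.yml`. The `from` address and the credential values are placeholders I have assumed for the sketch, not values from the tutorial; replace them with your own before starting Alertmanager.

```shell
# Run from the Alertmanager directory; this overwrites the bundled alertmanager.yml.
cat > alertmanager.yml <<'EOF'
route:
  group_by: ["alertname"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: "email-notifications"

receivers:
  - name: "email-notifications"
    email_configs:
      - to: "email@gmail.com"
        from: "email@gmail.com"            # placeholder (assumption)
        smarthost: "smtp.gmail.com:587"
        auth_username: "email@gmail.com"   # placeholder (assumption)
        auth_password: "REPLACE_ME"        # placeholder, never commit a real password

# Suppress warning alerts while a matching critical alert is firing.
inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    equal: ["alertname", "dev", "instance"]
EOF
```

The Alertmanager tarball also includes `amtool`, and `./amtool check-config alertmanager.yml` can be used to catch syntax errors before a restart.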

Alert Rules Configuration (alert_rules.yml)

Some of the key alert rules I configured include:

  • InstanceDown: Alerts if an instance is down for more than 1 minute.
  • WebsiteDown: Alerts if a website probe fails.
  • HostOutOfMemory: Alerts if available memory drops below 25%.
  • HostOutOfDiskSpace: Alerts if available disk space drops below 50%.
  • HostHighCpuLoad: Alerts if CPU load exceeds 80%.
  • ServiceUnavailable: Alerts if a service is unavailable.
  • HighMemoryUsage: Alerts if memory usage exceeds 90%.
  • FileSystemFull: Alerts if file system free space drops below 10%.
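
To make a few of these rules concrete, here is a hedged sketch of how four of them could be expressed in `alert_rules.yml`. The PromQL expressions use standard Prometheus, Blackbox Exporter, and Node Exporter metrics, but the exact expressions, the `for` durations (other than the 1 minute for InstanceDown), the group name, and the severity labels are my assumptions rather than the tutorial's exact rules.

```shell
# Run from the Prometheus directory, next to prometheus.yml.
cat > alert_rules.yml <<'EOF'
groups:
  - name: alert_rules   # assumption: any group name works
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"

      - alert: WebsiteDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"

      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Less than 25% memory available on {{ $labels.instance }}"

      - alert: HostHighCpuLoad
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU load above 80% on {{ $labels.instance }}"
EOF
```

`./promtool check rules alert_rules.yml` reports how many rules were found and flags any expression that fails to parse.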


Firewall and Security Settings

I had to configure the firewall to allow traffic on the necessary ports:

  • Prometheus: 9090
  • Alertmanager: 9093
  • Blackbox Exporter: 9115
  • Node Exporter: 9100
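
How these ports get opened depends on the environment. The sketch below assumes Ubuntu hosts using `ufw`, which is my assumption and not something the tutorial specifies; on cloud VMs the equivalent is usually an inbound rule in the provider's security group. The script prints the commands by default so they can be reviewed first.

```shell
# Dry run by default: prints each command instead of executing it.
# Run with RUN= (empty) to actually apply the rules; needs ufw and sudo.
RUN="${RUN-echo}"

# Allow inbound TCP traffic on each port passed as an argument.
open_ports() {
  for port in "$@"; do
    $RUN sudo ufw allow "${port}/tcp"
  done
}

# On VM-1 (Node Exporter):
open_ports 9100

# On VM-2 (Prometheus, Alertmanager, Blackbox Exporter):
open_ports 9090 9093 9115
```

In practice only the relevant `open_ports` line would be run on each VM, and restricting the source address to the Prometheus VM is safer than opening Node Exporter to everyone.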

Key Features and Functionalities

Monitoring: Using Node Exporter and Prometheus, I was able to monitor crucial system metrics such as CPU usage, memory availability, and disk space. These metrics are scraped every 15 seconds, providing near-real-time insight into the system's performance.

Alerting: With Alertmanager, I was able to set up email notifications for critical events, so that I was notified promptly of issues such as an instance going down or high CPU load. The flexibility of Alertmanager's configuration allowed me to tailor alerts to meet specific needs.

Probing: The Blackbox Exporter allowed me to monitor the availability and response times of various endpoints, including web services, ensuring that they remained accessible and responsive.
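
Under the hood, each probe is an HTTP request to the exporter's `/probe` endpoint with the target passed as a query parameter. The small helper below builds that URL so a probe can be tried by hand; the function name `probe_url`, the `http_2xx` module, and the `localhost:9115` address are assumptions for illustration.

```shell
# Build the Blackbox Exporter probe URL for a target.
# Assumptions: exporter reachable at localhost:9115, module http_2xx.
probe_url() {
  target="$1"
  module="${2:-http_2xx}"
  exporter="${3:-localhost:9115}"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

probe_url "https://prometheus.io"

# To run the probe manually on VM-2 and see whether it succeeded:
#   curl -s "$(probe_url https://prometheus.io)" | grep '^probe_success'
```

A `probe_success 1` line in the response means the endpoint answered as expected. Targets that contain their own query string would need URL-encoding first, which this sketch does not do.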

Challenges and Solutions

One of the challenges I faced during this project was configuring the firewall and ensuring proper communication between the services across the two VMs. By carefully adjusting firewall rules and thoroughly reviewing the configuration files, I was able to overcome these hurdles and achieve a seamless setup.

Future Enhancements

Moving forward, I plan to enhance this setup by integrating Grafana for more sophisticated visualization of the metrics collected by Prometheus. Additionally, I am considering automating the entire deployment process using Ansible, making it easier to replicate the setup across different environments.

Conclusion

Following along with DevOps Shack's tutorials gave me a solid foundation in setting up a comprehensive monitoring and alerting system using open-source tools. The project is a testament to the power of hands-on learning in mastering DevOps concepts. I encourage anyone interested in DevOps to explore these tools, experiment with configurations, and see how effective monitoring helps maintain a healthy infrastructure.

You can explore more about the project on DevOps Shack’s YouTube channel. If you have any questions or feedback, feel free to reach out!


