1. Project Overview
This project involved setting up a monitoring and logging system using Prometheus and Grafana to monitor the performance and health of cloud infrastructure and applications. The system allows for real-time data collection, visualization, and alerting, enabling proactive management of resources.
2. Environment Setup
- Server Provisioning:
  - Cloud Provider: AWS EC2 instance
  - Instance Type: t3.medium
  - Operating System: Ubuntu 22.04
  - Network Configuration: The instance was placed in a public subnet, with Security Groups configured to allow HTTP (port 80), HTTPS (port 443), and Grafana (port 3000).
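The Security Group rules described above could be created with the AWS CLI. The following is a minimal sketch, not the commands actually used in this project: the group ID and CIDR range are placeholders supplied by the caller, and the function only prints the commands so they can be reviewed before anything is changed.

```shell
#!/usr/bin/env bash
# Print (not run) one AWS CLI command per port that this setup opens.
# The group ID and CIDR passed in are hypothetical placeholders.
sg_ingress_commands() {
  local group_id="$1" cidr="$2" port
  for port in 80 443 3000; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$group_id" "$port" "$cidr"
  done
}

# Example: review the output, then pipe it to `bash` to apply it.
# sg_ingress_commands sg-0123456789abcdef0 203.0.113.0/24
```

Passing a narrow CIDR range instead of 0.0.0.0/0 keeps the Grafana port from being reachable by the whole internet.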
3. Prometheus Installation and Configuration
- Installation:
  - Prometheus v2.47.0 (the latest version when this was written) was downloaded and extracted:
        wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
        tar -xvzf prometheus-2.47.0.linux-amd64.tar.gz
        cd prometheus-2.47.0.linux-amd64
- Running Prometheus:
  - Prometheus was started from the extracted directory using the following command:
        ./prometheus --config.file=prometheus.yml
- Configuration:
  - The prometheus.yml file was configured to scrape Prometheus's own metrics endpoint on the local machine:
        scrape_configs:
          - job_name: 'prometheus'
            static_configs:
              - targets: ['localhost:9090']
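Before relying on the scrape configuration, it can help to validate the file and confirm that the server is answering. The sketch below assumes it is run from the extracted Prometheus directory (where promtool sits next to the prometheus binary) and that curl is installed; the function names, the default URL, and the attempt count are illustrative choices, not part of the original setup.

```shell
#!/usr/bin/env bash
# Validate a Prometheus config file with promtool from the release tarball.
check_config() {
  ./promtool check config "${1:-prometheus.yml}"
}

# Poll the readiness endpoint until Prometheus answers or attempts run out.
# Returns 0 once /-/ready responds successfully, 1 otherwise.
wait_until_ready() {
  local base_url="${1:-http://localhost:9090}" attempts="${2:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$base_url/-/ready" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example:
# check_config prometheus.yml && wait_until_ready http://localhost:9090 30
```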
4. Grafana Installation and Configuration
- Installation:
  - Grafana was downloaded and installed independently of Prometheus:
        wget https://dl.grafana.com/oss/release/grafana_10.0.0_amd64.deb
        sudo dpkg -i grafana_10.0.0_amd64.deb
- Starting Grafana:
  - The Grafana service was started and enabled to run on boot:
        sudo systemctl start grafana-server
        sudo systemctl enable grafana-server
- Accessing Grafana:
  - Grafana was accessed via a web browser at http://your-server-ip:3000.
  - The default login credentials (admin/admin) were used for the first login, and the password was changed upon initial access.
- Adding Prometheus as a Data Source:
  - Prometheus was added as a data source within Grafana:
    - Navigate to Configuration > Data Sources.
    - Select Prometheus and set the URL to http://localhost:9090.
    - Save and test the connection to ensure it was successful.
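The same data source can also be declared in a file instead of through the UI, using Grafana's provisioning mechanism. This is a sketch of that alternative, not what the project did: the function name is hypothetical, and the target path in the usage comment assumes the default layout of the .deb package.

```shell
#!/usr/bin/env bash
# Print a Grafana data source provisioning file for a Prometheus server.
# The URL argument defaults to the address used in the manual steps.
datasource_yaml() {
  local url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Example (path assumes the .deb package layout):
# datasource_yaml | sudo tee /etc/grafana/provisioning/datasources/prometheus.yml
# sudo systemctl restart grafana-server
```

Keeping the data source in a file makes the setup repeatable on a fresh instance, at the cost of needing a restart for changes to take effect.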
5. Dashboard Creation
- Creating a New Dashboard:
  - A new dashboard was created in Grafana to visualize metrics:
    - Go to Dashboards > New Dashboard.
    - Add a panel, selecting a metric from Prometheus (e.g., CPU usage).
    - Customize the visualization and save the dashboard.
- Dashboard Example:
  - The dashboard was designed to display key metrics like CPU usage, memory usage, disk I/O, and network traffic, allowing for a comprehensive view of the system’s performance.
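Each of the panels listed above needs a PromQL expression behind it. The sketch below shows one plausible set, assuming host metrics are exposed by node_exporter and scraped by Prometheus; node_exporter is not part of the setup described here, so the node_* metric names are an assumption, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Map a panel name to a PromQL expression. The node_* metrics come from
# node_exporter, which is an assumption and not part of the original setup.
panel_query() {
  case "$1" in
    cpu)     echo '100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])))' ;;
    memory)  echo '100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)' ;;
    disk)    echo 'rate(node_disk_read_bytes_total[5m]) + rate(node_disk_written_bytes_total[5m])' ;;
    network) echo 'rate(node_network_receive_bytes_total[5m])' ;;
    *)       echo "unknown panel: $1" >&2; return 1 ;;
  esac
}

# Run the expression against the Prometheus HTTP API and print the raw JSON.
run_panel_query() {
  local expr
  expr="$(panel_query "$1")" || return 1
  curl -fsS --get "${2:-http://localhost:9090}/api/v1/query" --data-urlencode "query=${expr}"
}

# Example:
# run_panel_query cpu
```

Trying an expression against the API first is a quick way to confirm that it returns data before pasting it into a Grafana panel.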
6. Alerting Setup in Prometheus
- Configuration:
  - Alerting rules were added to the prometheus.yml configuration:
        alerting:
          alertmanagers:
            - static_configs:
                - targets: ['localhost:9093']
        rule_files:
          - "alert.rules.yml"
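  - An example alert rule was created in the alert.rules.yml file. Because process_cpu_seconds_total is a cumulative counter for the scraped process (here Prometheus itself), the rule compares its per-second rate over five minutes against the threshold rather than the raw counter, which only ever grows:
        groups:
          - name: example
            rules:
              - alert: HighCPUUsage
                expr: rate(process_cpu_seconds_total[5m]) > 0.85
                for: 5m
                labels:
                  severity: critical
                annotations:
                  summary: "High CPU usage detected"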
- Verification:
  - Alerts were verified by stressing the system and checking that the alert was triggered and displayed in the Prometheus web UI.
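Besides the web UI, the alert state can be read from the Prometheus HTTP API at /api/v1/alerts. The helper below is a rough sketch with a hypothetical name: it relies on Prometheus returning compact JSON in which each alert's alertname label appears before its state field, so a real JSON parser would be more robust where one is available.

```shell
#!/usr/bin/env bash
# Read the JSON from /api/v1/alerts on stdin and print "<alertname> <state>"
# per active alert. Assumes compact JSON with alertname preceding state.
alert_states() {
  grep -oE '"(alertname|state)":"[^"]*"' | cut -d'"' -f4 | paste -d' ' - -
}

# Example:
# curl -fsS http://localhost:9090/api/v1/alerts | alert_states
```

An alert shows as pending while its condition holds but the `for` duration has not yet elapsed, and as firing afterwards.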
7. Security and Maintenance
- Security Measures:
  - Basic security was implemented by setting up authentication for Grafana.
  - The firewall was configured to restrict access to the monitoring services.
  - HTTPS was considered for secure access, although it was not implemented in this basic setup.
- Maintenance:
  - Regular maintenance tasks were established, including:
    - Updating Prometheus and Grafana as new versions are released.
    - Backing up Grafana dashboards and Prometheus configuration files.
    - Reviewing and updating alert configurations based on the evolving infrastructure needs.
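The backup task above can be as simple as archiving a handful of files on a schedule. The following sketch uses a hypothetical helper name; the Grafana paths in the usage comment are the defaults of the .deb package, while the Prometheus path is a placeholder because the install location is not stated here.

```shell
#!/usr/bin/env bash
# Archive the given files into a timestamped tarball under dest_dir and
# print the archive path. Which files to include is up to the caller.
backup_monitoring() {
  local dest_dir="$1"; shift
  local archive
  archive="${dest_dir}/monitoring-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest_dir" || return 1
  tar -czf "$archive" "$@" || return 1
  echo "$archive"
}

# Example (the Prometheus path is a placeholder):
# backup_monitoring /var/backups/monitoring \
#   /etc/grafana/grafana.ini /var/lib/grafana/grafana.db /opt/prometheus/prometheus.yml
```

With the default SQLite backend, dashboards live in grafana.db, so copying that file captures them along with users and data sources.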
8. Testing and Outcome
- System Testing:
  - The monitoring and alerting setup was tested by inducing load on the server and verifying that metrics were collected and visualized correctly in Grafana.
  - Alerts were triggered as expected when predefined conditions were met.
- Final Outcome:
  - The project was successfully completed, with a fully operational monitoring and logging system in place. The system provides real-time insights into the health and performance of the cloud infrastructure, enabling proactive management and rapid issue resolution.
9. Conclusion
This project provided hands-on experience in setting up and configuring a monitoring and logging system using Prometheus and Grafana. The skills gained are crucial for maintaining high availability, performance, and security in modern cloud environments. The setup is scalable and can be adapted to more complex infrastructures as needed.
Appendix
- Useful Commands (the Prometheus commands assume it runs as a systemd service named prometheus rather than being started manually):
  - Restart Prometheus:
        sudo systemctl restart prometheus
  - Restart Grafana:
        sudo systemctl restart grafana-server
  - View Prometheus logs:
        sudo journalctl -u prometheus.service -f
  - View Grafana logs:
        sudo journalctl -u grafana-server -f
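The systemctl and journalctl commands for Prometheus only work once a prometheus unit exists, and the steps in this write-up start the binary by hand. A unit file along the following lines would close that gap. It is a minimal sketch: the install directory, the service user, and the function name are all hypothetical.

```shell
#!/usr/bin/env bash
# Print a minimal systemd unit for Prometheus. install_dir and user are
# hypothetical; the write-up does not say where Prometheus was installed.
prometheus_unit() {
  local install_dir="${1:-/opt/prometheus}" user="${2:-prometheus}"
  cat <<EOF
[Unit]
Description=Prometheus
After=network-online.target
Wants=network-online.target

[Service]
User=${user}
WorkingDirectory=${install_dir}
ExecStart=${install_dir}/prometheus --config.file=${install_dir}/prometheus.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example:
# prometheus_unit | sudo tee /etc/systemd/system/prometheus.service
# sudo systemctl daemon-reload && sudo systemctl enable --now prometheus
```

Setting WorkingDirectory keeps the default relative data directory inside the install directory, which the service user must be able to write to.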
This documentation should serve as a comprehensive guide for anyone looking to replicate or expand upon this project.