Background
From time to time, my home is disconnected from the Internet, typically because of an outage at the Internet Service Provider. I do receive notifications about my cameras going offline, but I have no idea how long an outage will last; it can range from 10 minutes to 12 hours, and Comcast won’t tell you how long (you can’t even see the outage information unless you log in with a Comcast account). In addition, sometimes the Internet isn’t fully down but the quality drops, so I’d like to set up something to monitor my Internet connection.
Journey of solution-hunting
The principle is simple: I set up something to ping (or cURL) https://google.com. Given the historical availability of that page, this probe is going to tell me whether I can reach the Internet. Now the real question is which tool can run the queries, save the data, and visualize the latency for me.
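To make the principle concrete, here is a minimal sketch of a single probe in Python, using only the standard library. It is not what any of the tools below do internally; the function name `probe`, the one-second default timeout, and the injectable `opener` parameter are all my own assumptions for illustration.

```python
import time
import urllib.request
from typing import Callable, Optional, Tuple


def probe(url: str, timeout_s: float = 1.0,
          opener: Callable = urllib.request.urlopen) -> Tuple[bool, Optional[float]]:
    """Fetch `url` once and return (success, latency in milliseconds)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as response:
            response.read()
            ok = 200 <= response.status < 400
    except OSError:
        # URLError, timeouts and connection failures are all OSError subclasses.
        return False, None
    latency_ms = (time.monotonic() - start) * 1000.0
    return ok, latency_ms
```

Calling `probe("https://google.com")` on a schedule gives the raw data; storing and charting it is the part that still needs a real tool.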
I did some research and asked around. Most people don’t care; the ones who care typically use SmokePing:
Original Authors: Tobias Oetiker and Niko Tyni
SmokePing is a latency logging and graphing and alerting system. It consists of a daemon process which organizes the latency measurements and a CGI which presents the graphs.
SmokePing is ...
- extensible through plug-in modules
- easy to customize through a webtemplate and an extensive configuration file.
- written in perl and should readily port to any unix system
- an RRDtool frontend
- able to deal with DYNAMIC IP addresses as used with Cable and ADSL internet.
cheers tobi
SmokePing has its own web UI to show the data. The UI looks good enough for most users, but I’m not an average user, and it is insufficient for an SRE-SWE. I’m more of a backend guy, so I’m not very good at implementing the interactive features myself. Another option I found was Nagios:
Nagios Core, formerly known as Nagios, is a free and open-source computer-software application that monitors systems, networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches, applications and services. It alerts users when things go wrong and alerts them a second time when the problem has been resolved.
I tried it briefly before I had another idea: simply make a daemon (or a scheduled task) that posts to some data-hosting service, such as Google Cloud Monitoring (also known as “StackDriver”). (Disclaimer: I’m a Googler.) I did some more research, and summarized my options in a GitHub issue.
When I was collecting libraries for this idea, I found Cloudprober.
Cloudprober
cloudprober / cloudprober
An active monitoring software to detect failures before your customers do.
NOTE: Cloudprober's active development has moved to github.com/cloudprober/cloudprober from github.com/google/cloudprober.
Cloudprober is a monitoring software that makes it super-easy to monitor availability and performance of various components of your system. Cloudprober employs the "active" monitoring model. It runs probes against (or on) your components to verify that they are working as expected. For example, it can run a probe to verify that your frontends can reach your backends. Similarly it can run a probe to verify that your in-Cloud VMs can actually reach your on-premise systems. This kind of monitoring makes it possible to monitor your systems' interfaces regardless of the implementation and helps you quickly pin down what's broken in your system.
Features
- Out of the box, config based, integration with many popular monitoring systems:
- Multiple options for checks:
Cloudprober was created by Googlers (probably as a side project), and it supports uploading data to Google Cloud Monitoring (exactly what I needed). The tool was designed for black-box service monitoring of owned services, but it can actually be pointed to any service. For example, I monitor the Google homepage with this configuration:
# proto-file: https://github.com/cloudprober/cloudprober/blob/master/config/proto/config.proto
# proto-message: cloudprober.ProberConfig
probe: {
  name: "google_homepage"
  type: HTTP
  targets: {
    host_names: "www.google.com"
  }
  interval: "30s"
  timeout: "1s"
  latency_distribution: {
    exponential_buckets: {
      scale_factor: 100
      base: 1.1
      num_buckets: 25
      # last bucket starts at: 100 * 1.1^24 = 985 ms
    }
  }
  latency_unit: "ms"
  http_probe: {
    protocol: HTTPS
  }
}
surfacer: {
  type: STACKDRIVER
  stackdriver_surfacer: {
    project: "franklinyu-home"
  }
}
This creates nice charts in the Google Cloud console.
Starting Cloudprober yourself
First, of course you need to download Cloudprober. You can follow the official guide to download the pre-built binary, or (if you are using Arch Linux) use my AUR package.
Then you need a Google Cloud project and a service account key. Follow the Google Cloud documentation to set the environment variable.
Now save the configuration file somewhere, for example in ~/Desktop/cloudprober.textproto, and run Cloudprober like
cloudprober --config_file ~/Desktop/cloudprober.textproto
Wait for a while, and the metric should appear in Google Cloud Console as
custom/cloudprober/http/google_homepage/latency
You can then see the data in Metrics Explorer.
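If nothing shows up, a quick preflight check before launching can rule out a missing key or config file. The sketch below is hypothetical helper code, not part of Cloudprober. It assumes the environment variable in question is `GOOGLE_APPLICATION_CREDENTIALS` (the one Google Cloud client libraries conventionally read for a service account key), and the function name `preflight` is my own.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight(config_path: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to start."""
    problems = []
    # Assumption: the service account key is located through this variable.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not Path(key_path).expanduser().is_file():
        problems.append(f"service account key not found: {key_path}")
    if not Path(config_path).expanduser().is_file():
        problems.append(f"config file not found: {config_path}")
    return problems
```

For example, `preflight("~/Desktop/cloudprober.textproto")` returns an empty list when both files are in place.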
Configuration
The configuration file is written in the text format of Protocol Buffers, also known (in Google) as “Text-Proto”. Most parts are straightforward; the latency_distribution stanza is explained in Google Cloud documentation. If we denote the scale factor as “k” and the base as “a”, then basically the buckets are the right-open intervals
$[k a^{i}, k a^{i+1})$
except the first and last buckets (which have to cover the minimum and maximum). My strategy for choosing the base and the scale factor boils down to the “target interval”, which is the interval of latency that I care about. If we denote the number of buckets as “n”, then my strategy is that the entire “target interval” should be covered by the buckets from the second to the second-to-last. In other words, the “target interval” is supposed to be a subset of
$[k, k a^{n-1})$
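The arithmetic behind this strategy can be sketched in a few lines of Python. This is only an illustration under one assumption, taken from the comment in my config above: the last bucket starts at scale_factor * base^(num_buckets - 1), and the first bucket ends at scale_factor. The function names are hypothetical.

```python
from typing import List, Tuple


def finite_bucket_starts(scale_factor: float, base: float, num_buckets: int) -> List[float]:
    """Lower bounds k * a**i for i = 0 .. num_buckets - 1."""
    return [scale_factor * base ** i for i in range(num_buckets)]


def choose_parameters(low: float, high: float, num_buckets: int) -> Tuple[float, float]:
    """Pick (scale_factor, base) so that k == low and k * a**(n - 1) == high."""
    if not (0 < low < high) or num_buckets < 2:
        raise ValueError("need 0 < low < high and at least 2 buckets")
    scale_factor = low
    base = (high / low) ** (1.0 / (num_buckets - 1))
    return scale_factor, base


starts = finite_bucket_starts(100, 1.1, 25)
print(round(starts[-1]))  # start of the last bucket in ms → 985
```

Going the other way, `choose_parameters(100, 1000, 25)` gives a base of about 1.1007, which is why 1.1 with a scale factor of 100 fits a target interval of roughly 100 ms to 1 s, just under the 1 s timeout.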