Franklin Yu

Monitor home Internet connectivity with Cloudprober

Background

From time to time, my home is disconnected from the Internet, typically due to an outage at the Internet Service Provider. I do receive notifications about my cameras going offline, but I have no idea how long the outage lasts; it can range from 10 minutes to 12 hours, and Comcast won’t tell me (the outage information isn’t even accessible unless I log in with my Comcast account). In addition, sometimes the Internet isn’t fully down but the quality drops, so I’d like to set up something to monitor my Internet connection.

Journey of solution-hunting

The principle is simple: I set up something to ping (or cURL) https://google.com. Given the historical availability of that page, this probe will tell me whether I can reach the Internet. The real question is what tool can run the queries, save the data, and visualize the latency for me.
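For illustration, a crude version of this idea could be a shell loop like the one below. This is just a sketch of the principle, not the tool I ended up using: it logs a timestamp and the total request time every 30 seconds.

#!/bin/sh
# Crude connectivity probe: log a timestamp and the total time cURL took
# to fetch the page, once every 30 seconds.
while true; do
  printf '%s %s\n' "$(date -Is)" \
    "$(curl -s -o /dev/null -w '%{time_total}' https://google.com)"
  sleep 30
done

This works, but it leaves all the storage, graphing, and alerting to me, which is exactly the part I wanted a tool to handle.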

I did some research, and asked around. Most people don’t care; the ones who do typically use SmokePing:

oetiker / SmokePing

The Active Monitoring System

Original Authors: Tobias Oetiker and Niko Tyni

SmokePing is a latency logging and graphing and alerting system. It consists of a daemon process which organizes the latency measurements and a CGI which presents the graphs.

SmokePing is ...

  • extensible through plug-in modules

  • easy to customize through a webtemplate and an extensive configuration file.

  • written in perl and should readily port to any unix system

  • an RRDtool frontend

  • able to deal with DYNAMIC IP addresses as used with Cable and ADSL internet.

SmokePing has its own web UI to show the data. The UI looks good enough for most users, but I’m not an average user: it is insufficient for an SRE-SWE. I’m more of a backend guy, so I’m not very good at implementing the interactive features myself. Another option I found was Nagios:

Nagios Core, formerly known as Nagios, is a free and open-source computer-software application that monitors systems, networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches, applications and services. It alerts users when things go wrong and alerts them a second time when the problem has been resolved.

I tried it briefly before I had another idea: simply make a daemon (or a scheduled task) that posts to some data-hosting service, such as Google Cloud Monitoring (also known as “Stackdriver”). (Disclaimer: I’m a Googler.) I did some more research and summarized my options in a GitHub issue.

When I was collecting libraries for this idea, I found Cloudprober.

Cloudprober

cloudprober / cloudprober

An active monitoring software to detect failures before your customers do.

NOTE: Cloudprober's active development has moved to github.com/cloudprober/cloudprober from github.com/google/cloudprober.

Cloudprober is a monitoring software that makes it super-easy to monitor availability and performance of various components of your system. Cloudprober employs the "active" monitoring model. It runs probes against (or on) your components to verify that they are working as expected. For example, it can run a probe to verify that your frontends can reach your backends. Similarly it can run a probe to verify that your in-Cloud VMs can actually reach your on-premise systems. This kind of monitoring makes it possible to monitor your systems' interfaces regardless of the implementation and helps you quickly pin down what's broken in your system.

Features

Cloudprober was created by Googlers (probably as a side project), and it supports uploading data to Google Cloud Monitoring, which is exactly what I needed. The tool was designed for black-box monitoring of services you own, but it can actually be pointed at any service. For example, I monitor the Google homepage with this configuration:

# proto-file: https://github.com/cloudprober/cloudprober/blob/master/config/proto/config.proto
# proto-message: cloudprober.ProberConfig

probe: {
  name: "google_homepage"
  type: HTTP
  targets: {
    host_names: "www.google.com"
  }
  interval: "30s"
  timeout: "1s"
  latency_distribution: {
    exponential_buckets: {
      scale_factor: 100
      base: 1.1
      num_buckets: 25
      # last bucket starts at: 100 * 1.1^24 = 985 ms
    }
  }
  latency_unit: "ms"
  http_probe: {
    protocol: HTTPS
  }
}

surfacer: {
  type: STACKDRIVER
  stackdriver_surfacer: {
    project: "franklinyu-home"
  }
}

This creates nice charts in the Google Cloud console, like the heatmap of latency below.

(screenshot: heatmap of latency)

Starting Cloudprober yourself

First, of course you need to download Cloudprober. You can follow the official guide to download the pre-built binary, or (if you are using Arch Linux) use my AUR package.

Then you need a Google Cloud project and a service account key. Follow the Google Cloud documentation to set the corresponding environment variable.
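If you haven’t done this before, the setup looks roughly like the snippet below. The account name cloudprober-home and the key path are placeholders I picked for this example; the role that allows writing custom metrics is roles/monitoring.metricWriter, and Cloudprober should pick the key up through Application Default Credentials.

# Create a service account, let it write metrics, and download a key.
# All names and paths here are illustrative placeholders.
gcloud iam service-accounts create cloudprober-home --project franklinyu-home
gcloud projects add-iam-policy-binding franklinyu-home \
  --member "serviceAccount:cloudprober-home@franklinyu-home.iam.gserviceaccount.com" \
  --role roles/monitoring.metricWriter
gcloud iam service-accounts keys create ~/cloudprober-key.json \
  --iam-account cloudprober-home@franklinyu-home.iam.gserviceaccount.com

# Point the standard Application Default Credentials variable at the key.
export GOOGLE_APPLICATION_CREDENTIALS=~/cloudprober-key.json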

Now save the configuration file somewhere, for example in ~/Desktop/cloudprober.textproto, and run Cloudprober like

cloudprober --config_file ~/Desktop/cloudprober.textproto

Wait for a while, and the metric should appear in Google Cloud Console as

custom/cloudprober/http/google_homepage/latency

And you can see the data in Metrics Explorer.
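If you prefer the command line, you can also sanity-check that points are arriving by listing the time series through the Monitoring API. The snippet below assumes the surfacer’s default metric prefix custom.googleapis.com/cloudprober/ and the project name from the configuration above.

# List the last 10 minutes of the latency time series (GNU date syntax).
PROJECT=franklinyu-home
METRIC='custom.googleapis.com/cloudprober/http/google_homepage/latency'
curl -sS --get \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode "filter=metric.type = \"${METRIC}\"" \
  --data-urlencode "interval.startTime=$(date -u -d '10 minutes ago' +%FT%TZ)" \
  --data-urlencode "interval.endTime=$(date -u +%FT%TZ)" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT}/timeSeries"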

Configuration

The configuration file is specified in the text format of Protocol Buffers, also known (inside Google) as “textproto”. Most parts are straightforward; the latency_distribution stanza is explained in the Google Cloud documentation. If we denote the scale factor as “k” and the base as “a”, then the buckets are basically the right-open intervals

[k a^i, k a^{i+1})

except the first and last buckets (which have to cover the minimum and maximum). My strategy for choosing the base and the scale factor boils down to the “target interval”, the range of latencies that I care about. If we denote the number of buckets as “n”, then my strategy is that the entire target interval should be covered by the buckets from the second to the second-to-last. In other words, the target interval is supposed to be a subset of

[k, k a^{n-2})
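As a quick check, plugging in the values from my configuration above (scale factor k = 100, base a = 1.1, n = 25 buckets) gives

[k, k a^{n-2}) = [100, 100 × 1.1^23) ≈ [100 ms, 895 ms)

so that configuration resolves latencies roughly between 100 ms and 900 ms in detail, which is the range that matters for judging the quality of a home connection.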
