Histogram of request time in Grafana with Telegraf

#telegraf #grafana #histogram #heatmap

This is a writing about a cool tool useful for analyzing backend call time. Code that does backend calls and monitoring setup described in previous post.

Grafana panel can not only plot line graphs, but also:

show last reading of metric
show table of metric values
show bar plots
show heatmaps (histogram over time)

Heatmap is helpful for quickly getting understanding what is distribution of backend response time: it can be the case that most requests complete in under 50 msec, but some requests are slow and complete in >500 msec. Average request time doesn't show this information. In previous examples, we're plotting just the average.

We can easily add a heatmat for request execution time:

Need to add new panel, pick measurement details, and select "Heatmap" in "Visualization" collapsible in the right column.
Every 10 seconds, a new set of bricks appears on the panel. Brick color represents how much measurements fall into that bucket (e.g. 5 fall in the 10 msec - 20 msec range, hence that brick is pink). Set a fixed bucket size or fix the number of buckets, or let default values do their magic.

In case Telegraf sends all metrics data to InfluxDB, that's a real heatmap. Telegraf is often configured to send only aggregated values to database (min, avg, max) calculated over short period of time (10sec) in order to reduce metrics reporting traffic. Heatmap based on such aggregated value is not a real heatmap.

It is possible to configure histogram aggregate in Telegraf config (full Telegraf config with histogram aggregator):



[[aggregators.histogram]]
  period = "30s"
  drop_original = false
  reset = true
  cumulative = false

  [[aggregators.histogram.config]]
    buckets = [1.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 30.0, 40.0]
    measurement_name = "aiohttp-request-exec-time"
    fields = ["value"]

I set reset=true and cumulative=false which will cause buckets values to be calculated anew for each 30 second period. Need to set value ranges (buckets) manually, as well as specify correct measurement_name. If fields is not specified, histogram buckets are computed for all fields of measurement. Here's how bucket values appear in InfluxDB:

The amount of request execution times that falls in a bucket is saved under "value_bucket" field name, "gt" ("greater than") and "le" ("less than or equals to") are bucket edge values that appear as tags.

Let's plot these values using "Bar gauge" panel visualization type: