gaurang101197

Posted on Aug 10 • Edited on Sep 24

Plotting Histogram Distribution Over Time in Grafana

#grafana #prometheus #observability #histogram

If you want to plot a histogram's distribution over time, as shown in the image above, this blog is for you. It does not cover the internals of Prometheus histograms or Grafana.

Why Histogram Distribution Over Time

It shows how the distribution changes over time.
It makes it easy to find the time periods in which the distribution skewed.
A plain histogram summarizes the distribution and is useful for checking system performance at a glance; the distribution over time additionally shows when performance degraded.

Prerequisites

Internals of Prometheus histograms: https://prometheus.io/docs/practices/histograms/
Hands-on experience with how Prometheus histograms work and some prior experience with Grafana will help.

Use-case

Plot the latency distribution of any operation over time, e.g. API latency or DB latency.

Setup

Latency is measured with a Prometheus histogram.
The metric name is my_latency_metric.
The histogram buckets are [0, 80, 160, 320, 640, 1280, 2560, 5120]; Prometheus client libraries also add a +Inf bucket automatically.
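To make the bucket semantics concrete, here is a small Python sketch (not Prometheus code) of how these bucket bounds turn raw observations into the cumulative counts exposed as my_latency_metric_bucket series. The observation values are hypothetical, and the le keys are plain strings chosen for illustration; the exact label formatting depends on the client library.

```python
# Bucket upper bounds ("le") from the setup above.
BUCKET_BOUNDS = [0, 80, 160, 320, 640, 1280, 2560, 5120]


def cumulative_buckets(observations, bounds=BUCKET_BOUNDS):
    """Return {le: count}, where each count includes every observation <= le.

    Mirrors how a Prometheus histogram exposes one cumulative counter per
    bucket, plus the implicit "+Inf" bucket that counts every observation.
    """
    counts = {}
    for bound in bounds:
        counts[str(bound)] = sum(1 for value in observations if value <= bound)
    counts["+Inf"] = len(observations)
    return counts


if __name__ == "__main__":
    # Five hypothetical latency observations.
    print(cumulative_buckets([50, 90, 100, 700, 6000]))
    # {'0': 0, '80': 1, '160': 3, '320': 3, '640': 3, '1280': 4, '2560': 4, '5120': 4, '+Inf': 5}
```

Note that 6000 exceeds the largest explicit bound, so it is only counted in the +Inf bucket.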

Step 1: Panel visualization

In the panel editor, select Heatmap as the visualization, as shown in the image below.

Step 2: Query



round(sum by (le) (increase(my_latency_metric_bucket{label_name=~"label_value"}[$__interval])))

label_name=~"label_value": [Optional] filters the series by label (=~ is a regular-expression match).
increase: calculates how much each bucket counter grew over the range window. The window is $__interval, so Grafana automatically picks an appropriate interval for the dashboard's time range and panel width.

From the Prometheus documentation:

increase(v range-vector) calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.

increase acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the increase between the respective component in the first and last native histogram in v.
sum by (le): sums the values for each le, where le is the label that identifies a histogram bucket's upper bound. Suppose you measure the latency of an API deployed on Kubernetes with multiple pods, and each series carries a pod label. Every pod emits its own latency data, but we want a picture of the whole deployment, so sum by (le) aggregates the per-pod increases into a single series per bucket.
round: increase can return non-integer values (see the quote above), and a fractional count looks odd for a counter. round converts every value to the nearest integer.
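The three steps of the query can be mimicked in Python to see where fractional values come from. This is a simplified sketch under stated assumptions: `simple_increase` only adjusts for counter resets and scales the raw growth linearly to the full window, whereas real Prometheus limits how far it extrapolates toward the window edges. The pod names and sample values are hypothetical.

```python
import math
from collections import defaultdict


def simple_increase(samples, window):
    """Simplified increase(): samples is a list of (timestamp_s, counter_value).

    Adds up the growth between consecutive samples, treating any drop as a
    counter reset, then scales the result linearly to the full window.
    Real Prometheus extrapolation is more conservative than this.
    """
    if len(samples) < 2:
        return None  # increase() produces no value with fewer than two samples
    growth = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset (e.g. a pod restart) the counter restarts from zero.
        growth += cur - prev if cur >= prev else cur
    covered = samples[-1][0] - samples[0][0]
    return growth * window / covered


def sum_by_le(series, window):
    """series maps (pod, le) -> samples; returns {le: summed increase}."""
    totals = defaultdict(float)
    for (_pod, le), samples in series.items():
        inc = simple_increase(samples, window)
        if inc is not None:  # series without a result drop out of the sum
            totals[le] += inc
    return dict(totals)


def promql_round(value):
    """PromQL round() sends ties upward, unlike Python's round-half-to-even."""
    return math.floor(value + 0.5)


if __name__ == "__main__":
    # Hypothetical samples for two pods, scraped every 15s, in a 60s window.
    series = {
        ("pod-a", "80"): [(0, 10), (15, 13), (30, 16), (45, 19)],
        ("pod-b", "80"): [(5, 0), (20, 1), (35, 1), (50, 2)],
        ("pod-a", "160"): [(0, 20), (15, 24), (30, 2), (45, 5)],  # reset at t=30
        ("pod-b", "160"): [(5, 0)],  # only one sample: no increase
    }
    summed = sum_by_le(series, window=60)
    print({le: promql_round(v) for le, v in summed.items()})  # {'80': 15, '160': 12}
```

Here pod-b's two observations in the 80 bucket become about 2.67 after extrapolation, so the summed value for le="80" is fractional until round is applied.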

Step 3: Query Options

In the query options, set Format to Heatmap and Legend to {{le}}, as shown in the image below.
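Prometheus _bucket series are cumulative, but a heatmap cell should hold only the observations that fell into that bucket. Grafana's Heatmap format handles this by converting the cumulative buckets into regular ones, sorted by bucket bound. The Python sketch below shows that conversion for a single timestamp; it illustrates the idea and is not Grafana's actual code.

```python
def deaccumulate(cumulative):
    """Turn cumulative bucket counts {le: count} into per-bucket counts.

    Each result counts only observations in (previous bound, le].
    Buckets are sorted numerically; float("+Inf") parses to infinity.
    """
    ordered = sorted(cumulative.items(), key=lambda item: float(item[0]))
    per_bucket = {}
    previous = 0
    for le, count in ordered:
        per_bucket[le] = count - previous
        previous = count
    return per_bucket


if __name__ == "__main__":
    # Cumulative counts in arbitrary order, as series may arrive from a query.
    cumulative = {"+Inf": 5, "0": 0, "80": 1, "160": 3, "320": 3,
                  "640": 3, "1280": 4, "2560": 4, "5120": 4}
    print(deaccumulate(cumulative))
    # {'0': 0, '80': 1, '160': 2, '320': 0, '640': 0, '1280': 1, '2560': 0, '5120': 0, '+Inf': 1}
```

Sorting by the numeric bound matters because subtracting in any other order would produce negative or misplaced counts.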

Step 4: Panel Query Options

Set Min interval to twice the scrape interval; in this example I used 1m. Because increase needs at least two samples inside each window, this keeps every $__interval window populated even if the scrape interval varies a little.
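A rough way to see why a Min interval of one scrape interval is not enough: the Python sketch below counts, for every possible scrape phase, how many samples land in a window, and reports the worst case. It assumes perfectly regular scrapes and a half-open window (end - window, end]; both are simplifying assumptions, and the 30s scrape interval in the usage example is hypothetical.

```python
def samples_in_window(scrape_interval, window, end, phase):
    """Count scrapes at phase, phase + scrape_interval, ... inside (end - window, end]."""
    start = end - window
    count = 0
    t = phase
    while t <= end:
        if t > start:
            count += 1
        t += scrape_interval
    return count


def worst_case_samples(scrape_interval, window):
    """Fewest samples any window of this length can contain, over all scrape phases.

    scrape_interval must be a whole number of seconds (phases are enumerated).
    """
    end = 10 * (window + scrape_interval)  # any evaluation time well past the first scrape
    return min(
        samples_in_window(scrape_interval, window, end, phase)
        for phase in range(scrape_interval)
    )


if __name__ == "__main__":
    scrape = 30  # hypothetical 30s scrape interval
    for window in (30, 45, 60):
        print(window, worst_case_samples(scrape, window))
    # 30 1
    # 45 1
    # 60 2
```

Since increase returns nothing for a window holding a single sample, a window of one (or 1.5) scrape intervals can leave gaps, while twice the scrape interval guarantees two samples under these assumptions.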

Reference

https://grafana.com/blog/2020/06/23/how-to-visualize-prometheus-histograms-in-grafana/
