Amplifr.com

Monitoring external services with Prometheus and Grafana

Salahutdinov Dmitry (dsalahutdinov) ・5 min read

Preface

In the past, web applications depended only on self-written code or libraries. It was easy to diagnose any problem by simply investigating how the application worked.

Today, modern web-applications tend to have many external dependencies, such as microservices or external SaaS. It makes your system depend not only on the internal factors but on the reliability of external services as well. When something fails, it's essential to localize a problem and find out which side caused it.

That is why monitoring calls to external services is crucial.

I work for Amplifr, the social media automation platform. Our Ruby core application depends on many external APIs: Facebook, Instagram, LinkedIn, Pinterest, Bitly, Stripe, Amplitude, Intercom, and many others.
We are also integrated with many self-hosted microservices, such as imgproxy for resizing images, a browser screenshot generator, social service bots, and other internal domain-specific services.

When an application has so many integrations, it's handy to have a quick overview of the traffic:

  • how many requests the application makes
  • how much time an average request takes
  • how many of the attempts fail

That was my reason to set up monitoring and write my own solution.

Metrics

As most APIs work over HTTP, we will monitor the HTTP requests.

Meet my gem yabeda-http_requests. It is a new addition to the Yabeda monitoring solution by Evil Martians. If you are not familiar with it, here is a great article to start with.

In a nutshell, Yabeda is an open-source Ruby framework for collecting metrics from Ruby processes (Rack/Rails, Sidekiq, Puma) and exporting them to Prometheus (or any other adapter).

yabeda-http_requests wraps all external HTTP requests with the help of the sniffer gem. Sniffer patches all popular HTTP libraries and logs the requests they perform. Using a middleware, the yabeda-http_requests gem calculates and stores metrics. The metrics can then be exported with the yabeda-prometheus gem.
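The wrapping idea can be sketched in plain Ruby without any gems. This is only an illustration of the technique, not the gem's implementation: RequestStats and track are hypothetical names, and the block passed to track stands in for a real HTTP call.

```ruby
# Hypothetical in-memory stand-in for a metrics registry; the real gem
# registers Yabeda metrics instead of filling plain hashes.
class RequestStats
  attr_reader :request_total, :response_total, :durations_ms

  def initialize
    @request_total = Hash.new(0)
    @response_total = Hash.new(0)
    @durations_ms = Hash.new { |hash, key| hash[key] = [] }
  end

  # Wraps a block that performs the HTTP call and returns a status code.
  # Counts the request, times the block, then counts the response.
  def track(host:, port:, method:)
    labels = { host: host, port: port, method: method }
    @request_total[labels] += 1

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status = yield
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0

    response_labels = labels.merge(status: status)
    @response_total[response_labels] += 1
    @durations_ms[response_labels] << elapsed_ms
    status
  end
end

stats = RequestStats.new
# Each block below pretends to be an HTTP call and just returns a status.
stats.track(host: 'dev.to', port: 443, method: 'GET') { 200 }
stats.track(host: 'dev.to', port: 443, method: 'GET') { 503 }
```

Keeping requests and responses in separate counters mirrors the split between a request counter (no status yet) and a response counter (status known) that the exported metrics use.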

To get started, you need to:

  • add the gem to your dependencies (Gemfile)
  • set up a metrics exporter for Prometheus
  • configure Prometheus to scrape the metrics endpoint
  • import the predefined dashboard into your Grafana config

You can find all the details about yabeda-http_requests in its README.md.

Let's have a look at the provided metrics and charts.

What to monitor?

The first release of yabeda-http_requests includes some basics:

RPM

The basic metric is RPM (requests per minute). It shows how many HTTP requests the application makes.

Formally, yabeda-http_requests collects http_request_total as a counter. It is labeled by request host, port, and method:

# HELP http_request_total A counter of the total number
# of external HTTP requests.
http_request_total{host="twitter.com",port="443",method="GET"} 131.0
http_request_total{host="dev.to",port="443",method="GET"} 131.0

Then a query using the rate function (rate returns a per-second average, so multiply the result by 60 for a true per-minute value):

sum by (host) (rate(http_request_total[1m]))

renders the RPM chart:

Request per minute (by host)
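To make the arithmetic behind the chart concrete, here is a minimal sketch of how a rate can be derived from two samples of a counter. The Sample struct, the per_second_rate helper, and the sample values are all hypothetical. Prometheus itself also handles counter resets and extrapolation, which this sketch ignores.

```ruby
# Two hypothetical scrapes of the same counter, 60 seconds apart.
Sample = Struct.new(:timestamp, :value)

# Average per-second increase between two samples of a counter.
def per_second_rate(older, newer)
  elapsed = newer.timestamp - older.timestamp
  raise ArgumentError, 'samples must be ordered in time' unless elapsed.positive?

  (newer.value - older.value) / elapsed.to_f
end

older = Sample.new(0, 100.0)
newer = Sample.new(60, 130.0)

rate = per_second_rate(older, newer) # 30 requests over 60 s => 0.5 per second
rpm = rate * 60                      # => 30.0 requests per minute
```

The result is a per-second value, so it has to be multiplied by 60 to read as requests per minute.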

It is easy to regroup the same chart by status:

Request duration chart

❗ The "RPM by status" chart uses another metric, http_response_total, which counts HTTP responses.
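Because responses are counted per status, the share of failed attempts can be derived from the same counter. The sketch below uses the twitter.com values from the example exporter output shown later in this post. The failure_ratio helper is hypothetical, and treating only 5xx statuses as failures is an assumption of this example.

```ruby
# Counter values for twitter.com, copied from the example exporter output.
responses = {
  '200' => 16_910.0,
  '503' => 1.0
}

# Share of responses whose status starts with "5".
# Counting only 5xx as failures is an assumption of this sketch.
def failure_ratio(responses_by_status)
  total = responses_by_status.values.sum
  return 0.0 if total.zero?

  failed = responses_by_status.sum { |status, count| status.start_with?('5') ? count : 0.0 }
  failed / total
end

ratio = failure_ratio(responses)
puts format('%.4f%% of responses failed', ratio * 100)
```

In a real setup the same ratio would more likely be computed in a Prometheus query over a time window rather than from raw totals.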

Request duration

When you want to investigate which of the dependent components causes a slowdown, there is histogram duration data and a corresponding chart. The chart shows how much time the slowest requests take.

Example Application

This section is for those who prefer a practical approach. Here is an example web application. It shows how all the pieces work together: the web application, the metrics collector, the exporter, and Prometheus with Grafana.

The example has a docker-compose config for automatic setup. Just run the following:

$ git clone https://github.com/yabeda-rb/yabeda-http_requests
$ cd yabeda-http_requests/example
$ docker-compose up

This command runs the Rack application, Sidekiq, Prometheus, and Grafana.
The Rack application adds random web scraping jobs to Sidekiq. Navigate to http://localhost:9292 to enqueue some jobs.

The Sidekiq instance runs the background scraping tasks, collects metrics, and exposes an endpoint with data for Prometheus.
Here is the basic Sidekiq configuration, showing how the Sidekiq process runs its metrics web server:

# sidekiq config
Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch('REDIS_URL') }

  # configures all the Yabeda metrics
  ::Yabeda.configure!

  # expose 9394 port with /metrics path for prometheus
  ::Yabeda::Prometheus::Exporter.start_metrics_server!
end

Navigate to http://localhost:9394/metrics to see the metrics endpoint, with the metric values formatted as text:

# TYPE http_request_total counter
# HELP http_request_total A counter of the total number of external HTTP requests.
http_request_total{host="twitter.com",port="443",method="GET"} 16922.0
http_request_total{host="dev.to",port="443",method="GET"} 17083.0
http_request_total{host="amplifr.com",port="443",method="GET"} 451.0
http_request_total{host="www.sitepoint.com",port="443",method="GET"} 428.0
# TYPE http_response_total counter
# HELP http_response_total A counter of the total number of external HTTP responses.
http_response_total{host="twitter.com",port="443",method="GET",status="200"} 16910.0
http_response_total{host="dev.to",port="443",method="GET",status="200"} 17080.0
http_response_total{host="amplifr.com",port="443",method="GET",status="301"} 451.0
http_response_total{host="www.sitepoint.com",port="443",method="GET",status="200"} 428.0
http_response_total{host="twitter.com",port="443",method="GET",status="503"} 1.0
# TYPE http_response_duration_milliseconds histogram
# HELP http_response_duration_milliseconds A histogram of the response duration (milliseconds).
http_response_duration_milliseconds_bucket{host="twitter.com",port="443",method="GET",status="200",le="0.5"} 0.0
http_response_duration_milliseconds_bucket{host="twitter.com",port="443",method="GET",status="200",le="1"} 0.0
http_response_duration_milliseconds_bucket{host="twitter.com",port="443",method="GET",status="200",le="2.5"} 0.0
http_response_duration_milliseconds_bucket{host="twitter.com",port="443",method="GET",status="200",le="5"} 0.0
http_response_duration_milliseconds_bucket{host="twitter.com",port="443",method="GET",status="200",le="10"} 0.0
http_response_duration_milliseconds_bucket{host="twitter.com",port="443",method="GET",status="200",le="25"} 0.0
...
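The _bucket lines above are cumulative: each one counts the observations that are less than or equal to its le bound. A minimal sketch of that bookkeeping follows. The bounds are copied from the le labels above (the real list continues past 25), while the cumulative_buckets helper and the durations are hypothetical.

```ruby
# Upper bounds in milliseconds, copied from the `le` labels in the output.
# The real exporter has more buckets than the ones listed here.
BUCKETS = [0.5, 1, 2.5, 5, 10, 25].freeze

# Returns cumulative counts per bucket: every bucket counts all
# observations less than or equal to its bound; +Inf counts everything.
def cumulative_buckets(durations_ms, bounds = BUCKETS)
  counts = bounds.to_h { |bound| [bound.to_s, durations_ms.count { |d| d <= bound }] }
  counts['+Inf'] = durations_ms.size
  counts
end

# Hypothetical response durations, in milliseconds.
durations = [0.8, 3.0, 12.0, 40.0]
cumulative_buckets(durations).each do |le, count|
  puts %(le="#{le}" #{count})
end
```

This also explains the zeros in the output above: a bucket stays at 0.0 as long as no response for that label set completed within its bound.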

Prometheus scrapes http://sidekiq:9394/metrics periodically and stores the metrics into its database.
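To get a feel for what a scrape receives, here is a simplified sketch that parses a single sample line of the text format shown above. The parse_sample helper is hypothetical and deliberately incomplete: it assumes the line has labels and ignores escaped quotes, timestamps, and samples without labels.

```ruby
# Matches lines such as: name{label="value",...} 123.0
LINE = /\A(?<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?<labels>.*)\}\s+(?<value>\S+)\z/

# Returns a hash with the metric name, its labels and the numeric value,
# or nil for lines that are not labeled samples (comments, blank lines).
def parse_sample(line)
  match = LINE.match(line.strip)
  return nil unless match

  labels = match[:labels].scan(/(\w+)="([^"]*)"/).to_h
  { name: match[:name], labels: labels, value: Float(match[:value]) }
end

p parse_sample('http_request_total{host="dev.to",port="443",method="GET"} 17083.0')
p parse_sample('# TYPE http_request_total counter')
```

Prometheus stores each such sample together with the scrape timestamp, which is what later makes rate calculations over time windows possible.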

Here is the overview diagram of how it works together:
HTTP requests monitoring example diagram

Grafana visualizes the data. Navigate to http://localhost:3000/d/OGd-oEXWz/yabeda-external-http-requests. Use admin/foobar to log in and see the template dashboard:

Yabeda External HTTP requests Grafana dashboard

❗ You can import the prepared Grafana dashboard here.

Conclusion

Monitoring external HTTP calls is essential. It is handy to have an overview of your application's outgoing traffic. It also helps to diagnose slowdowns in the services you depend on.

The Yabeda framework and yabeda-http_requests let you quickly set up metrics and visualize them using a predefined Grafana dashboard template.

If you have any other ideas, please do not hesitate to contribute by opening an issue, sending a PR, or leaving a comment.

Thank you for reading.
