WAKAYAMA shirou

Back up Prometheus records to S3 via Kinesis Firehose

Prometheus is not designed for long-term storage, so records are deleted at regular intervals. For long-term storage, data is typically forwarded to something like InfluxDB. However, AWS S3 is an easier place to manage.

AWS Kinesis Firehose came to the Tokyo region in July 2017. By using Kinesis Firehose, we can automatically save records to S3.

So I implemented a Prometheus remote write adapter integration that sends records to Kinesis.

https://github.com/shirou/prometheus_remote_kinesis

This is just for evaluation and has not been deployed to production, so there may be some problems.

prometheus_remote_kinesis

How to use

You need to build it with Go, but I will omit the details here; a multi-stage build with Docker makes this easy.
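For reference, such a multi-stage Dockerfile could look roughly like this (the Go version, base images, and paths are my assumptions for illustration, not the repository's actual file):

  FROM golang:1.21 AS build
  WORKDIR /src
  COPY . .
  # static binary so it runs on a minimal base image
  RUN CGO_ENABLED=0 go build -o /prometheus_remote_kinesis .

  FROM alpine:3.19
  COPY --from=build /prometheus_remote_kinesis /usr/local/bin/
  ENTRYPOINT ["prometheus_remote_kinesis"]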

  $ prometheus_remote_kinesis -stream-name prometheus-backup
-stream-name
kinesis stream name (required)
-listen-addr
listen address (defaults to `:9501` if not specified)

Of course, you also need to set AWS credentials.
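For example, using the standard AWS SDK environment variables (the values here are placeholders; an IAM instance role also works):

  $ export AWS_ACCESS_KEY_ID=AKIA...
  $ export AWS_SECRET_ACCESS_KEY=...
  $ export AWS_REGION=ap-northeast-1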

I also pushed an image to Docker Hub.

https://hub.docker.com/r/shirou/prometheus_remote_kinesis/

It can be started like this:

  docker run -d --name remote_kinesis \
    --restart=always \
    -p 9501:9501 \
    -e STREAM_NAME=backup-prometheus-log \
    shirou/prometheus_remote_kinesis

Settings on the Prometheus side

Set the remote_write setting in prometheus.yml as follows. It is important to add a - before the url so that it becomes a YAML sequence.

  remote_write:
   - url: http://localhost:9501/receive

The setup of Kinesis and Kinesis Firehose themselves is omitted here.

That is all the setup needed. As time goes on, more and more records accumulate in S3.

JSON format

The data sent to Kinesis is formatted as JSON like this:

  {
    "name": "scrape_duration_seconds",
    "time": 1513264725773,
    "value": 0.004345524,
    "labels": {
      "__name__": "scrape_duration_seconds",
      "instance": "localhost:9090",
      "job": "prometheus",
      "monitor": "monitor"
    }
  }

In a Prometheus TimeSeries, multiple samples can be stored per record. However, since I did not want to deepen the JSON hierarchy, I flatten the data so that one record is created for each sample. Since labels are arbitrary and cannot be fixed as fields, they are kept in a map. Also, assuming use from Athena or S3 SELECT, the records are newline-delimited JSON (JSONL). The flattening step is sketched below.

I initially gzip-compressed the data before sending it to Kinesis, but removed that for now because it used too much CPU on my t2.small. Records are sent with PutRecords in batches of 500. Because records are buffered, they may be lost if remote_kinesis dies, although graceful shutdown is implemented.
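Here is a minimal sketch in Go of that flattening step. The Label/Sample/TimeSeries structs are simplified stand-ins for Prometheus's remote-write protobuf types, and the names are illustrative, not the actual code of prometheus_remote_kinesis:

  package main

  import (
      "encoding/json"
      "os"
  )

  // Simplified stand-ins for the remote-write protobuf types (prompb).
  type Label struct{ Name, Value string }
  type Sample struct {
      Value     float64
      Timestamp int64 // milliseconds since the epoch
  }
  type TimeSeries struct {
      Labels  []Label
      Samples []Sample
  }

  // record mirrors the JSON layout shown above: one flat object per sample.
  type record struct {
      Name   string            `json:"name"`
      Time   int64             `json:"time"`
      Value  float64           `json:"value"`
      Labels map[string]string `json:"labels"`
  }

  // flatten turns one time series (labels + N samples) into N
  // newline-delimited JSON records, ready to be buffered for PutRecords.
  func flatten(ts TimeSeries) ([]byte, error) {
      labels := make(map[string]string, len(ts.Labels))
      for _, l := range ts.Labels {
          labels[l.Name] = l.Value
      }
      var out []byte
      for _, s := range ts.Samples {
          b, err := json.Marshal(record{
              Name:   labels["__name__"],
              Time:   s.Timestamp,
              Value:  s.Value,
              Labels: labels,
          })
          if err != nil {
              return nil, err
          }
          out = append(out, b...)
          out = append(out, '\n') // one line per sample, for Athena / S3 SELECT
      }
      return out, nil
  }

  func main() {
      ts := TimeSeries{
          Labels:  []Label{{"__name__", "scrape_duration_seconds"}, {"job", "prometheus"}},
          Samples: []Sample{{0.004345524, 1513264725773}},
      }
      b, _ := flatten(ts)
      os.Stdout.Write(b)
  }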

Since the write request from Prometheus arrives as snappy-compressed protobuf, forwarding that byte sequence to Kinesis as-is might be the fastest approach, but it would make the data harder to handle later.
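For reference, the receiving side decodes that wire format roughly as follows (a sketch assuming the github.com/golang/snappy and github.com/gogo/protobuf/proto libraries together with the prompb package from the Prometheus repository):

  package main

  import (
      "io"
      "net/http"

      "github.com/gogo/protobuf/proto"
      "github.com/golang/snappy"
      "github.com/prometheus/prometheus/prompb"
  )

  // receive handles a remote write request: the body is a
  // snappy-compressed, protobuf-encoded prompb.WriteRequest.
  func receive(w http.ResponseWriter, r *http.Request) {
      compressed, err := io.ReadAll(r.Body)
      if err != nil {
          http.Error(w, err.Error(), http.StatusInternalServerError)
          return
      }
      raw, err := snappy.Decode(nil, compressed)
      if err != nil {
          http.Error(w, err.Error(), http.StatusBadRequest)
          return
      }
      var req prompb.WriteRequest
      if err := proto.Unmarshal(raw, &req); err != nil {
          http.Error(w, err.Error(), http.StatusBadRequest)
          return
      }
      for _, ts := range req.Timeseries {
          _ = ts // hand each series to the flattening step sketched above
      }
  }

  func main() {
      http.HandleFunc("/receive", receive)
      http.ListenAndServe(":9501", nil) // matches the default -listen-addr
  }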

Summary

I created a remote storage integration that saves Prometheus records to S3 via AWS Kinesis for long-term preservation.

I have not deployed this to a production environment, so there may be problems. Another open issue is reading the data back from S3. But since it is plain JSON, I think it is easy to convert if necessary. It would also be possible to read it back directly from Prometheus with a remote read integration, but the performance would probably not be good.

Oh, and AlpacaJapan is recruiting people to work on this kind of thing. Please contact me on Twitter: @r_rudi
