
What tools do you use for monitoring?

Ben Halpern ・1 min read

Whether your programs are big or small, and whatever your definition of "monitoring" is, I'm curious which tools folks are using and what they like about them.

Discussion

 

I have used the following:

  • CloudWatch Logs/Metrics/Alerts: it's OK, but the dashboards are not very pretty, and querying logs is neither easy nor fast.
  • ELK + elastalert: pretty cool dashboards, easy to query data. Cons: it's easy to overload your Elasticsearch instance with data, plus type conflicts and Logstash problems.
  • CloudWatch Logs + Lambda + Elasticsearch + Kibana: you don't have Logstash, but the rest of the problems remain.
  • New Relic: quite neat, powerful query language, nice dashboards, lots of plugins. Cons: probably expensive.
 

I'm using AWS CloudWatch. It's been a good enough solution, but we want to take it to the next level by using something like ELK. We also want to connect the AWS alerts to a Slack channel (at the moment we use email).
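
For anyone wanting to try the Slack part: one common route is to send CloudWatch alarms to an SNS topic and subscribe a small Lambda function that posts to a Slack incoming webhook. The following is only a minimal sketch under assumptions: the webhook URL is read from an environment variable called `SLACK_WEBHOOK_URL` (a name picked here for illustration), and the function names are hypothetical.

```python
import json
import os
import urllib.request


def build_slack_payload(sns_event):
    """Turn an SNS-delivered CloudWatch alarm notification into a Slack message."""
    # SNS wraps the alarm as a JSON string in Records[0].Sns.Message.
    alarm = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    text = "{state}: {name}\n{reason}".format(
        state=alarm.get("NewStateValue", "UNKNOWN"),
        name=alarm.get("AlarmName", "unnamed alarm"),
        reason=alarm.get("NewStateReason", ""),
    )
    return {"text": text}


def lambda_handler(event, context):
    """Lambda entry point: post the formatted alarm to a Slack incoming webhook."""
    # SLACK_WEBHOOK_URL is an assumed environment variable name.
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps(build_slack_payload(event)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return {"status": response.status}
```

Keeping the formatting in its own function means the message text can be checked locally without touching AWS or Slack.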

 

Server: NewRelic
Logging: Timber
Marketing/Usage: Segment (With GA and a bunch of other tools attached)
App bug reporting: Sentry
Backend bug reporting: Appsignal

Honestly, we don't monitor or log 10% of what we should. Just haven't had the time yet to implement better monitoring. #StartupLife

 

Quick update to this. We're using ELK as well as Timber for our backend logging now.

 

Any specific reason why Timber was no longer sufficient?

We wanted a self-hosted logging solution that allowed us to quickly generate dashboards from our logs.

Thanks for the reply! So it was about not being locked in to a vendor and/or being more flexible?

Yep, pretty much. That and we wanted to own the data entirely without it going to a third party.

Totally understandable! Thanks :)

 

I'll just focus this response on client-side and Node error handling. For errors we don't handle, we've been relying on Sentry. We've been pretty happy with it. I've also heard good things about TrackJS.

When I was still doing .NET, we relied on the Enterprise Library Application Blocks for logging and exception handling.

 

Riemann (riemann.io/) is pretty exciting. It's awesome because the config is written in Clojure. It's terrible because the config is written in Clojure.

It's very fun to use, and it supports unit testing your config which is very nice. It's not something I'm using right now as I prefer SaaS to running my own services, if at all possible, but I have used it successfully in the past couple of years.

 

We are using the following tools:

  • CloudWatch for the AWS setup
  • The ELK stack, with Logstash for log aggregation across all tools
  • CheckMK (mathias-kettner.com/check_mk.html) to monitor deeper details for every host; we have started integrating app-specific checks as well

New Relic is nice to have but extremely expensive when you are running > 60 servers in production. In my eyes, using it for only a few hosts does not help you.

 

In my company and my personal projects, I use:

  • Sentry: online version or self-hosted, for error handling. It does a great job of capturing exceptions, showing the full exception trace, and not duplicating an exception if it happens more than once.
  • Jenkins: monitoring test results and staging server deployment output.
  • Zabbix: monitoring almost everything else: server availability via ping, server process uptime and performance with templates, backup output, website response codes and times, whether periodic background processes are being executed, etc.
 

We're using ELK with a variety of Filebeat instances connected. It's pretty convenient, because you can easily plug new elements in and out. And the dockerized version makes it possible to upgrade versions without excessive overhead.

 

I use CloudWatch too, integrated with Slack for alarms.

 

At my company, we use New Relic which monitors almost everything. And if you want something specific to Rails applications then even skylight.io is a good option!

 

Icinga2 for service monitoring, with email and SMS alerts (for important services).

InfluxDB/Grafana for performance monitoring. The data comes mostly from Icinga, with some from legacy systems via collectd.

 

New Relic (for the Rails-based API and the React frontend) + logz.io as hosted ELK. Both send alerts to our Rocket.Chat (and to our wallboard).
New Relic is very powerful. The frontend part (Browser) is nowhere near as powerful as the APM section, but it's okay.

 

We have a legacy system (COBOL) for financial transactions, and each tester is responsible for certain transactions (fund withdrawal, fund transfer, payment, cash surrender, death claim, lapse, monthiversary, loan, change of beneficiary, etc.). Each tester has a number of test cases with expected results. When the programmers finish coding and the base is released into the test environment, the OLD RELIABLE TESTS are run against the new code, and the testers check for differences: last time the test case produced these results, now it produces those. Each tester reviews the documentation and the new project specifications written by spec writers and programmers, 1) to catch unexpected changes and 2) to begin verifying that expected changes are happening. So our monitoring system in acceptance testing is regression test output reviewed by testers. Every time a batch of program changes or fixes is made to the code, a regression test is run and the testers are given the output to review. They verify changes and report bugs.
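
The "compare this run against the last accepted run" step can be partly automated before a tester looks at it. This is only a rough sketch using Python's standard `difflib`, not a description of the actual process above; the function name and the sample output lines are made up for illustration.

```python
import difflib


def diff_regression_output(baseline_lines, new_lines):
    """Return unified-diff lines between the last accepted output and the new run."""
    return list(
        difflib.unified_diff(
            baseline_lines,
            new_lines,
            fromfile="baseline",
            tofile="new_run",
            lineterm="",
        )
    )


# Hypothetical output lines for one test case (not real data).
baseline = ["CASE 001 fund withdrawal", "balance=900.00"]
new_run = ["CASE 001 fund withdrawal", "balance=895.00"]

for line in diff_regression_output(baseline, new_run):
    print(line)
```

An empty result means the outputs match. Any changed line still needs a tester to decide whether it is an expected change or a bug.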

 

A colleague of mine built a tool dubbed "Mnemosyne" (as part of his master's thesis) to monitor and profile requests across our microservice architecture.

Server / GUI: github.com/jgraichen/mnemosyne-server
Client library (Ruby): github.com/jgraichen/mnemosyne-ruby

 

PM2/Keymetrics has been working reasonably well for me.

 

On my personal projects, I use the manual "check it once a day" approach. 😅 (I should set up a real monitoring tool one day haha)

 

Yes, you REALLY should 😂

 

My company used to use Zabbix, but we have since switched to New Relic.

 

In terms of availability and uptime monitoring, I'd recommend websitepulse.com
Very high level of customization of both cost and target set up, and 24/7 live customer support. Give them a try!

 

Also, for website uptime monitoring I use updown.io/ which I like for its pricing model. Highly recommended!

 

Perl and/or Python for SNMP and ping to monitor and report on connectivity and bandwidth consumption.
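
As a rough illustration of the bandwidth side: interface octet counters such as `ifInOctets` are cumulative, so usage is derived from the difference between two samples taken some seconds apart. The sketch below covers only that arithmetic and assumes the two counter values have already been fetched by whatever SNMP tool or library is in use. The function name is hypothetical, and `counter_bits` would be 64 for the high-capacity `ifHCInOctets` counter.

```python
def bandwidth_bps(prev_octets, curr_octets, interval_seconds, counter_bits=32):
    """Average bits per second between two samples of an SNMP octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction handles a single counter wrap between samples.
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_seconds
```

The modular subtraction only copes with one wrap per interval, so on fast links a short polling interval or the 64-bit counters are the safer choice.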

 

For web applications, take a look at uptimerobot.com/. It's free, and you can use it to check both production and QA environments.

 

I have been trying out instrumentalapp.com/ for a month now. Liking it so far.

Was using server-monitor.pingdom.com/ before.

 

We have been eating our own dog food with AWS CloudWatch and alarms on email.

 

Years ago I used Ganglia to monitor a Hadoop cluster; now I prefer Prometheus.

 

I like to use Sentry and App Center from Microsoft. I'm focused on mobile and front-end development with React.