DEV Community

Have you ever heard a more beautiful phrase than this?

Ben Halpern on December 07, 2018

Yesterday a few of us at the office were noticing that we hadn't gotten an alert in the #monitoring channel in Slack for over a week. We get alerts...
 
Patrick Tingen • Edited

One day at a previous company, I looked into the server room and noticed a lot of red lights flashing on the disks. I ran to tell the admin, but he shrugged and said, "Just because the lights are flashing red doesn't mean there's something wrong."

Jason C. McDonald

Maybe logging systems should be built to also include periodic "All's Well" alerts...

(a) That way, you always know the alert system is working,
(b) Who couldn't use more good news?
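
A minimal sketch of such a periodic "All's Well" check, assuming one message per day. The names `HEARTBEAT_INTERVAL`, `heartbeat_due`, and `build_message` are hypothetical and do not come from any particular logging or alerting tool:

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed interval: one quiet message per day (not taken from any real setup).
HEARTBEAT_INTERVAL = timedelta(days=1)


def heartbeat_due(last_sent: Optional[datetime], now: datetime) -> bool:
    """Return True when no "All's Well" message went out within the interval."""
    return last_sent is None or now - last_sent >= HEARTBEAT_INTERVAL


def build_message(alerts_since_last: int) -> str:
    """Build the status line; the wording here is illustrative only."""
    if alerts_since_last == 0:
        return "All's well: no alerts since the last check-in."
    return f"Monitoring is alive: {alerts_since_last} alert(s) since the last check-in."
```

A scheduler could call `heartbeat_due` on every tick and post `build_message(...)` to the log or channel only when it returns `True`, which keeps the good news to one message per interval.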

Ben Halpern

[Embedded YouTube video]

Jason C. McDonald • Edited

Haha, well, understand that by "periodic," I mean some quiet little message once a day in the log/channel, with no loud beeping every five seconds... ;)

Incidentally, an ironic twist on this is...when I got the email notification for your response, my email client couldn't load the YouTube video. So, I just saw "an error occurred".

My first thought was, "Aw, crap, I jinxed it!"

Christopher McClellan

My team has been playing with the idea of “Monitoring Driven Development”. Create the failing alerts first, then get things deployed, now green. Guarantees we have good monitoring in place.

Next up: Before implementing a feature, put the instrumentation/metrics in place we need to determine if that feature is a success.
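
A rough sketch of that red-then-green flow, assuming the alert is a simple probe function. `check_status` and both probes are hypothetical stand-ins, not part of any monitoring product:

```python
from typing import Callable


def check_status(probe: Callable[[], bool]) -> str:
    """Return "green" if the probe succeeds, "red" if it fails or raises."""
    try:
        return "green" if probe() else "red"
    except Exception:
        return "red"


def probe_before_deploy() -> bool:
    """Step 1: the check exists before the service does, so it fails."""
    raise ConnectionError("service not deployed yet")


def probe_after_deploy() -> bool:
    """Step 2: once the service is deployed, the same check passes."""
    return True
```

The point of writing `check_status` first is that the alert is known to fire before anything is deployed, so a later green state actually means something.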

David J Eddy
Everything's fine with the monitoring. Turns out the site's just more stable.

Amazing! Great job @dev.to team!

rhymes

Ahahh I pictured Mac going back to check knobs and levers and gauges with one of those yellow safety hats with the embedded torchlight

Ben Halpern

We should have these kinds of props handy now that I think about it

Michiel Hendriks

One of our customers often has a "high transaction" week (200%-300%), and they warn us before it starts. There have been various load issues in the past (not even during these high transaction weeks). A couple of weeks before one such week, I tracked down an issue that could lead to erratic behavior and fixed it, after which various monitors became quite stable. When the high transaction week started, our monitoring showed absolutely nothing of significance: system load, memory usage, and so on were all still pretty much a flat line. The people on standby were worried that something was broken and the transactions weren't going through. But nope, everything was working perfectly.

This was quite a while ago. In the meantime, the average number of transactions per day has increased and the peaks have become higher, but none of this is really visible in our system monitoring.

Ben Halpern

Best of luck. I’m praying for you.

Alex Barashkov

What do you use for monitoring, alarms and log gathering at Dev.to?

Erick Gonzales

Performance is something I learned to keep an eye on at the previous company I worked for (a newspaper). The high traffic always kept me on edge, especially during big events.

I watched all the graphs and asked the DevOps team about them. They said, "It's OK, don't worry. If something happens, we'll let you know."

That phrase kept me OK, but still on edge, lol.

Andrew (he/him)

What do you mean by "error rate"?

Ben Halpern

Percentage of web requests which fail.
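
As a small illustration of that definition, assuming failed and total request counts are available for some time window (the function names and the threshold parameter are hypothetical, not dev.to's actual setup):

```python
def error_rate(failed: int, total: int) -> float:
    """Percentage of web requests that failed; 0.0 when there was no traffic."""
    if total == 0:
        return 0.0
    return 100.0 * failed / total


def should_alert(failed: int, total: int, threshold_percent: float) -> bool:
    """Alert only when the error rate is strictly above the threshold."""
    return error_rate(failed, total) > threshold_percent
```

With 5 failures out of 1,000 requests the rate is 0.5%, so a 1% threshold stays quiet, while lowering the threshold to 0.25% would trigger an alert for the same traffic.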

Jan Mewes

We lowered the threshold a bit, and should expect an alert now and then at the new level.

Yes, it's a problem if you don't have any problem. :D

Basti Ortiz

Stellar job, guys! Keep it up. Thank you for the hard work you put into this community. It means a lot to everyone here.

K

I use Sentry in one project and it's a good feeling to get the weekly reports after an update when the error rate has gone down 40% or so.

The error graphs are approaching a flat line more and more XD

Anton

Ben, what's your take on Elixir? I see so many benefits. Have you ever considered using it for Dev.to?

Ben Halpern

I think it’s pretty sweet. Never seriously considered it for dev.to unless it just plugged right in nicely.

If we grow and find some time to be more exploratory (or have more dire scaling needs), it’ll definitely get some stronger consideration.

One pretty interesting thing for the future is Rust interop: usehelix.com

Ben Lovy

Whoa, thanks. This is really cool.

Eljay-Adobe

Whoa, that kind of "things are working well" makes me nervous.

Software engineering is never having to say you're done.

Theofanis Despoudis

Maybe we could add a new observation here:
en.wikipedia.org/wiki/Fallacies_of...

  1. The error rate on dev.to never exceeds the minimum threshold.
 
Daniel Alner

That's so cool to hear, nice job!

Just curious, how do you usually justify the current threshold and decide whether it should be lowered or raised?