
Have you ever heard a more beautiful phrase than this?


Yesterday a few of us at the office noticed that we hadn't gotten an alert in the #monitoring channel in Slack for over a week. We get an alert every time the error rate on dev.to passes a certain threshold.
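The post doesn't describe how the alert is wired up. As a rough illustration only, assuming "error rate" means failed requests divided by total requests over a fixed window, a threshold check might look like this in Python. `WindowStats`, `should_alert`, and the 2% threshold are hypothetical names and values, not dev.to's actual setup.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts for one monitoring window (e.g. the last five minutes)."""
    requests: int
    errors: int


def error_rate(stats: WindowStats) -> float:
    # An empty window counts as a 0% error rate instead of dividing by zero.
    if stats.requests == 0:
        return 0.0
    return stats.errors / stats.requests


def should_alert(stats: WindowStats, threshold: float) -> bool:
    # Alert only when the error rate passes the threshold.
    return error_rate(stats) > threshold


if __name__ == "__main__":
    ALERT_THRESHOLD = 0.02  # hypothetical: alert when more than 2% of requests fail
    window = WindowStats(requests=10_000, errors=350)
    if should_alert(window, ALERT_THRESHOLD):
        # A real setup would post this to the #monitoring Slack channel.
        print(f"Error rate {error_rate(window):.2%} is above {ALERT_THRESHOLD:.2%}")
```

Lowering the threshold only changes the `threshold` argument; the check itself stays the same.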

@maestromac investigated. When I caught up with him a little while later about the issue, this was the diagnosis:

Everything's fine with the monitoring. Turns out the site's just more stable.

Have you ever heard a more beautiful utterance?

We lowered the threshold a bit, and should expect an alert now and then at the new level.
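The post doesn't say how the new level was picked. One common approach, offered here only as an assumption, is to set the threshold at a high percentile of recent error rates, so that it is crossed occasionally but not constantly. A minimal Python sketch using the standard library's `statistics.quantiles` (`percentile_threshold` and its `pct` parameter are hypothetical):

```python
import statistics


def percentile_threshold(history: list[float], pct: int = 99) -> float:
    """Return the pct-th percentile of past error rates.

    Roughly (100 - pct)% of the historical windows were above this value,
    so a similar future period should trigger an alert "now and then".
    """
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    # quantiles(..., n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    return statistics.quantiles(history, n=100)[pct - 1]


if __name__ == "__main__":
    # Hypothetical history of per-window error rates, from 0.1% to 5.0%.
    history = [i / 1000 for i in range(1, 51)]
    strict = percentile_threshold(history, pct=99)
    relaxed = percentile_threshold(history, pct=90)
    # A lower percentile gives a lower threshold, so alerts fire more often.
    print(f"99th percentile: {strict:.4f}, 90th percentile: {relaxed:.4f}")
```

In this framing, "lowering the threshold a bit" corresponds to choosing a slightly lower percentile.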

Happy coding ❤️

I'm certain I've just jinxed it, so expect some significant downtime.

 

I'm on a new project, and we don't have monitoring set up yet so I can live in blissful ignorance for now

 

I'm a freelance dev so I need to neither test nor monitor. How's that for blissful ignorance?! 😇😇

 

My team has been playing with the idea of “Monitoring Driven Development”. Create the failing alerts first, then get things deployed, now green. Guarantees we have good monitoring in place.

Next up: Before implementing a feature, put the instrumentation/metrics in place we need to determine if that feature is a success.

 

At a previous company, I looked into the server room one day and noticed a lot of red lights flashing on the disks. I ran to the admin and told him, but he shrugged and said, "Just because the lights are flashing red doesn't mean there's something wrong."

 

Maybe logging systems should be built to also include periodic "All's Well" alerts...

(a) That way, you always know the alert system is working,
(b) Who couldn't use more good news?
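A minimal Python sketch of such an "All's Well" heartbeat, assuming one quiet message per day sent through the same channel as real alerts. The function names and the daily interval are hypothetical. Because the heartbeat travels the same path as real alerts, its absence is itself a signal that the alerting pipeline has broken.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical: one quiet "All's Well" message per day.
HEARTBEAT_INTERVAL = timedelta(days=1)


def heartbeat_due(now: datetime, last_sent: Optional[datetime],
                  interval: timedelta = HEARTBEAT_INTERVAL) -> bool:
    # Send the first heartbeat right away, then at most once per interval.
    return last_sent is None or now - last_sent >= interval


def heartbeat_message(now: datetime, alerts_since_last: int) -> str:
    # Report a quiet period as good news; otherwise summarize how many alerts fired.
    if alerts_since_last == 0:
        return f"{now:%Y-%m-%d}: All's well, no alerts since the last heartbeat."
    return f"{now:%Y-%m-%d}: {alerts_since_last} alert(s) since the last heartbeat."


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 9, 0)
    if heartbeat_due(now, last_sent=now - timedelta(days=1)):
        # A real setup would post this to the same channel as the alerts.
        print(heartbeat_message(now, alerts_since_last=0))
```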

 
 

Haha, well, by "periodic" I mean some quiet little message once a day in the log/channel, not loud beeping every five seconds... ;)

Incidentally, an ironic twist on this is...when I got the email notification for your response, my email client couldn't load the YouTube video. So, I just saw "an error occurred".

My first thought was, "Aw, crap, I jinxed it!"

 
Everything's fine with the monitoring. Turns out the site's just more stable.

Amazing! Great job @dev.to team!

 

One of our customers regularly has a "high transaction" week (200%-300% of the usual volume), and they warn us before it starts. We had seen various load issues in the past (not even during these high-transaction weeks). A couple of weeks before one of these weeks, I tracked down an issue that could lead to erratic behavior and fixed it, after which various monitors became quite stable. When the high-transaction week started, our monitoring showed absolutely nothing of significance: system load, memory usage, and everything else were still pretty much flat lines. The people on standby were worried that something was broken and the transactions weren't going through. But nope, everything was working perfectly. This was quite a while ago. In the meantime, the average number of transactions per day has increased and peaks have grown higher, but none of this is really visible in our system monitoring.

 

Ahahh, I pictured Mac going back to check the knobs, levers, and gauges wearing one of those yellow safety hats with an embedded torchlight

 

We should have these kinds of props handy now that I think about it

 
 

Performance is something I learned to keep an eye on at my previous company (a newspaper). The high traffic always kept me on edge, especially during big events.

I looked at all the graphs and asked the DevOps team about them, and they told me, "It's OK, don't worry. If something happens, we'll let you know."

That phrase reassured me, but I was still on edge, lol.

 

What do you use for monitoring, alarms and log gathering at Dev.to?

 

Whoa, that kind of "things are working well" makes me nervous.

Software engineering is never having to say you're done.

 

Stellar job, guys! Keep it up. Thank you for the hard work you put into this community. It means a lot to everyone here.

 
 

I use Sentry in one project and it's a good feeling to get the weekly reports after an update when the error rate has gone down 40% or so.

The error graphs approaching a flat-line more and more XD

 

Ben what's your take on Elixir? I see so many benefits. Have you ever considered using it for Dev.to?

 

I think it’s pretty sweet. Never seriously considered it for dev.to unless it just plugged right in nicely.

If we grow and find some time to be more exploratory (or have more dire scaling needs), it’ll definitely get some stronger consideration.

One pretty interesting thing for the future is Rust interop via Helix (usehelix.com).

 
 

We lowered the threshold a bit, and should expect an alert now and then at the new level.

Yes, it's a problem if you don't have any problem. :D

 

That's so cool to hear, nice job!

Just curious, how do you usually decide on the current threshold, and on whether it should be lowered or raised?

 

Maybe we could add a new observation here:
en.wikipedia.org/wiki/Fallacies_of...

  1. The error rate on dev.to never exceeds the minimum threshold.
 

Ben Halpern
A Canadian software developer who thinks he’s funny. He/Him.