DEV Community

Have you ever heard a more beautiful phrase than this?

Ben Halpern
A Canadian software developer who thinks he’s funny. He/Him.
・1 min read

Yesterday a few of us at the office noticed that we hadn't gotten an alert in the #monitoring channel in Slack for over a week. We get alerts every time the error rate on dev.to passes a certain threshold.

@maestromac investigated. When I caught up with him a little while later about the issue, this was the prognosis:

Everything's fine with the monitoring. Turns out the site's just more stable.

Have you ever heard a more beautiful utterance?

We lowered the threshold a bit, and should expect an alert now and then at the new level.

Happy coding ❤️

I'm certain I've just jinxed it, so expect some significant downtime.

Discussion (27)

Noah Pederson

I'm on a new project, and we don't have monitoring set up yet, so I can live in blissful ignorance for now.

Christopher McClellan

My team has been playing with the idea of “Monitoring Driven Development”. Create the failing alerts first, then get things deployed, now green. Guarantees we have good monitoring in place.

Next up: Before implementing a feature, put the instrumentation/metrics in place we need to determine if that feature is a success.

Patrick Tingen • Edited

At a previous company, one day, I looked into the server room and noticed a lot of red lights flashing on disks. I ran to the admin and told him, but he shrugged and told me "just because the lights are flashing red, doesn't mean there's something wrong"

Jason C. McDonald

Maybe logging systems should be built to also include periodic "All's Well" alerts...

(a) That way, you always know the alert system is working,
(b) Who couldn't use more good news?
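
A minimal sketch of the idea, assuming one quiet "all's well" message per day and a two-hour grace period before a missing message is treated as a problem. The function names and both intervals are hypothetical and not taken from any particular monitoring tool:

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed cadence: one quiet "all's well" message per day.
HEARTBEAT_INTERVAL = timedelta(days=1)


def heartbeat_due(last_sent: Optional[datetime], now: datetime) -> bool:
    """Return True when it is time to post another "all's well" message."""
    return last_sent is None or now - last_sent >= HEARTBEAT_INTERVAL


def alerting_looks_dead(
    last_seen: Optional[datetime],
    now: datetime,
    grace: timedelta = timedelta(hours=2),  # assumed tolerance for late messages
) -> bool:
    """Return True when the expected heartbeat is overdue by more than `grace`."""
    if last_seen is None:
        return True
    return now - last_seen > HEARTBEAT_INTERVAL + grace
```

The sender would call `heartbeat_due` on a schedule, while a separate watcher would call `alerting_looks_dead` so that silence from the alert system itself gets noticed.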

Ben Halpern Author

Jason C. McDonald • Edited

Haha, well, understand that by "periodic," I mean some quiet little message once a day in the log/channel, with no loud beeping every five seconds... ;)

Incidentally, an ironic twist on this is...when I got the email notification for your response, my email client couldn't load the YouTube video. So, I just saw "an error occurred".

My first thought was, "Aw, crap, I jinxed it!"

David J Eddy

"Everything's fine with the monitoring. Turns out the site's just more stable."

Amazing! Great job @dev.to team!

Michiel Hendriks

One of our customers often has a "high transaction" week (200%-300%), and they warn us about it before it starts. There have been various load issues in the past (not even during these high transaction weeks). A couple of weeks before one such week, I figured out an issue that could lead to erratic behavior and addressed it. Various monitors became quite stable. When the high transaction week started, our monitoring showed absolutely nothing of significance. System load, memory usage, etc.: everything was still pretty much a flat line. The people on standby were worried something was broken and the transactions weren't going through. But nope, everything was working perfectly. This was quite a while ago. In the meantime, the average number of transactions per day has increased, and peak transactions have become higher. But none of this is really visible in our system monitoring.

Ben Halpern Author

Best of luck. I’m praying for you.

rhymes

Ahahh I pictured Mac going back to check knobs and levers and gauges with one of those yellow safety hats with the embedded torchlight

Ben Halpern Author

We should have these kinds of props handy now that I think about it

Erick Gonzales

Performance is something I learned to keep an eye on at the previous company I worked for (a newspaper). The high traffic always kept me on edge, especially during big events.

I saw all the graphs and asked the DevOps team about them. They said: "It's OK, don't worry. If something happens, we'll let you know."

That phrase kept me OK, but still on edge lol.

Alex Barashkov

What do you use for monitoring, alarms and log gathering at Dev.to?

Andrew (he/him)

What do you mean by "error rate"?

Ben Halpern Author

Percentage of web requests which fail.
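
As a rough sketch of that definition, assuming the failed and total request counts are already available for some time window. The function names and the example threshold are hypothetical and not dev.to's actual configuration:

```python
def error_rate(failed: int, total: int) -> float:
    """Percentage of web requests in a window that failed."""
    if total == 0:
        return 0.0  # no traffic in the window, so nothing has failed
    return 100.0 * failed / total


def should_alert(failed: int, total: int, threshold_pct: float) -> bool:
    """Return True when the error rate passes the configured threshold."""
    return error_rate(failed, total) > threshold_pct


# Hypothetical window: 5 failures out of 1000 requests, 1% threshold.
print(error_rate(5, 1000))         # 0.5
print(should_alert(5, 1000, 1.0))  # False
```

Lowering `threshold_pct` is what makes an alert fire "now and then" again once the site has become more stable.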

Jan Mewes

"We lowered the threshold a bit, and should expect an alert now and then at the new level."

Yes, it's a problem if you don't have any problem. :D

Basti Ortiz (Some Dood)

Stellar job, guys! Keep it up. Thank you for the hard work you put into this community. It means a lot to everyone here.

K

I use Sentry in one project and it's a good feeling to get the weekly reports after an update when the error rate has gone down 40% or so.

The error graphs are approaching a flat line more and more XD

Anton

Ben, what's your take on Elixir? I see so many benefits. Have you ever considered using it for Dev.to?

Ben Halpern Author

I think it’s pretty sweet. Never seriously considered it for dev.to unless it just plugged right in nicely.

If we grow and find some time to be more exploratory (or have more dire scaling needs), it’ll definitely get some stronger consideration.

One pretty interesting thing for the future is Rust interop: usehelix.com

Ben Lovy

Whoa, thanks. This is really cool.

Eljay-Adobe

Whoa, that kind of "things are working well" makes me nervous.

Software engineering is never having to say you're done.

Theofanis Despoudis

Maybe we could add a new observation here:
en.wikipedia.org/wiki/Fallacies_of...

  1. The error rate on dev.to never exceeds the minimum threshold.

Daniel Alner

That's so cool to hear, nice job!

Just curious, how do you usually justify the current threshold and decide whether it should be lowered or raised?

Abhijit Parida

Lmao of course you jinxed it!