Toggle and I were thinking about monitoring this week. Monitoring isn't the same as observability, but they're both important. Monitoring is having something that watches your current state for changes. We often mix it up with alerting, which takes results from monitoring and yells about them. (Toggle says they never yell, because sound doesn't travel in a vacuum.)
We've been spending time thinking about monitoring because there are some awesome things that you can do with monitoring plus feature flags. For example, imagine you've got monitoring on an inbound data pipe that is writing to your database. Suddenly, there is an unexplained 100x spike in traffic. If it's not something you expected, odds are good that it's bad traffic - bots, or zombies, or a DDoS, or something similar. You probably don't want all of this written to your database, because it's garbage. Monitoring notices the problem, and alerting hollers that you should do something about it. To cut down the reaction time, you can also have the alert trigger a feature flag that shunts the traffic away from your main database and into a holding area until you can figure out whether it's valid. It's going to be a lot easier to clean things up if you don't have to do it in your production database, and because the feature flag is wired into the monitoring and alerting, you don't even have to count on human reaction times!
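Here's a minimal sketch of that flow, in case it helps to see it as code. Everything in it is an assumption made for illustration: `FlagStore` is an in-memory stand-in for a real feature flag service, `SpikeMonitor` uses a fixed baseline instead of real metrics, the flag key `quarantine-inbound-writes` is made up, and plain lists play the roles of the production database and the holding area.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical flag key, not taken from any real flag service.
QUARANTINE_FLAG = "quarantine-inbound-writes"


@dataclass
class FlagStore:
    """In-memory stand-in for a feature flag service (assumption)."""
    flags: dict = field(default_factory=dict)

    def is_on(self, key: str) -> bool:
        # Unknown flags default to off.
        return self.flags.get(key, False)

    def set(self, key: str, value: bool) -> None:
        self.flags[key] = value


@dataclass
class SpikeMonitor:
    """Monitoring: compares one interval's request count to a fixed baseline."""
    baseline_per_interval: float
    spike_factor: float = 100.0  # the "100x spike" from the example above

    def is_spike(self, observed: int) -> bool:
        return observed >= self.baseline_per_interval * self.spike_factor


def alert(monitor: SpikeMonitor, flags: FlagStore, observed: int) -> Optional[str]:
    """Alerting: flip the flag right away and return a message for a human."""
    if not monitor.is_spike(observed):
        return None
    flags.set(QUARANTINE_FLAG, True)
    return f"ALERT: {observed} requests in one interval, quarantining writes"


def write_record(record: dict, flags: FlagStore,
                 production_db: list, holding_area: list) -> None:
    """Send the write to the holding area while the flag is on."""
    target = holding_area if flags.is_on(QUARANTINE_FLAG) else production_db
    target.append(record)
```

With a baseline of 50 requests per interval, calling `alert(monitor, flags, observed=5000)` turns the flag on, and every later `write_record` call lands in `holding_area` until someone reviews the traffic and sets the flag back to off. Keeping the routing decision behind a flag means the un-shunt is a flag change too, not a deploy.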
When you think about it, a lot of monitoring is about safety and detecting variations from standard before they become A Problem. That's important in spaceships and production environments alike.
Happy Wednesday! Today’s #ToggleTalk is on Monitoring.
🚨 Have a story about when alerts went terribly wrong? (or silent?)
👾 What’s an important thing about monitoring that most people miss?
Tag answers with #ToggleTalk
-- Heidi, The Sticker Thoughtleader (@wiredferret) May 20, 2020
I was hoping for more disaster stories, but I'm a devops nerd, and that's how we bond. Toggle is a nicer person than I am, so they liked the answers we got about doing better at monitoring.
The clearest theme was that monitoring is necessary, but not sufficient. User experience was the area most people named as the place where monitoring can fail or misrepresent reality.
#toggletalk It doesn’t matter if your system is working properly for almost everyone. Users care about how it works //for them//.
You don’t care about FedEx’s impressive logistics system when your package is late or lost. That’s a 100% failure rate as far as you can tell.
-- Not Fake Adam Kalsey (@akalsey) May 20, 2020
Our coworker Dawn chimed in with some ideas about how to make monitoring inclusive of user experiences:
Two things I've seen people miss:
-Monitoring third-party components and services. Your components may be fine but an issue with a 3rd party can impact the user experience.
-Monitoring the bigger picture. Include a combination of synthetic and real user monitoring. #ToggleTalk
-- Dawn Parzych (@dparzych) May 20, 2020
As with any metric, what we measure drives our behavior. Choose what you monitor carefully, because your organization will bend to make those metrics look as good as possible. Re-evaluate what you're monitoring regularly to make sure that your metrics still reflect the product experience.
If you didn't get a chance to contribute to this week's ToggleTalk, feel free to comment here! Keep an eye out for next week!