The largest single source of unreliability to a system is change.
Now we can take that learning back to DevOps and understand why that friction occurred. Development wants a progressive application that keeps delivering new features; operations wants a stable, reliable system that doesn't change. Dev and Ops.
For me, SLOs remove that personal aspect. Agreeing the error budget, and with it the level of reliability, upfront means both groups make allowances for each other. It is about managing expectations.
And the key to success is aligning incentives between development and operations.
If a service is within SLO then you could...
One approach is to release features only until the error budget is exhausted, then focus development on reliability improvements until the budget is refilled.
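As a rough illustration of that policy, here is a minimal sketch of an error budget check. It assumes a request-based SLO measured over a fixed window; the names `ErrorBudget` and `can_release_feature` are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served in the current window
    failed_requests: int  # requests that breached the SLO in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is whatever the SLO does not promise.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Negative once the budget has been overspent.
        return self.allowed_failures - self.failed_requests


def can_release_feature(budget: ErrorBudget) -> bool:
    """Assumed policy: ship features only while some budget is left."""
    return budget.remaining > 0


healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
exhausted = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_200)

print(can_release_feature(healthy))    # budget left, features can ship
print(can_release_feature(exhausted))  # budget spent, focus on reliability
```

The useful property is that the decision comes from numbers both groups agreed on in advance, not from whoever argues hardest on the day.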
No silver bullets...or are there?
Picture a real-world scenario. You've spent all your error budget and can't afford any more outages or downtime, but the product team really wants to push out a new feature.
We have all been there and it's a reality of development. Hint: it's not personal, so don't treat it as such!
But if this situation occurs, teams can furnish stakeholders with a silver bullet token: a token that allows the bearer to propose ignoring the rules.
The tokens don't refresh. If the bearer still wants the release to go ahead, they hand their token to the SREs in order to enable the release.
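The token mechanism could be sketched along the following lines. This is only an illustration under stated assumptions: `SilverBulletTokens` and `release_allowed` are hypothetical names, and how many tokens each stakeholder starts with is a choice for your own teams.

```python
class SilverBulletTokens:
    """Hypothetical ledger of non-refreshing override tokens."""

    def __init__(self, holders: dict):
        # Maps stakeholder name to tokens held. Nothing ever tops this up.
        self._tokens = dict(holders)

    def redeem(self, bearer: str) -> bool:
        """Hand one token to the SREs. Returns False if the bearer has none."""
        if self._tokens.get(bearer, 0) <= 0:
            return False
        self._tokens[bearer] -= 1
        return True


def release_allowed(budget_remaining: float, tokens: SilverBulletTokens, bearer: str) -> bool:
    """Assumed policy: release freely within budget, otherwise spend a token."""
    if budget_remaining > 0:
        return True  # within budget, no token is needed or spent
    return tokens.redeem(bearer)


tokens = SilverBulletTokens({"product": 1})

first = release_allowed(0, tokens, "product")   # budget spent, token redeemed
second = release_allowed(0, tokens, "product")  # no tokens left, release blocked
print(first, second)
```

Because a spent token is gone for good, the bearer has a reason to save it for the release that really matters.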