Service Level Objectives (SLOs)
Is it a competing balance?
There is always pressure to deliver new features whilst maintaining a level of reliability for a system that is in constant flux.
Knowing just how reliable a system should be seems to be an interesting question. This takes me back to working on a system that was part of the GDS scheme. We built a system that had a pretty high level of reliability only to find out that it would be turned off during holiday periods - not for technical reasons but for people reasons. There were simply no support staff should any customers have a support query!
We certainly didn't ask "how reliable does the system need to be?"
NFRs (Non-Functional Requirements)
Reliability (or many other -ities, for that matter) can be classed as a Non-Functional Requirement, but if that is the case, how do we prioritise? It always comes down to prioritisation.
Also, does classing reliability as an NFR do it a disservice?
Setting SLOs is the practice of choosing a reliability target and communicating that target widely enough for people to be aware of it. A side effect is that anyone in your organisation can determine independently whether or not the service is reliable enough.
So SLOs could help development teams answer, when shipping software, just how fast is too fast?
Estimating Reliability Risks
We can estimate risks to reliability (from the roll-out of new features) in terms of:
- time to detection
- time to resolution
- impact percentage
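To make those three factors concrete, here is a rough sketch of how they might combine into a single number for comparing risks. The `Risk` class, the example figures and the `incidents_per_year` field are my own assumptions for illustration (frequency isn't in the list above, but you need some notion of it to get an annual figure), so treat this as one possible way of doing the sums rather than the definitive method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A hypothetical reliability risk introduced by a roll-out."""
    name: str
    time_to_detection_min: float   # minutes until someone notices
    time_to_resolution_min: float  # minutes from noticing to fixing
    impact_fraction: float         # share of users affected, 0.0 to 1.0
    incidents_per_year: float      # assumption: how often we expect it

    def bad_minutes_per_year(self) -> float:
        # Each incident lasts detection + resolution time, scaled by
        # how many users feel it and how often it happens.
        duration = self.time_to_detection_min + self.time_to_resolution_min
        return duration * self.impact_fraction * self.incidents_per_year


# Made-up numbers, purely for illustration.
risks = [
    Risk("bad deploy", 30, 90, 0.2, 4),
    Risk("database failover", 5, 25, 1.0, 1),
]

# Rank the risks by how much user-facing unreliability they cost.
for risk in sorted(risks, key=Risk.bad_minutes_per_year, reverse=True):
    print(f"{risk.name}: {risk.bad_minutes_per_year():.0f} bad minutes/year")
```

Ranking risks this way gives developers and operations a shared number to talk about, which ties back to prioritisation.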
But irrespective of terminology, the focus goes back to the cultural, human aspects of DevOps.
Shared ownership is crucial: developers feel a shared responsibility to make the service reliable, and the operations team feels a responsibility to help new features reach users as quickly as possible.
Each enables the other, and both agree on the service level objective.
Three Principles for Setting SLOs
The above covers a bit of the "why" and the human aspects, so the next stage is a bit of the "what" of SLOs:
- What to promise and to whom
- What metrics to measure
- How much reliability is good enough
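On the last point, "how much reliability is good enough", it can help to turn a target percentage into minutes of allowed unreliability. The sketch below assumes an availability-style target measured over a 30-day window; the function name and the window length are my own choices, not anything prescribed.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of downtime a target permits over the window.

    `target` is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - target) * window_minutes


for target in (0.99, 0.999, 0.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} over 30 days allows {minutes:.1f} minutes down")
```

Seen this way, each extra nine cuts the allowance by a factor of ten: a 99.9% target over 30 days leaves roughly 43 minutes, which makes "good enough" a far more tangible conversation.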
SLAs vs SLOs
I like to think of SLAs (Service Level Agreements) as the last line of defence. They are essentially the promises you make to customers about the service you will deliver. It's the service they are paying you for, and it sets their minimum expectations for that service.
If you break the SLA then the customer can quite rightly expect some form of reimbursement. You broke the agreement.
So clearly it's better to find out BEFORE you break an SLA.
That's where our fancy SLOs come in. They should be stricter than your SLAs, so that a problem gets spotted before customers feel the impact of a broken SLA.
In turn that means you might have a chance to fix the issue before you break your customer agreement.
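As a rough sketch of that early-warning idea, the check below compares a measured availability against both thresholds. The function name, the status labels and the example numbers are all made up for illustration; the only real assumption is that the SLO is at least as strict as the SLA.

```python
def classify(measured: float, slo: float, sla: float) -> str:
    """Compare measured availability (a fraction) with the SLO and SLA."""
    if slo < sla:
        raise ValueError("the SLO should be at least as strict as the SLA")
    if measured < sla:
        return "sla_breached"   # the customer agreement is broken
    if measured < slo:
        return "slo_breached"   # internal warning: still time to fix it
    return "ok"


# Hypothetical targets: an internal SLO of 99.9% guarding a 99.5% SLA.
SLO, SLA = 0.999, 0.995

for measured in (0.9995, 0.997, 0.99):
    print(f"{measured:.2%}: {classify(measured, SLO, SLA)}")
```

The gap between the two thresholds is the breathing room: anything landing in "slo_breached" is a prompt to act while the SLA is still intact.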