James Heggs

Posted on Sep 29, 2020 • Edited on May 9, 2021

GCP DevOps Certification - Pomodoro Four

#googlecloud #certification #sre #devops

Service Level Objectives (SLOs)

Is it a competing balance?

There is pressure to ship new features whilst maintaining a level of reliability for a system that is in constant flux.

Knowing just how reliable a system should be seems to be an interesting question. This takes me back to working on a system that was part of the GDS scheme. We built a system that had a pretty high level of reliability only to find out that it would be turned off during holiday periods - not for technical reasons but for people reasons. There were simply no support staff should any customers have a support query!

We certainly didn't ask "how reliable does the system need to be?"

NFRs (Non-Functional Requirements)

Reliability (or many of the other "-ilities", for that matter) can be classed as a Non-Functional Requirement... but if that is the case, how do we prioritise? It always comes down to prioritisation.

Also, does classing reliability as an NFR do it a disservice?

Setting SLOs is the practice of choosing a reliability target and communicating that target widely enough for people to be aware of it. A side effect is that anyone in your organisation can determine independently whether or not the service is reliable enough.

So SLOs could help development teams answer, when making software, just how fast is too fast?

Estimating Reliability Risks

We can estimate risks to reliability (from the roll-out of new features) in terms of:

time to detection
time to resolution
impact percentage
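As a rough sketch of how those three factors might combine, the snippet below estimates the expected "bad minutes" per year for each risk. The formula, the `incidents_per_year` frequency term, the `ReliabilityRisk` class and all of the figures are my own assumptions for illustration; none of them come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityRisk:
    """Hypothetical container for one risk to reliability."""
    name: str
    time_to_detection_min: float
    time_to_resolution_min: float
    impact_percentage: float   # share of users affected, 0-100
    incidents_per_year: float  # assumed frequency term, not in the list above

    def expected_bad_minutes_per_year(self) -> float:
        # Minutes a single incident lasts, from start to fix.
        outage_minutes = self.time_to_detection_min + self.time_to_resolution_min
        # Scale by how many users feel it and how often it happens.
        return outage_minutes * (self.impact_percentage / 100) * self.incidents_per_year


# Made-up example risks, purely for illustration.
risks = [
    ReliabilityRisk("bad config push", 10, 50, 100, 4),
    ReliabilityRisk("slow memory leak", 120, 60, 20, 2),
]

# Print the risks with the largest expected cost first.
for risk in sorted(risks, key=lambda r: r.expected_bad_minutes_per_year(), reverse=True):
    print(f"{risk.name}: {risk.expected_bad_minutes_per_year():.0f} bad minutes/year")
```

Ranking risks this way gives developers and operations a shared number to argue over, which is arguably more useful than the number itself.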

But the focus, irrespective of terminology, goes back to those cultural, human aspects of DevOps.

Shared ownership is crucial: developers feel a shared responsibility to make the service reliable, and the operations team feel a responsibility to help new features reach users as quickly as possible.

Each side enables the other, and both agree on the service level objective.

Three Principles for Setting SLOs

The above covers a bit of the "why" and the human aspects, so the next stage is a bit of the "what" of SLOs.

What to promise and to whom
What metrics to measure
How much reliability is good enough

SLAs vs SLOs

I like to think of SLAs (Service Level Agreements) as the last line of defence. They are essentially the promises you make to customers about the service you will deliver. It's the service they are paying you for and their minimum expectations for that service.

If you break the SLA then the customer can quite rightly expect some form of reimbursement. You broke their agreement.

So clearly it's better to find out BEFORE you break an SLA.

That's where our fancy SLOs come in. They should be stricter than your SLAs, so that a problem gets spotted before customers feel the impact of a broken SLA.

In turn that means you might have a chance to fix the issue before you break your customer agreement.
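To make that ordering concrete, here is a minimal sketch of the early-warning idea. The targets (a 99.9% SLO sitting above a 99.5% SLA) and the `classify` helper are hypothetical values and names I've picked for illustration, not figures from any real agreement.

```python
SLA_TARGET = 99.5  # hypothetical: availability promised to customers
SLO_TARGET = 99.9  # hypothetical: stricter internal objective


def classify(measured_availability: float) -> str:
    """Say which promise, if any, a measured availability (in %) has broken."""
    if measured_availability < SLA_TARGET:
        return "SLA breached: customers may be owed reimbursement"
    if measured_availability < SLO_TARGET:
        return "SLO missed: fix it before the SLA is at risk"
    return "healthy: within the SLO"


# The SLO only works as an early warning if it is the tighter of the two.
assert SLO_TARGET > SLA_TARGET

for availability in (99.95, 99.7, 99.2):
    print(f"{availability}% -> {classify(availability)}")
```

The gap between the two targets is the breathing room: anything that lands in it is a problem you know about but the customer agreement does not yet.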
