Kenta Takeuchi

Posted on Sep 10, 2022 • Originally published at bmf-tech.com

SLIs・SLOs・SLAs

#sre #sli #slo #sla

This article is a translation of SLI・SLO・SLAについて.

About SLIs, SLOs, and SLAs

This post summarizes what I have learned about SLIs, SLOs, and SLAs.

What are SLOs, SLIs, and SLAs?

SLIs, SLOs, and SLAs are, respectively, indicators, objectives, and agreements related to service levels.
A service level is a measure of the service provided over a given period of time.

SLI (Service Level Indicator)
- An indicator, or metric, used to measure a service level
- ex. availability, latency, error rate, throughput
SLO (Service Level Objective)
- The target for a service level
- A quantitative or qualitative target value for the service level
- Take external dependencies into account
  - For example, communication with external services and the SLOs of managed services that the system relies on
SLA (Service Level Agreement)
- An agreement on, and guarantee of, service levels between the service provider and its users
- It is better to set the target value in an SLA looser than the SLO
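
As a rough illustration of how the three terms relate, the sketch below computes an availability SLI as the share of successful requests and compares it with an SLO and a looser SLA. The function names and the request counts are made up for this example, and treating a window with no traffic as fully available is an assumption, not a rule.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the share of successful events as a percentage."""
    if total_events == 0:
        # Assumption: a window with no traffic counts as fully available.
        return 100.0
    return 100.0 * good_events / total_events


def meets_target(sli_percent: float, target_percent: float) -> bool:
    """True when the measured SLI is at or above the target (SLO or SLA)."""
    return sli_percent >= target_percent


# Hypothetical numbers: 999,400 successful requests out of 1,000,000.
sli = availability_sli(999_400, 1_000_000)
print(f"SLI: {sli:.2f}%")
print("meets SLO 99.9%:", meets_target(sli, 99.9))
print("meets SLA 99.5%:", meets_target(sli, 99.5))
```

Because the SLA is looser than the SLO, missing the SLO serves as an early warning before the agreement with users is at risk.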

How to set SLI/SLO

I find the best practices advocated by New Relic easy to work with.

newrelic.com - Best practices for setting SLI/SLO in modern systems

It explains how to formulate SLIs and SLOs in four steps: define the system boundaries, define the functions within each boundary, define what availability means for each function, and define an SLI to measure that availability.

When you first start operating SLIs and SLOs, it is recommended to keep them as simple as possible and to start with loose target values.
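
One way to picture the boundary, function, and SLI hierarchy is as nested records. This is only a sketch of how such a definition could be written down. The class names, the "checkout-api" boundary, and the 99.9% target are hypothetical and do not come from the New Relic article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sli:
    name: str
    description: str      # what "available" means for the function
    slo_percent: float    # target value for this SLI


@dataclass
class Function:
    name: str
    slis: List[Sli] = field(default_factory=list)


@dataclass
class Boundary:
    name: str
    functions: List[Function] = field(default_factory=list)


# Hypothetical example: one boundary with one coarse-grained function.
checkout = Boundary(
    name="checkout-api",
    functions=[
        Function(
            name="place order",
            slis=[
                Sli(
                    name="availability",
                    description="share of requests answered without a 5xx error",
                    slo_percent=99.9,
                )
            ],
        )
    ],
)

for function in checkout.functions:
    for sli in function.slis:
        print(f"{checkout.name} / {function.name} / {sli.name}: SLO {sli.slo_percent}%")
```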

cf. sre.google - Chapter 4 - Service Level Objectives

When I formulated SLIs and SLOs at work, I followed this New Relic practice but adjusted the functional units so that they were not too fine-grained.

If the functional units are too fine-grained from the start, operation becomes difficult, so I think it is better to adjust the granularity as needed during operation.

Tips

Notes on keywords related to SLI/SLO.

The difference between reliability and availability

Reliability
- The degree to which a system tolerates failures
Availability
- The degree to which a system can continue to operate

List of uptime and downtime, availability calculation

Availability	Annual Downtime	Monthly Downtime
99.0%	87.6 hours	7.3 hours
99.5%	43.8 hours	3.65 hours
99.9%	8.76 hours	43.8 minutes
99.95%	4.38 hours	21.9 minutes
99.99%	52.56 minutes	4.38 minutes
99.999%	5.256 minutes	26.28 seconds
99.9999%	31.536 seconds	2.628 seconds
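
Downtime figures like these follow from multiplying the length of the period by the unavailable fraction. The sketch below assumes a 365-day year (8,760 hours) and an average month of one twelfth of that (730 hours). The function name is chosen for this example.

```python
HOURS_PER_YEAR = 365 * 24              # 8760, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730, an average month


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given availability."""
    return period_hours * (100 - availability_percent) / 100


for availability in (99.0, 99.9, 99.99, 99.999):
    yearly = allowed_downtime_hours(availability, HOURS_PER_YEAR)
    monthly = allowed_downtime_hours(availability, HOURS_PER_MONTH)
    # Print in minutes so that small values stay readable.
    print(f"{availability}%: {yearly * 60:.2f} min/year, {monthly * 60:.2f} min/month")
```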

What is an error budget?

An error budget is the amount of unreliability that is acceptable under an SLO, calculated as 100% minus the SLO.
ex. SLO 99.99% → error budget 0.01%
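
To make the arithmetic concrete, the sketch below turns an SLO into a number of requests that may fail and reports how much of that budget is left. It assumes a request-based SLO. The function names and the traffic figures are hypothetical.

```python
def allowed_failures(slo_percent: float, total_requests: int) -> int:
    """Number of requests that may fail without breaking the SLO."""
    # round() guards against floating-point noise such as 99.99999.
    return round(total_requests * (100 - slo_percent) / 100)


def budget_remaining(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unused (negative when overspent)."""
    allowed = allowed_failures(slo_percent, total_requests)
    if allowed == 0:
        # Assumption: with no budget at all, report nothing remaining.
        return 0.0
    return 1 - failed_requests / allowed


# Hypothetical window: 1,000,000 requests, 40 of them failed, SLO 99.99%.
print("allowed failures:", allowed_failures(99.99, 1_000_000))
print(f"budget remaining: {budget_remaining(99.99, 1_000_000, 40):.0%}")
```

A negative result means that more errors occurred than the SLO allows for the window.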

Impression

My impression is that making the service level measurable makes it possible to observe whether the consumers of a service (users or other systems) are being served satisfactorily. For service providers, it also becomes an indicator of whether the service level needs to be improved.
