Availability is the proportion of time a system remains operational and able to perform its required function over a specific period. It is a simple measure of the percentage of time that a system, service, or machine remains operational under normal conditions.
Availability is often quantified by uptime (or downtime) as a percentage of the time the service is available. It is generally expressed as a number of nines.
| Availability % | Downtime (Year) | Downtime (Month) | Downtime (Week) |
| --- | --- | --- | --- |
| 90% (one nine) | 36.53 days | 72 hours | 16.8 hours |
| 99% (two nines) | 3.65 days | 7.20 hours | 1.68 hours |
| 99.9% (three nines) | 8.77 hours | 43.8 minutes | 10.1 minutes |
| 99.99% (four nines) | 52.6 minutes | 4.32 minutes | 1.01 minutes |
| 99.999% (five nines) | 5.25 minutes | 25.9 seconds | 6.05 seconds |
| 99.9999% (six nines) | 31.56 seconds | 2.59 seconds | 604.8 milliseconds |
| 99.99999% (seven nines) | 3.15 seconds | 263 milliseconds | 60.5 milliseconds |
| 99.999999% (eight nines) | 315.6 milliseconds | 26.3 milliseconds | 6 milliseconds |
| 99.9999999% (nine nines) | 31.6 milliseconds | 2.6 milliseconds | 0.6 milliseconds |
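The downtime figures above follow from multiplying the length of the period by the unavailable fraction. Here is a minimal sketch of that arithmetic. The function name `allowed_downtime` and the constants are illustrative, and the 365.25-day year is an assumption; a different year or month length shifts the results slightly.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year, including leap days


def allowed_downtime(availability_percent: float, period_seconds: float) -> float:
    """Return the downtime budget in seconds for a period at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    # The fraction of the period during which the service may be down.
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return period_seconds * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours_per_year = allowed_downtime(target, SECONDS_PER_YEAR) / 3600
        minutes_per_week = allowed_downtime(target, SECONDS_PER_WEEK) / 60
        print(f"{target}%: {hours_per_year:.2f} h/year, {minutes_per_week:.2f} min/week")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why every step down the table is much harder to achieve than the one before.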
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
Overall availability decreases when two components are in sequence.
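For example, if both components each had 99.9% availability, their total availability in sequence would be about 99.8%, because the service is up only when both are up and the availabilities multiply (0.999 × 0.999 ≈ 0.998).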
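The sequence case can be sketched in a few lines. This assumes the components fail independently, and the function name `availability_in_sequence` is illustrative rather than taken from any library.

```python
import math


def availability_in_sequence(*availabilities: float) -> float:
    """Combined availability when every component must be up (values in 0..1)."""
    # All components are required, so the individual probabilities multiply.
    return math.prod(availabilities)


total = availability_in_sequence(0.999, 0.999)
print(f"{total:.4%}")  # 99.8001%
```

Adding more components to the chain keeps lowering the result, since each factor is below 1.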
Overall availability increases when two components are in parallel.
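For example, if both components each had 99.9% availability, their total availability in parallel would be 99.9999%, because the service is down only when both fail at the same time (1 − 0.001 × 0.001 = 0.999999, assuming independent failures).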
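The parallel case works on the failure probabilities instead. A minimal sketch, again assuming independent failures and using an illustrative function name:

```python
import math


def availability_in_parallel(*availabilities: float) -> float:
    """Combined availability when any one component is enough (values in 0..1)."""
    # The service is down only if every redundant component is down at once.
    probability_all_down = math.prod(1.0 - a for a in availabilities)
    return 1.0 - probability_all_down


total = availability_in_parallel(0.999, 0.999)
print(f"{total:.4%}")  # 99.9999%
```

In practice, redundant components often share failure causes such as a common power supply or network, so the independence assumption makes this an upper bound rather than a guarantee.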
If a system is reliable, it is available. However, if it is available, it is not necessarily reliable. In other words, high reliability contributes to high availability, but it is possible to achieve high availability even with an unreliable system.
Both high availability and fault tolerance apply to methods for providing high uptime levels. However, they accomplish the objective differently.
A fault-tolerant system has no service interruption but a significantly higher cost, while a highly available system has minimal service interruption. Fault tolerance requires full hardware redundancy: if the main system fails, another system must take over with no loss in uptime.
This article is part of my open-source System Design Course, available on GitHub.