In an SRE environment, availability is an important SLI (service level indicator).
One way to represent availability is as percentage uptime, often condensed into a number of "nines". For example, three nines of availability means the service was up and running 99.9% of the time.
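The link between n nines and a percentage is simple arithmetic: n nines leave an unavailable fraction of 10^-n. The Python sketch below converts in both directions. The function names are hypothetical and chosen only for illustration. The inverse can be fractional, so a target such as 99.95% works out to roughly 3.3 nines.

```python
import math


def nines_to_availability(n: int) -> float:
    # n nines means the unavailable fraction is 10^-n, e.g. 3 nines -> 0.001 unavailable
    return 1 - 10 ** -n


def availability_to_nines(availability: float) -> float:
    # Inverse of the above; the result may be fractional (99.95% is about 3.3 nines)
    return -math.log10(1 - availability)


for n in range(1, 6):
    # Show the percentage with n-2 decimals so 3 nines prints as 99.9%, not 99.900%
    print(f"{n} nines = {nines_to_availability(n) * 100:.{max(n - 2, 0)}f}%")
```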
A common mistake when defining availability SLOs (service level objectives) is to judge whether a service is 'available' from an Ops or infrastructure perspective rather than from the customer's perspective. If a customer cannot complete a transaction because a service is failing, that service should be counted as 'not available'. In short, availability should be measured from the customer's standpoint, not an Ops engineer's.
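The following minimal Python sketch makes the customer-versus-Ops distinction concrete. It computes availability twice over the same window: once from health-check probes (the Ops view) and once from customer transactions (the customer view). The `Transaction` record, both function names and the sample numbers are hypothetical. In a real system, the success signal would come from whatever event marks a completed customer transaction.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record of one customer-facing transaction (e.g. a checkout)
    succeeded: bool  # True only if the customer completed what they set out to do


def ops_availability(health_checks: list[bool]) -> float:
    """Ops view: fraction of health-check probes that reported the service as up."""
    if not health_checks:
        raise ValueError("need at least one probe result")
    return sum(health_checks) / len(health_checks)


def customer_availability(transactions: list[Transaction]) -> float:
    """Customer view: fraction of real transactions that completed successfully."""
    if not transactions:
        raise ValueError("need at least one transaction")
    return sum(t.succeeded for t in transactions) / len(transactions)


# Illustrative data: every probe says "up", yet 3 of 100 checkouts failed
probes = [True] * 60
checkouts = [Transaction(True)] * 97 + [Transaction(False)] * 3
print(ops_availability(probes))
print(customer_availability(checkouts))
```

In this example the Ops view reports 100% availability while customers experienced 97%. The SLO should be defined on the second number.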
Here are some commonly quoted target SLOs and the downtime budget each one allows per year, quarter, month and week.
| Availability | Annual downtime budget | Quarterly downtime budget | Monthly downtime budget | Weekly downtime budget |
|---|---|---|---|---|
| One nine or 90% | 36.5 days | 9.1 days | 72 hours | 16.8 hours |
| Two nines or 99% | 3.65 days | 21.9 hours | 7.2 hours | 1.68 hours |
| 3 nines is where we start getting serious | | | | |
| Three nines or 99.9% | 8.76 hours | 2.2 hours | 43.2 minutes | 10.1 minutes |
| Four nines or 99.99% | 52.56 minutes | 13.14 minutes | 4.32 minutes | 1.01 minutes |
| Five nines or 99.999% | 5.26 minutes | 1.3 minutes | 25.9 seconds | 6.05 seconds |

The budgets assume a 365-day year, a 91.25-day quarter (one quarter of a year), a 30-day month and a 7-day week.
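Every figure in the table comes from one formula: allowed downtime = (1 - availability) × period length. The Python sketch below recomputes the budgets under the same period assumptions as the table. The names `PERIODS`, `downtime_budget` and `human` are hypothetical. The `human` helper simply picks the largest unit that is at least 1, so some rows print in different units than the table uses (for example, 3.00 days instead of 72 hours).

```python
from datetime import timedelta

# Period lengths assumed by the table: 365-day year, quarter = year / 4,
# 30-day month, 7-day week
PERIODS = {
    "annual": timedelta(days=365),
    "quarterly": timedelta(days=365 / 4),
    "monthly": timedelta(days=30),
    "weekly": timedelta(days=7),
}


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Allowed downtime = (1 - availability) x period length."""
    return period * (1 - availability_pct / 100)


def human(td: timedelta) -> str:
    # Express the duration in the largest unit whose value is at least 1
    seconds = td.total_seconds()
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


for pct in (90, 99, 99.9, 99.99, 99.999):
    row = ", ".join(f"{name}: {human(downtime_budget(pct, p))}" for name, p in PERIODS.items())
    print(f"{pct}% -> {row}")
```

The same function answers the reverse planning question. For example, `downtime_budget(99.95, PERIODS["monthly"])` gives the monthly budget for a target that falls between three and four nines.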