In an SRE environment, availability is an important SLI (service level indicator). One way to represent availability is as a percentage of uptime, abbreviated as "n nines". For example, three nines of availability means the service was up and running 99.9% of the time.
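The "n nines" shorthand maps to a percentage by simple arithmetic: n nines corresponds to an availability fraction of 1 - 10^-n. The sketch below illustrates that conversion in both directions; the function names are hypothetical and chosen only for this example.

```python
import math


def nines_to_availability(nines: int) -> float:
    """Return the availability fraction for n nines, e.g. 3 -> 0.999."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 1 - 10 ** -nines


def availability_to_nines(availability: float) -> float:
    """Return how many nines a measured availability fraction amounts to."""
    if not 0 <= availability < 1:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1 - availability)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        # Show just enough decimals to display every nine.
        decimals = max(n - 2, 0)
        print(f"{n} nines = {nines_to_availability(n) * 100:.{decimals}f}%")
```

The reverse function is handy for a measured figure that does not land on a round number of nines: an availability of 99.95%, for instance, works out to roughly 3.3 nines.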
A common mistake when defining availability SLOs (service level objectives) is to judge whether a service is "available" from an Ops or Infra perspective rather than from the customer's perspective. If a customer cannot complete a transaction because a service is failing, that service should be deemed "not available", whatever the infrastructure dashboards say. In short, availability should be measured from the customer's standpoint, not the Ops engineer's.
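One way to make the customer's standpoint concrete is to compute availability from the outcome of customer transactions instead of from host or process health. The following is a minimal sketch under that assumption; the `Transaction` record and the `customer_availability` function are hypothetical names used for illustration, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Transaction:
    """One customer transaction attempt (hypothetical record shape)."""
    succeeded: bool


def customer_availability(transactions: Sequence[Transaction]) -> float:
    """Return the fraction of customer transactions that succeeded."""
    if not transactions:
        raise ValueError("no transactions observed")
    succeeded = sum(1 for t in transactions if t.succeeded)
    return succeeded / len(transactions)


if __name__ == "__main__":
    # Every host may report healthy, yet 2 of 1000 checkouts failed for customers.
    sample = [Transaction(True)] * 998 + [Transaction(False)] * 2
    print(f"customer-facing availability: {customer_availability(sample):.3%}")
```

In this made-up sample an infrastructure check could still report 100% uptime, while the customer-facing figure is 99.8%, which is below a three-nines target.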
Here are some commonly quoted target SLOs and the downtime budgets they translate to, in minutes and seconds.
|| ||annual downtime budget||quarterly downtime budget||monthly downtime budget||weekly downtime budget||
|One nine or 90%| | | | |
|Two nines or 99%| | | | |
|Three nines is where we start getting serious| | | | |
|Three nines or 99.9%| | | | |
|Four nines or 99.99%| | | | |
|Five nines or 99.999%| | | | |
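The downtime budget for any target follows from one multiplication: the length of the period times the allowed unavailability (1 minus the SLO). The sketch below computes the budgets for the targets listed above. The period lengths are assumptions made for this example (a 365-day year, a quarter as a fourth of that year, a month as a twelfth, a 7-day week); other conventions give slightly different figures. The names `PERIODS` and `downtime_budget` are hypothetical.

```python
from datetime import timedelta

# Assumed period lengths; a 365.25-day year or a 30-day month would shift the results slightly.
YEAR = timedelta(days=365)
PERIODS = {
    "annual": YEAR,
    "quarterly": YEAR / 4,
    "monthly": YEAR / 12,
    "weekly": timedelta(days=7),
}


def downtime_budget(slo_percent: float, period: timedelta) -> timedelta:
    """Return the downtime allowed within `period` for an availability SLO in percent."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    return period * (1 - slo_percent / 100)


if __name__ == "__main__":
    for slo in (90, 99, 99.9, 99.99, 99.999):
        cells = " | ".join(
            f"{name}: {downtime_budget(slo, period)}"
            for name, period in PERIODS.items()
        )
        print(f"{slo}% -> {cells}")
```

Under these assumptions, three nines allows about 8 hours 46 minutes of downtime per year and about 10 minutes per week, while five nines allows a little over 5 minutes per year.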