How to Alert
26.01.2023 08:31
- Define SLI’s and SLO’s
- We have to define what is successful operation of the service. Burger World example:
- Burger World has 3 main objectives:
- Serve customer orders at drive through in 3min or less.
- Serve customer orders at walk up counter in 5min or less.
- Less than 1% of orders result in corrective action.
- First assume that all other operations are working as designed, we only monitor and alert on deviations from the primary objectives.
- Monitor for the success or failure to meet those definitions
- Alert on deviation from agreed upon SLO’s
- Identify root cause of deviations
- Write KB’s for how to address root cause of deviations
- Create new alerts for root cause
- Repeat.