Design a multi-burn-rate alerting system for 200+ services that detects SLO violations early using fast-burn and slow-burn windows while minimizing false positives and alert fatigue.
## Problem
Your organization has adopted SLOs for 200+ services, but alerting is still based on static thresholds (e.g., "error rate > 1%"). This causes alert fatigue: static thresholds fire false positives during traffic spikes and miss slow burns that silently consume error budget. Design an SLO-based alerting system that uses multi-burn-rate detection to catch real SLO violations early while keeping the false positive rate below 5%.
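To make "multi-burn-rate detection" concrete, here is a minimal sketch of the evaluation logic. It is one possible shape, not a reference design. The window pairs and thresholds (14.4 over 1h/5m, 6 over 6h/30m) are assumptions borrowed from commonly cited guidance for a 30-day SLO period and are not part of the problem statement. The names `BurnRule`, `RULES`, `burn_rate`, and `firing_rules` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRule:
    name: str
    long_window: str   # window that proves the burn is sustained
    short_window: str  # window that proves the burn is still happening
    threshold: float   # burn-rate multiple that triggers the alert


# Assumed thresholds for a 30-day (720 h) SLO period:
#   fast: 2% of budget in 1 h  -> 0.02 * 720 / 1 = 14.4
#   slow: 5% of budget in 6 h  -> 0.05 * 720 / 6 = 6.0
RULES = (
    BurnRule("fast", "1h", "5m", 14.4),
    BurnRule("slow", "6h", "30m", 6.0),
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors accrue."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def firing_rules(error_ratios: dict[str, float], slo_target: float) -> list[str]:
    """Return names of rules whose long AND short windows both exceed the threshold.

    error_ratios maps a window label (e.g. "1h") to the observed
    bad-events / total-events ratio over that window.
    """
    fired = []
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_burn > rule.threshold and short_burn > rule.threshold:
            fired.append(rule.name)
    return fired
```

Requiring both windows to exceed the threshold is what suppresses short traffic spikes: a five-minute burst raises the short-window burn rate but barely moves the one-hour figure, so nothing fires. A steady burn at 8x budget stays under the fast threshold but trips the slow rule, which is the case a static "error rate > 1%" threshold would miss for a 99.9% SLO.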