Design a chaos engineering platform that orchestrates 1,000+ simultaneous fault injections across a distributed fleet with sub-100ms injection latency and sub-1s kill switch response.
## Problem
Design a chaos engineering platform that allows engineering teams to define, schedule, and execute controlled fault injection experiments against production infrastructure. The system must support multiple fault types (network latency/partition, CPU/memory/disk pressure, process termination), enforce blast radius limits, automatically verify steady-state health during experiments, and provide a global kill switch that terminates all active faults within one second.
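The following minimal, in-memory Python sketch is one way to see how the requirements above interact. Every name in it (`ChaosController`, `Experiment`, `max_blast_fraction`, `steady_state`, `kill_all`) is hypothetical. Faults are only recorded in a dictionary and are never applied to real hosts. The sketch therefore models the control logic (blast-radius gating, aborting on a failed steady-state check, a latched global kill switch) but not the injection mechanics or fleet-wide latency.

```python
import threading
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Set, Tuple


class FaultType(Enum):
    # Fault categories listed in the problem statement.
    NETWORK_LATENCY = "network_latency"
    NETWORK_PARTITION = "network_partition"
    CPU_PRESSURE = "cpu_pressure"
    MEMORY_PRESSURE = "memory_pressure"
    DISK_PRESSURE = "disk_pressure"
    PROCESS_KILL = "process_kill"


@dataclass
class Experiment:
    name: str
    fault: FaultType
    targets: Set[str]
    # Hypothetical policy knob: largest fraction of the fleet this experiment may touch.
    max_blast_fraction: float
    # Hypothetical steady-state probe: returns True while the system looks healthy.
    steady_state: Callable[[], bool]


class ChaosController:
    """In-memory sketch: faults are recorded in a dict, never applied to real hosts."""

    def __init__(self, fleet: Set[str]):
        if not fleet:
            raise ValueError("fleet must be non-empty")
        self.fleet = set(fleet)
        self._lock = threading.Lock()
        self._killed = threading.Event()
        # host -> [(experiment name, fault type), ...] currently "active"
        self.active: Dict[str, List[Tuple[str, FaultType]]] = {}

    def start(self, exp: Experiment) -> bool:
        """Admit an experiment only if it respects its blast radius and the system is healthy."""
        if not exp.targets <= self.fleet:
            raise ValueError(f"{exp.name}: targets outside the known fleet")
        if len(exp.targets) / len(self.fleet) > exp.max_blast_fraction:
            return False  # blast radius limit exceeded
        if not exp.steady_state():
            return False  # never start against an already-unhealthy system
        with self._lock:
            # Checked under the lock so nothing is admitted after kill_all() runs.
            if self._killed.is_set():
                return False
            for host in exp.targets:
                self.active.setdefault(host, []).append((exp.name, exp.fault))
        return True

    def check(self, exp: Experiment) -> bool:
        """Re-run the steady-state probe; revert this experiment's faults if it fails."""
        if exp.steady_state():
            return True
        self.revert(exp.name)
        return False

    def revert(self, name: str) -> None:
        """Remove every fault belonging to one experiment."""
        with self._lock:
            for host in list(self.active):
                remaining = [f for f in self.active[host] if f[0] != name]
                if remaining:
                    self.active[host] = remaining
                else:
                    del self.active[host]

    def kill_all(self) -> float:
        """Global kill switch: latch, drop every active fault, return elapsed seconds."""
        t0 = time.monotonic()
        with self._lock:
            self._killed.set()
            self.active.clear()
        return time.monotonic() - t0
```

The kill switch latches. Once `kill_all` runs, `start` refuses new experiments, and both methods take the same lock, so an experiment cannot slip in after the switch has cleared state. The elapsed time that `kill_all` returns covers only this in-process bookkeeping. Meeting the sub-1s target across a real fleet depends on how the signal reaches each host. One common approach, not modeled here, is to give each injected fault a short lease so that a host agent reverts it on its own if it stops hearing from the controller.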