Design a fleet-wide kernel live patching system that applies security fixes to 10,000 servers without reboots, meeting a patch SLA of under 4 hours with zero downtime.
## Problem
Your organization runs 10,000 production Linux servers across three regions. A critical kernel CVE has been disclosed, and your security team requires the fleet to be patched within 4 hours. However, your services have strict availability SLAs that prohibit reboots during business hours, and a rolling restart strategy would take 12+ hours. Design a kernel live patching system that can apply security fixes to the entire fleet without reboots, with strong safety guarantees and the ability to roll back quickly if a patch causes problems.
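To make the timing constraint concrete, here is a minimal Python sketch of a staged rollout simulation. It assumes a hypothetical wave plan (1%, 10%, 50%, then 100% of the fleet), a hypothetical 10-minute apply time per wave, and hypothetical bake times during which health signals are watched. `Wave`, `run_rollout`, and `is_healthy` are illustrative names, not part of any real live-patching tool. The sketch shows how the waves add up against the 4-hour window and how a failed health gate halts the rollout before the rest of the fleet is touched.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

FLEET_SIZE = 10_000
SLA_MINUTES = 4 * 60  # the 4-hour patch SLA from the problem statement


@dataclass
class Wave:
    name: str
    cumulative_percent: int  # percent of the fleet patched once this wave completes
    bake_minutes: int        # assumed soak time watching health signals before the next wave


# Hypothetical schedule: a small canary first, then progressively wider waves.
WAVES = [
    Wave("canary", 1, 30),
    Wave("early", 10, 30),
    Wave("half", 50, 20),
    Wave("rest", 100, 0),
]

APPLY_MINUTES = 10  # assumed time to distribute and load the patch in one wave


def run_rollout(
    fleet_size: int,
    waves: list[Wave],
    is_healthy: Callable[[str], bool],
) -> tuple[list[tuple[str, int, int]], Optional[str]]:
    """Simulate the rollout and stop at the first wave whose health gate fails.

    Returns (completed waves as (name, servers, elapsed minutes), failed wave name or None).
    """
    done: list[tuple[str, int, int]] = []
    patched = 0
    elapsed = 0
    for wave in waves:
        target = fleet_size * wave.cumulative_percent // 100
        count = target - patched
        elapsed += APPLY_MINUTES + wave.bake_minutes
        if not is_healthy(wave.name):
            # Halt here; a real system would also revert the patch on this wave's hosts.
            return done, wave.name
        patched = target
        done.append((wave.name, count, elapsed))
    return done, None


if __name__ == "__main__":
    waves_done, failed = run_rollout(FLEET_SIZE, WAVES, lambda name: True)
    for name, count, elapsed in waves_done:
        print(f"{name:>6}: {count:>5} servers, finished at t+{elapsed} min")
    total = waves_done[-1][2]
    print(f"total {total} min; within SLA: {total <= SLA_MINUTES}")
```

Under these assumed numbers the full fleet finishes at t+120 minutes, leaving headroom in the 4-hour window for a rollback or a second attempt. In practice the bake times would have to come from how long your health signals actually take to surface a regression.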