Systematically investigate a spike in 5xx errors across a microservice architecture, from the initial alert to root-cause identification, using a structured debugging methodology.
## Problem
You are on call and receive a PagerDuty alert: 15% of your team's API service responses are 5xx errors, up from a normal baseline of 0.1%. Walk through your investigation process from the moment you receive the alert to identifying the root cause.
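
To put the alert's numbers in perspective, here is a minimal sketch of how the error rate and its size relative to the baseline might be computed from a batch of response status codes. The sample data, the function names `error_rate` and `spike_factor`, and the idea of breaking errors down by status code are illustrative assumptions, not part of the problem statement.

```python
from collections import Counter

# Baseline 5xx rate (0.1%) taken from the problem statement.
BASELINE_ERROR_RATE = 0.001


def error_rate(status_codes):
    """Return the fraction of responses whose status code is 5xx."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def spike_factor(observed_rate, baseline_rate=BASELINE_ERROR_RATE):
    """Return how many times larger the observed rate is than the baseline."""
    return observed_rate / baseline_rate


# Hypothetical sample: 1000 responses, 150 of which are 5xx.
sample = [200] * 850 + [503] * 100 + [500] * 50

rate = error_rate(sample)
by_code = Counter(code for code in sample if 500 <= code <= 599)

print(f"error rate: {rate:.1%}")
print(f"spike factor: {spike_factor(rate):.0f}x baseline")
print(f"5xx breakdown: {dict(by_code)}")
```

A 15% rate against a 0.1% baseline is roughly a 150-fold increase, and a breakdown by status code (for example, 503 versus 500) can hint at whether the failures come from an unavailable dependency or from the service itself.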