Junior · Systems Design · 30m · Datadog, Grafana, Google, Amazon, Netflix, PagerDuty

Design an Observability Strategy

Design an observability strategy covering the three pillars (metrics, logs, traces), choosing instrumentation points, alert hygiene practices, and dashboard design for a team operating 15 microservices.

observability, monitoring, logging, tracing, alerting

## Problem

Your team operates 15 microservices across Go, Python, and Node.js. Observability is ad hoc — some services have Prometheus metrics, some have structured logs, and distributed tracing is not implemented. When an incident occurs, the on-call engineer spends the first 15 minutes just figuring out which service is involved. Design an observability strategy that gives the team clear visibility into system behavior.
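One way to attack the "which service is involved" delay is to have every service emit logs with the same field names, including the service name and a trace ID. The sketch below is a minimal illustration using only the Python standard library, not a model solution. The names `JsonFormatter`, `build_logger`, and `new_trace_id`, and the field names `service` and `trace_id`, are assumptions made for this example; in a real deployment the trace ID would normally come from a tracing library rather than be generated locally.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with shared field names."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "service": self.service,
            "level": record.levelname,
            "message": record.getMessage(),
            # trace_id is attached per call via `extra=`; "-" marks it as missing.
            "trace_id": getattr(record, "trace_id", "-"),
        }
        return json.dumps(payload)


def new_trace_id() -> str:
    """Return a 32-character hex ID (hypothetical stand-in for a tracer-supplied ID)."""
    return uuid.uuid4().hex


def build_logger(service: str) -> logging.Logger:
    """Create a logger that writes JSON lines tagged with the service name."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")  # "checkout" is a made-up service name
    log.info("payment authorized", extra={"trace_id": new_trace_id()})
```

If the Go and Node.js services emitted the same field names, a single log query filtered by `trace_id` could show every service a failing request touched.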


## Constraints

Service Count

Signal Coverage

Alert Quality

Dashboard Usability
