Build a systematic methodology for diagnosing DNS issues in production: mastering dig/nslookup/drill, understanding DNS caching layers, negative caching, TTL tuning strategies, NXDOMAIN hijacking, and DNS over TLS/HTTPS deployment.
## Problem
You are a senior SRE at a company running 200 microservices across three cloud regions. Over the past month, your team has experienced four DNS-related incidents: a deployment that caused 10 minutes of NXDOMAIN errors despite the record existing, intermittent resolution failures traced to a misconfigured recursive resolver, a third-party service outage that was masked by DNS caching keeping a stale IP in rotation, and mysterious resolution slowness caused by EDNS0 fragmentation on a network path.
Sign up to access the full problem
Design canvas, rubric, hints, and model solutions.