On-Call Is Drowning in Alerts. Silence Them?
A CPU alert pages constantly with no real impact. A teammate wants to silence it. Defend how you fix alerting properly.
the decision you defend
On-call keeps getting paged at 3am by an alert that fires whenever CPU goes over 80% for one minute, but nothing is actually wrong for users. A teammate says just silence that alert. What do you do, and why?
the situation
Your on-call engineers are exhausted. The pager fires several times a night for an alert that triggers whenever CPU usage exceeds 80% for one minute. Every time, they check and find no user-facing impact.
context
The service handles the spikes fine; latency and error rates stay normal even when CPU is briefly high. The alert was added long ago and pages directly. A teammate, tired of the 3am pages, wants to silence the alert entirely.
How this challenge works
Take a position on the decision above and defend it. A senior-engineer AI will push back over up to 5 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers - these are learning prompts, not gotchas.