intermediate · observability · reliability · ~12 min · 5 rounds

On-Call Is Drowning in Alerts. Silence Them?

A CPU alert pages constantly with no real impact. A teammate wants to silence it. Defend how you would fix the alerting properly.

the decision you defend

On-call keeps getting paged at 3am by an alert that fires whenever CPU goes over 80% for one minute, but nothing is actually wrong for users. A teammate says to just silence the alert. What do you do, and why?

the situation

Your on-call engineers are exhausted. The pager fires several times a night for an alert that triggers whenever CPU usage exceeds 80% for one minute. Every time, they check and find no user-facing impact.
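
To make the alert's behavior concrete, the rule described above can be sketched roughly as follows. This is a minimal sketch, not the actual monitoring config: the 15-second sample interval, the function name `cpu_alert_fires`, and the sample data are all hypothetical, and four consecutive 15-second samples are treated as "one minute".

```python
from typing import Sequence

CPU_THRESHOLD = 80.0   # percent, from the alert description
WINDOW_SECONDS = 60    # "for one minute"
SAMPLE_INTERVAL = 15   # assumption: one CPU sample every 15 seconds


def cpu_alert_fires(samples: Sequence[float]) -> bool:
    """Return True if CPU stayed above the threshold for a full window."""
    needed = WINDOW_SECONDS // SAMPLE_INTERVAL  # consecutive samples required
    run = 0  # length of the current streak of over-threshold samples
    for value in samples:
        run = run + 1 if value > CPU_THRESHOLD else 0
        if run >= needed:
            return True
    return False


# Hypothetical spike: four consecutive samples above 80%.
spike = [42.0, 85.0, 91.0, 88.0, 83.0, 47.0]
# Hypothetical shorter burst: only three consecutive samples above 80%.
burst = [42.0, 85.0, 91.0, 88.0, 60.0, 47.0]

print(cpu_alert_fires(spike))
print(cpu_alert_fires(burst))
```

In this sketch the rule looks only at CPU samples. It takes no input about latency or error rates, so it can page even when users see nothing wrong.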

context

The service handles the spikes fine; latency and error rates stay normal even when CPU is briefly high. The alert was added long ago and pages directly. A teammate, tired of the 3am pages, wants to silence the alert entirely.

How this challenge works

Take a position on the decision above and defend it. A senior-engineer AI will push back for up to 5 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers. These are learning prompts, not gotchas.