A Traffic Spike Caused an Outage. Autoscaling Did Not Save You.
A spike took the service down even though autoscaling was on. A teammate wants to crank the max replicas. Defend the real fix.
the decision you defend
A traffic spike took your service down even though horizontal autoscaling was enabled - it scaled too slowly to keep up. A teammate says just raise the max replicas way up. What is the real fix?
the situation
A marketing push sent a sudden burst of traffic at your service. Latency spiked and requests started failing. Horizontal autoscaling was enabled, but by the time extra pods came up the spike had already caused an outage.
context
The service scales on CPU at 70 percent. Pods take close to a minute to pull their image and warm up before serving, and the cluster was running near capacity. A teammate's only suggestion is to raise the maximum replica count significantly.
How this challenge works
Take a position on the decision above and defend it. A senior-engineer AI will push back over up to 5 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers - these are learning prompts, not gotchas.