advanced | observability | performance | incident-response | ~15 min | 5 rounds

p99 Latency Spiked Across a Microservice Chain

p99 latency jumped across a request that touches six services. A teammate wants to scale up the slowest-looking one. Defend how you find the real culprit.

the decision you defend

Your p99 latency jumped on a user request that flows through six microservices. Average latency looks fine. A teammate points at the service with the highest CPU and says to scale that one up. How do you actually find the culprit?

the situation

Users report the app feeling slow. Your dashboards show p99 latency has jumped sharply on a key request that passes through about six microservices, while average latency looks normal.
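To make the gap between average and p99 concrete, here is a minimal sketch using made-up numbers (not data from this scenario). It assumes 1,000 requests where 1.5% take 1,500 ms instead of 100 ms, and uses a simple nearest-rank percentile; the helper name `percentile` is illustrative only.

```python
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Illustrative latencies in milliseconds (assumed, not measured).
baseline = [100] * 1000
degraded = [100] * 985 + [1500] * 15  # 1.5% of requests hit a slow path

for name, sample in (("baseline", baseline), ("degraded", degraded)):
    print(name, "mean:", mean(sample), "p99:", percentile(sample, 99))
```

In this toy sample the mean moves from 100 ms to 121 ms while p99 moves from 100 ms to 1,500 ms, which is the shape of symptom described above: a small fraction of requests is very slow, and an average mostly hides it.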

context

You have per-service CPU and memory metrics and request logs, and distributed tracing is available. One service in the chain is running noticeably hotter on CPU than the others. A teammate wants to scale that hottest service up and see if it helps.
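One way the available tracing data could be put to work is sketched below: split traces into the slowest few and the rest, then compare each service's time between the two groups. Everything here is an assumption for illustration: the service names, the flat "service to self-time in ms" record shape, and the helper `tail_vs_typical` are hypothetical and do not correspond to any particular tracing product's API.

```python
from statistics import median

def tail_vs_typical(traces, tail_count):
    """Return, per service, median time in the slowest traces minus
    median time in the remaining traces (both in ms)."""
    # Order traces by total request time; the slowest land at the end.
    ordered = sorted(traces, key=lambda t: sum(t.values()))
    typical, tail = ordered[:-tail_count], ordered[-tail_count:]
    return {
        svc: median(t[svc] for t in tail) - median(t[svc] for t in typical)
        for svc in traces[0]
    }

# Hypothetical data: "pricing" is always the busiest service, but only
# "inventory" behaves differently in the slow requests.
normal = {"gateway": 10, "auth": 10, "cart": 10,
          "pricing": 30, "inventory": 10, "payment": 10}
slow = dict(normal, inventory=900)
traces = [dict(normal) for _ in range(196)] + [dict(slow) for _ in range(4)]

report = tail_vs_typical(traces, tail_count=4)
suspect = max(report, key=report.get)
print(report)
print("largest tail-only increase:", suspect)
```

In this made-up data the consistently busiest service shows no difference between slow and typical requests, while a different service accounts for the extra time in the tail. Real traces are messier (parallel spans, retries, queueing), so treat this as a way to frame the comparison, not a finished diagnostic.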

How this challenge works

Take a position on the decision above and defend it. A senior-engineer AI will push back over up to 5 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers. These are learning prompts, not gotchas.