advanced | kubernetes | networking | incident-response | ~15 min | 5 rounds

Service A Intermittently Cannot Reach Service B

One in twenty calls between two services fails. A teammate wants to add retries and move on. Defend how you actually find the cause.

the decision you defend

Service A intermittently fails to reach Service B inside the cluster: roughly one call in twenty times out, and the rest succeed. A teammate says to just add retries and move on. How do you actually diagnose this?

the situation

Service A calls Service B many times a second over the cluster network. Most calls succeed, but roughly one in twenty times out. There is no obvious pattern in the application logs.

context

Both services run as Deployments with several replicas behind ClusterIP Services. The cluster uses CoreDNS and has NetworkPolicies in place. A teammate, seeing that the failure rate is low, wants to wrap the call in a retry and consider it handled.
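
One way to start looking for a pattern that the application logs do not show is to probe from inside a Service A pod and record which phase of each call fails. The sketch below is only an illustration under stated assumptions, not a prescribed method: it times name resolution and TCP connect separately and tallies failures per target. The function names (probe, summarize, run), the port, and any target names passed in are hypothetical. It opens a fresh connection per probe, so a ClusterIP target should be balanced across backends on each call, though that depends on how the cluster's proxy is configured. Note that socket.getaddrinfo has no timeout parameter of its own, so a slow resolver shows up as a long "dns" phase rather than a timeout.

```python
import socket
import time
from collections import defaultdict


def probe(host, port, timeout=1.0):
    """Probe one target once.

    Returns (phase, ok, seconds). phase is "dns" if name resolution failed,
    otherwise "connect"; ok says whether the whole probe succeeded.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return ("dns", False, time.monotonic() - start)
    addr = infos[0][4][:2]  # (ip, port) from the first resolved sockaddr
    try:
        with socket.create_connection(addr, timeout=timeout):
            pass  # connect succeeded; close immediately
    except OSError:  # covers timeouts, refusals, and unreachable hosts
        return ("connect", False, time.monotonic() - start)
    return ("connect", True, time.monotonic() - start)


def summarize(results):
    """Tally (target, phase, ok) tuples into per-target failure counts."""
    stats = defaultdict(lambda: {"calls": 0, "dns_fail": 0, "connect_fail": 0})
    for target, phase, ok in results:
        row = stats[target]
        row["calls"] += 1
        if not ok:
            row[phase + "_fail"] += 1
    return dict(stats)


def run(targets, port, rounds=200, timeout=1.0):
    """Probe every target `rounds` times and return the summary."""
    results = []
    for _ in range(rounds):
        for target in targets:
            phase, ok, _elapsed = probe(target, port, timeout)
            results.append((target, phase, ok))
    return summarize(results)
```

Calling run with the Service B DNS name plus each backend pod IP as separate targets would show whether failures cluster on the name lookup, on one replica, or on the Service address alone. Each of those outcomes points at a different layer, and a blanket retry would hide the difference.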

How this challenge works

Take a position on the decision above and defend it. A senior-engineer AI will push back over up to 5 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers. These are learning prompts, not gotchas.