All challenges
advanceddatabasesreliabilitydisaster-recovery~14 min4 rounds

Failover Lost 5 Minutes of Writes. Promise Zero Data Loss?

An async-replica failover dropped five minutes of committed writes. Leadership demands zero data loss with zero performance impact. Defend the honest trade-off.

the decision you defend

Your primary database died and you failed over to an async replica. Five minutes of committed writes - orders customers got confirmations for - are gone. Leadership says: guarantee this never happens again, with zero data loss and zero performance impact. What do you tell them, and what do you actually build?

Sign in to startFree for everyone. Takes a few seconds.

the situation

At 02:10 the primary database's host died. Failover promoted the async replica and the app came back at 02:24. Then support tickets started: customers have order-confirmation emails for orders that do not exist. The replica was about five minutes behind at the moment of failure, and every write in that window - roughly 1,900 transactions, including around 140 paid orders - is gone.

context

Replication has always been asynchronous; it was set up years ago and nobody revisited it. The morning after, the CTO puts it simply: "Committed means committed. I want a guarantee this can never happen again - zero data loss - and it obviously cannot slow the product down, we compete on speed." You are asked to come back tomorrow with the plan.

How this challenge works

Take a position on the decision above and defend it. A senior-engineer AI will push back over up to 4 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers - these are learning prompts, not gotchas.