The Queue Backlog Doubles Every Day. 10x the Consumers Tonight?
A consumer queue backlog is doubling daily and a teammate wants to 10x the consumer fleet tonight. Defend diagnosing the real bottleneck first, and why blind consumer scaling can make the outage worse.
The decision you defend
An order-processing queue backlog has grown from 20k to 160k messages over three days and is doubling daily. A teammate wants to 10x the consumer count tonight to burn it down before customers notice. Do you scale the consumers now, or do something else first? Defend your call.
The situation
Your order-processing pipeline pushes each new order onto a queue, and a fleet of 20 consumers picks them up to charge payment, reserve inventory, and send confirmation emails. Three days ago the backlog started growing: 20k, then 40k, then 80k, and this morning it crossed 160k messages. Inbound order volume is normal for the season. Oldest-message age is now over six hours and support is seeing "where is my confirmation" tickets.
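The backlog figures above can be turned into a daily shortfall with a few lines of arithmetic. This is a minimal sketch, assuming the four readings are roughly one day apart; the function names are illustrative and not part of any real tooling.

```python
# Backlog readings from the scenario, assumed to be about one day apart (messages).
backlog = [20_000, 40_000, 80_000, 160_000]


def daily_deficits(readings):
    """Net messages added per day: everything enqueued minus everything completed."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


def growth_ratios(readings):
    """Day-over-day growth factor of the backlog."""
    return [later / earlier for earlier, later in zip(readings, readings[1:])]


deficits = daily_deficits(backlog)
ratios = growth_ratios(backlog)

print(deficits)  # [20000, 40000, 80000]
print(ratios)    # [2.0, 2.0, 2.0]
```

A fixed capacity shortfall against steady inbound volume would add about the same number of messages each day. Here the daily shortfall itself doubles, which is one hint (not proof) that either completions per day are falling or messages are being put back on the queue, for example through redelivery.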
Context
The consumers write to the same Postgres instance as the main app and call a payment provider that rate-limits per account. Nobody has yet looked at per-consumer processing time, redelivery counts, or downstream latency; there is a dead-letter queue configured but its alarm was never wired up. A recent deploy touched the inventory-reservation code path. In the incident channel a teammate writes: "consumers are stateless, let's just set the autoscaling group to 200 tonight and the backlog is gone by morning."
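Because the consumers share a rate-limited payment provider, fleet throughput is capped by that shared dependency rather than by consumer count alone. The sketch below is a toy model of that cap. Every number in it is hypothetical: the scenario does not state the per-consumer rate or the provider's limit, and `effective_throughput` is an illustrative helper, not a real API.

```python
def effective_throughput(consumers, per_consumer_rate, downstream_limit):
    """Messages/second the fleet can complete, capped by a shared downstream limit."""
    return min(consumers * per_consumer_rate, downstream_limit)


# Hypothetical values chosen only to illustrate the shape of the problem.
PER_CONSUMER_RATE = 2.0  # messages/second one consumer could process in isolation
PAYMENT_LIMIT = 50.0     # payment calls/second allowed for the whole account

for fleet in (20, 200):
    offered = fleet * PER_CONSUMER_RATE
    completed = effective_throughput(fleet, PER_CONSUMER_RATE, PAYMENT_LIMIT)
    rejected = offered - completed
    print(fleet, offered, completed, rejected)
# 20 40.0 40.0 0.0
# 200 400.0 50.0 350.0
```

Under these assumed numbers, ten times the consumers buys 1.25 times the throughput, while 350 payment calls per second are rejected and are likely to come back as retries or redeliveries. The same cap applies to any other shared dependency, such as the Postgres instance, which is why measuring per-consumer processing time and downstream latency comes before choosing a fleet size.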
How this challenge works
Take a position on the decision above and defend it. A senior-engineer AI will push back over up to 4 rounds. When you are done, you are scored against a verified rubric so you can see exactly what a complete answer covers. These are learning prompts, not gotchas.