three agent architectures, one task, the simple one won

jun 2026

i benchmarked three agent designs on the same task. the simplest one won, and it wasn't close.

i built a flight-telemetry agent and couldn't decide how clever to make it. so i built three versions, gave them the same questions, and let the results pick.

the contenders were the usual ladder of sophistication. a single-shot setup that retrieves what it needs once and answers. a plan-and-execute version that drafts a plan and works through it. and the react loop, where the model reasons, calls a tool, reads the result, and goes around again. the last two are the ones you reach for when a task feels like it deserves an agent.
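
to make the three shapes concrete, here's a minimal sketch of each control flow. it is not my benchmark code. `llm` and `tool` are hypothetical stand-ins for any text-in, text-out model call and any lookup function, and the `final:` prefix is an assumed convention for "the model is done".

```python
from typing import Callable

# hypothetical stand-ins, not from any real library:
# LLM is a text-in, text-out model call, Tool is a lookup function.
LLM = Callable[[str], str]
Tool = Callable[[str], str]


def single_shot(question: str, llm: LLM, tool: Tool) -> str:
    # retrieve once, answer once: exactly one model call
    context = tool(question)
    return llm(f"context: {context}\nquestion: {question}")


def plan_and_execute(question: str, llm: LLM, tool: Tool) -> str:
    # one model call to draft the plan, one tool call per plan step,
    # then one more model call to write the answer
    plan = llm(f"plan steps for: {question}").splitlines()
    notes = [tool(step) for step in plan if step.strip()]
    return llm(f"notes: {notes}\nquestion: {question}")


def react(question: str, llm: LLM, tool: Tool, max_steps: int = 5) -> str:
    # reason, act, observe, repeat: one model call per trip around the loop
    transcript = f"question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)
        if reply.startswith("final:"):  # assumed stop convention
            return reply[len("final:"):].strip()
        transcript += f"\naction: {reply}\nobservation: {tool(reply)}"
    return "no answer within step budget"
```

the thing to notice is the number of model calls: one for single-shot, two for plan-and-execute, and up to `max_steps` for the loop.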

the simple one won. single-shot was more accurate at about a fifth of the cost. the two elaborate loops scored lower while spending far more, because every extra reasoning step is more tokens and another chance to wander off course. the loop didn't buy correctness here. it bought latency and a bigger bill.
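
a rough way to see why extra steps get expensive, assuming a stateless api that resends the whole transcript on every call and no prompt caching. the function and the token counts are made up for illustration. they are not my benchmark figures.

```python
def input_tokens(base: int, per_step: int, steps: int) -> int:
    # call i resends the base prompt plus the i steps appended so far,
    # so the total grows faster than linearly in the number of steps
    return sum(base + per_step * i for i in range(steps))


# hypothetical sizes: a 1000-token prompt, 300 tokens added per loop step
print(input_tokens(1000, 300, steps=1))  # 1000
print(input_tokens(1000, 300, steps=5))  # 8000
```

under those assumptions, five trips around the loop cost eight times the input tokens of one call, not five times.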

the part that mattered wasn't the architecture. it was who computes the answer. in my setup the model never does the arithmetic. it only decides which tool to call. the tool is ordinary, validated code, and it returns the answer along with the source it came from. so the model's whole job shrinks to routing, and routing is something even a small, cheap model does reliably. the fancier loops were spending their budget re-deriving things a function could just return.
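
here's a minimal sketch of that split, not my actual code. every name in it is hypothetical: the `ALTITUDE_LOG` samples are invented, the two tools are examples, and `route` is a keyword rule standing in for the model call that picks a tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolResult:
    value: float
    unit: str
    source: str  # where the number came from, so the answer can be checked


# invented telemetry samples: (seconds since takeoff, altitude in metres)
ALTITUDE_LOG: List[Tuple[int, float]] = [
    (0, 0.0), (60, 450.0), (120, 1200.0), (180, 1150.0),
]


def max_altitude() -> ToolResult:
    t, alt = max(ALTITUDE_LOG, key=lambda row: row[1])
    return ToolResult(alt, "m", f"ALTITUDE_LOG @ t={t}s")


def mean_climb_rate() -> ToolResult:
    (t0, a0), (t1, a1) = ALTITUDE_LOG[0], ALTITUDE_LOG[-1]
    if t1 == t0:
        raise ValueError("need samples at two different times")
    return ToolResult((a1 - a0) / (t1 - t0), "m/s", f"ALTITUDE_LOG t={t0}s..{t1}s")


TOOLS: Dict[str, Callable[[], ToolResult]] = {
    "max_altitude": max_altitude,
    "mean_climb_rate": mean_climb_rate,
}


def route(question: str) -> str:
    # stand-in for the model: its only job is to return a tool name.
    # in a real system this is the one place an llm call would go.
    return "mean_climb_rate" if "climb" in question.lower() else "max_altitude"


def answer(question: str, router: Callable[[str], str] = route) -> ToolResult:
    name = router(question)
    if name not in TOOLS:
        # the router can be wrong, but it can't invent a number
        raise KeyError(f"router picked unknown tool: {name!r}")
    return TOOLS[name]()  # the arithmetic happens in plain code


result = answer("how high did it get?")
print(result.value, result.unit, "from", result.source)
# 1200.0 m from ALTITUDE_LOG @ t=120s
```

the worst the router can do here is pick the wrong tool or an unknown one. both failures are easy to detect, and neither produces a made-up figure.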

the tradeoff is real and worth stating. single-shot is weaker when a task genuinely needs multi-step planning you can't lay out in advance. mine didn't, and most of the agent tasks i've looked at since don't either. they get framed that way because the loop is the fun part to build.

the lesson i took is to make the model do less, not more. give it the smallest decision it can make reliably, hand the rest to code you can trust, and the system gets cheaper and more correct at the same time.