
The Reasoning Timing Trap in Agentic Workflows

TRIGGER

Teams building AI agents faced a choice between two reasoning mechanisms: extended thinking (deep reasoning before the response begins) and a think tool invoked mid-response. There was no clear guidance on when each applies.

APPROACH

Anthropic documented the distinction. Extended thinking happens before response generation begins and works on the information available in the initial query. The think tool is invoked during response generation, so it can process information discovered through tool calls. On τ-bench's airline domain, extended thinking alone achieved a pass^1 of 0.412 versus a 0.332 baseline, roughly matching the think tool used without additional prompting (0.404). In the same domain, where the agent must analyze tool outputs across long chains of calls, pairing the think tool with an optimized prompt raised pass^1 to 0.570.
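To make the distinction concrete, here is a minimal Python sketch of the two request shapes, assuming the Anthropic Messages API request fields (`model`, `max_tokens`, `thinking`, `tools`, `messages`). Extended thinking is a request parameter that buys one reasoning pass before the response starts; the think tool is an ordinary tool definition the model can call between other tool calls. The tool wording is modeled on the definition in Anthropic's post, `MODEL` is a hypothetical placeholder, and the payloads are only built here, not sent.

```python
# Hypothetical placeholder; substitute a model ID that supports extended thinking.
MODEL = "your-model-id"

# Think tool in the Messages API tool format (name, description, input_schema).
# Wording is modeled on Anthropic's published example; adapt it to your domain.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new "
        "information or change the database, but just append the thought "
        "to the log. Use it when complex reasoning or some cache memory is needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}


def extended_thinking_request(messages, budget_tokens=4000, max_tokens=8000):
    """Request body that reasons once, up front, before the response begins."""
    # Mirrors the documented constraint: at least 1024 and below max_tokens.
    if budget_tokens < 1024 or budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": messages,
    }


def think_tool_request(messages, tools, max_tokens=8000):
    """Request body that lets the model pause to reason between tool calls."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "tools": [THINK_TOOL, *tools],
        "messages": messages,
    }
```

Either dict could be passed as keyword arguments to the SDK's `messages.create` call; check the current API documentation for model support and token-budget limits before relying on the values shown.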

PATTERN

“Extended thinking reasons before tool calls return; the think tool reasons after. Match reasoning timing to when critical information becomes available.”

✓ WORKS WHEN

Critical information comes from tool call results rather than the initial query (database lookups, API responses, file contents)
Agent needs to verify compliance or correctness after each step in a chain
You're building multi-turn agents where the model discovers constraints incrementally
Decisions depend on combining the user's request with external data retrieved mid-conversation
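Because the think tool sits inside the tool loop, handling it looks like handling any other tool, except that it has no side effects. The sketch below assumes a hypothetical `get_reservation` lookup and simulated `tool_use` blocks in place of real model output; it shows the ordering these conditions describe, where the thought is recorded only after the lookup result is available.

```python
# Hypothetical domain tool; stands in for a database lookup or API call.
def get_reservation(reservation_id):
    return {"id": reservation_id, "fare": "basic_economy", "changeable": False}


THOUGHT_LOG = []


def handle_think(thought):
    # The think tool fetches nothing and changes nothing; it only records.
    THOUGHT_LOG.append(thought)
    return "Thought recorded."


TOOL_HANDLERS = {
    "get_reservation": lambda args: get_reservation(args["reservation_id"]),
    "think": lambda args: handle_think(args["thought"]),
}


def run_tool_uses(tool_uses):
    """Turn tool_use blocks from one assistant turn into tool_result blocks."""
    results = []
    for block in tool_uses:
        handler = TOOL_HANDLERS.get(block["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": str(handler(block["input"])),
        })
    return results


# Simulated turns: the lookup result arrives first, then the model thinks about it.
turn_1 = [{"type": "tool_use", "id": "toolu_1", "name": "get_reservation",
           "input": {"reservation_id": "ABC123"}}]
turn_2 = [{"type": "tool_use", "id": "toolu_2", "name": "think",
           "input": {"thought": "Reservation is not changeable; check policy before acting."}}]
for turn in (turn_1, turn_2):
    print(run_tool_uses(turn))
```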

✗ FAILS WHEN

All information needed for the task is present in the initial user query; use extended thinking instead
Task is primarily analytical without tool use (coding, math, physics problems)
Agent makes single tool calls or parallel independent calls without sequential dependencies
You need deep upfront planning before taking any actions
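The two lists above can be read as a rough decision rule. The following hypothetical helper encodes it; the `TaskProfile` fields and the order of the checks are assumptions, since the lists name conditions without ranking them.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical task descriptors mirroring the WORKS WHEN / FAILS WHEN lists."""
    info_from_tool_results: bool   # key facts arrive via tool calls, not the query
    sequential_tool_calls: bool    # later calls depend on earlier results
    needs_upfront_planning: bool   # a deep plan is required before any action
    uses_tools: bool = True        # False for pure coding/math/physics analysis


def choose_reasoning(task: TaskProfile) -> str:
    """Match reasoning timing to when critical information becomes available."""
    # Check order is an assumption: upfront planning wins over mid-chain needs.
    if not task.uses_tools:
        return "extended_thinking"
    if task.needs_upfront_planning:
        return "extended_thinking"
    if task.info_from_tool_results and task.sequential_tool_calls:
        return "think_tool"
    # Everything is in the query, or tool calls are single/independent.
    return "extended_thinking"
```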

Stage

build

Source

Anthropic Engineering

From

March 2025