
Anthropic's Dedicated Think Tool for Sequential Reasoning

TRIGGER

AI agents making sequential tool calls were failing to maintain policy compliance and making costly errors mid-chain: the model would retrieve information but then act on it incorrectly, because it had no structured space to verify constraints before each action.

APPROACH

Anthropic added a 'think' tool to Claude's tool set: a no-op tool that logs reasoning without affecting external state. Input: a thought string describing the agent's current reasoning. Output: nothing (the thought is simply appended to the conversation log).

Results:

  • τ-bench airline domain: 0.570 pass^1 with the think tool plus domain-specific prompting vs 0.370 baseline (a 54% relative improvement)
  • τ-bench retail domain: 0.812 with the think tool alone vs 0.783 baseline
  • SWE-bench: the think tool contributed to a 1.6% improvement (statistically significant, p < .001, effect size d = 1.47)

The optimized prompt included examples showing how to enumerate applicable rules, check for required information, and verify policy compliance before acting.
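A minimal sketch of such a tool definition, in the Anthropic Messages API tool format. The description wording below is an illustrative assumption, not Anthropic's exact text:

```python
# Sketch of a no-op "think" tool definition in the Anthropic
# Messages API tool format. The description text is an assumption
# for illustration; tune it with domain-specific guidance.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It does not obtain "
        "new information or change any state; it only appends the "
        "thought to the log. Use it to verify policies and plan "
        "the next step before acting."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "The reasoning to record before acting.",
            }
        },
        "required": ["thought"],
    },
}
```

The definition is passed alongside the agent's real tools; the domain-specific examples mentioned above would go in the system prompt, not the schema.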

PATTERN

Making "stop and think" an explicit tool invocation forces a structural pause: the model must choose to reason before acting. Without it, reasoning and action blur together, and agents skip constraint checks mid-chain.
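On the harness side, handling the tool is trivial, since it performs no side effects: the thought just lands in the transcript, where the model sees its own reasoning on the next turn. A hypothetical dispatcher sketch (function and variable names are assumptions, not Anthropic's code):

```python
# Hypothetical agent-loop dispatcher: "think" is a no-op that only
# records the thought, while real tools would touch external state.
def handle_tool_call(name: str, args: dict, transcript: list) -> str:
    if name == "think":
        # No external effect; append the thought so it stays in
        # the conversation log for subsequent turns.
        transcript.append({"tool": "think", "thought": args["thought"]})
        return ""  # empty tool result
    raise NotImplementedError(f"unknown tool: {name}")

log: list = []
result = handle_tool_call(
    "think", {"thought": "Check the cancellation policy first."}, log
)
```

Because the tool returns nothing, its entire value is the structural pause it induces in the model's tool-call sequence.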

WORKS WHEN

  • Agent performs sequential tool calls where each step depends on previous results (not parallel/independent calls)
  • Environment has complex policies or constraints the agent must verify before acting (τ-bench airline policy had detailed baggage, cancellation, and payment rules)
  • Mistakes are costly and irreversible—you can't easily undo a wrong action
  • You can provide domain-specific examples of what good thinking looks like in your prompt
  • The additional output tokens for thinking are acceptable given the reliability gains

FAILS WHEN

  • Agent only needs single tool calls or parallel independent calls with no dependencies
  • Task has simple instruction following without multi-step policy verification
  • All necessary information is available upfront before any tool calls (use extended thinking instead)
  • Token budget is severely constrained and you can't afford the reasoning overhead
  • Domain is simple enough that the agent's default behavior already achieves acceptable accuracy

Stage

build

From

March 2025
