build

How Anthropic Writes Tool Descriptions as Mistake-Specific Guardrails

TRIGGER

Models were misusing tools in predictable ways—misunderstanding specs, using wrong parameter formats, or falling into common pitfalls—leading to failed tool calls and wasted turns in agentic loops.

APPROACH

Anthropic's SWE-bench team spent more time optimizing its tool descriptions than the overall prompt, and the agent achieved a state-of-the-art 49% on SWE-bench Verified. The team tested the tools to uncover where the model misunderstood them, then edited the descriptions to preempt those errors. Input: failure patterns observed in testing. Output: description text that explicitly prevents those failures.

For the Bash tool, the description notes that the command contents don't need XML escaping, that there is no internet access, and how to run commands in the background with "&".

For the Edit tool (str_replace_editor), the team required absolute paths after observing the model mishandle relative paths once the agent had moved out of the root directory; this change alone eliminated an entire class of errors. The same tool's description specifies that old_str must match exactly one location in the file, and the tool returns clear error messages when there are zero or multiple matches.
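A minimal sketch of how these guardrails might be expressed, assuming the Messages API tool shape (`name`, `description`, `input_schema`). The description wording paraphrases the points above and is not Anthropic's verbatim text; the edit tool is simplified to a single replace operation, and `str_replace` is a hypothetical local handler showing how the absolute-path and exactly-one-match rules could be enforced with errors the model can act on.

```python
import os
import tempfile

# Tool definitions in the Messages API shape (name, description, input_schema).
# Description text paraphrases the guardrails; it is not Anthropic's verbatim wording.
BASH_TOOL = {
    "name": "bash",
    "description": (
        "Run commands in a bash shell.\n"
        "* The contents of the `command` parameter do NOT need to be XML-escaped.\n"
        "* You do not have access to the internet via this tool.\n"
        "* Run long-lived commands in the background, e.g. `sleep 10 &`."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string", "description": "The bash command to run."}},
        "required": ["command"],
    },
}

EDIT_TOOL = {
    "name": "str_replace_editor",
    "description": (
        "Replace one exact snippet of text in a file.\n"
        "* `path` must be an ABSOLUTE path, e.g. `/repo/file.py`; relative paths are rejected.\n"
        "* `old_str` must match EXACTLY one location in the file, whitespace included.\n"
        "* If `old_str` matches zero or several locations, nothing is edited; "
        "add surrounding context and retry."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path, e.g. /repo/file.py"},
            "old_str": {"type": "string", "description": "Exact text to replace."},
            "new_str": {"type": "string", "description": "Replacement text."},
        },
        "required": ["path", "old_str", "new_str"],
    },
}


def str_replace(path: str, old_str: str, new_str: str) -> str:
    """Hypothetical handler that enforces what EDIT_TOOL's description promises.

    Errors are returned as strings so the agent can read them and self-correct.
    """
    if not os.path.isabs(path):
        return f"Error: path must be absolute, got {path!r}. Use a path like /repo/file.py."
    if not old_str:
        return "Error: old_str must not be empty."
    with open(path, encoding="utf-8") as f:
        content = f.read()
    first = content.find(old_str)
    if first == -1:
        return f"Error: old_str did not appear verbatim in {path}. Check whitespace and indentation."
    # Searching again from first + 1 also catches overlapping second matches.
    if content.find(old_str, first + 1) != -1:
        return (f"Error: old_str appears more than once in {path}. "
                "Include more surrounding context so it matches exactly one location.")
    updated = content[:first] + new_str + content[first + len(old_str):]
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
    return f"Edited {path}."


# Usage: exercise each outcome against a scratch file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.py")
    with open(target, "w", encoding="utf-8") as f:
        f.write("x = 1\nx = 1\ny = 2\n")
    results = [
        str_replace("demo.py", "y = 2", "y = 3"),  # rejected: relative path
        str_replace(target, "x = 1", "x = 5"),     # rejected: two matches
        str_replace(target, "z = 0", "z = 1"),     # rejected: no match
        str_replace(target, "y = 2", "y = 3"),     # applied: exactly one match
    ]
    with open(target, encoding="utf-8") as f:
        final = f.read()

for message in results:
    print(message)
```

The design choice worth noting is that the handler and the description agree: every rule the description states is also checked in code, and each rejection message tells the model what to change on its next call.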

PATTERN

“Models misuse tools in predictable ways (wrong path formats, missing escapes, incorrect parameter types), and small error rates compound across agentic turns. Tool descriptions aren't documentation; they're guardrails. Run the tool 10+ times, observe the failure modes, then encode 'do not do X' in the description.”
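The observe-then-encode loop can be sketched as a small helper. Everything here is illustrative: the outcome labels stand in for failure modes you would tag by hand after trial runs (they are not real data), and `GUARDRAILS`, `harden_description`, and the `min_count` threshold are hypothetical names chosen for this sketch.

```python
from collections import Counter

# Illustrative (not real) outcome labels from 12 trial runs of a hypothetical edit tool.
observed = [
    "ok", "relative_path", "ok", "relative_path", "whitespace_mismatch", "ok",
    "relative_path", "ok", "ambiguous_old_str", "ok", "ambiguous_old_str", "ok",
]

# Hypothetical mapping from each failure mode to the "do not do X" sentence that preempts it.
GUARDRAILS = {
    "relative_path": "Do not pass relative paths; `path` must be absolute, e.g. /repo/file.py.",
    "ambiguous_old_str": "Do not use an `old_str` that matches more than once; add surrounding context.",
    "whitespace_mismatch": "Do not reformat `old_str`; copy it from the file exactly, whitespace included.",
}


def harden_description(base: str, outcomes: list[str], min_count: int = 2) -> str:
    """Append one guardrail line per failure mode seen at least `min_count` times."""
    failures = Counter(o for o in outcomes if o != "ok")
    lines = [base]
    # most_common() orders modes by frequency, so the most frequent mistake is listed first.
    for mode, count in failures.most_common():
        if count >= min_count and mode in GUARDRAILS:
            lines.append(f"* {GUARDRAILS[mode]}")
    return "\n".join(lines)


hardened = harden_description("Replace one exact snippet of text in a file.", observed)
print(hardened)
```

With a threshold of 2, the single whitespace mismatch is left out, on the assumption that descriptions should target recurring patterns rather than one-off slips; the threshold is a judgment call, not part of the source.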

✓ WORKS WHEN

You have the ability to iterate on tool descriptions based on observed model behavior
Tools will be used across many agentic turns where small error rates compound
Failure modes are consistent enough to be preempted with description changes
The tool interface has inherent ambiguities (relative vs absolute paths, escaping rules)
You're building for a specific model family whose error patterns you can characterize
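Why small error rates matter over many turns can be shown with a line of arithmetic. This sketch assumes each tool call fails independently at a fixed rate, which real agent runs only approximate (failures are often correlated); the rates and the 50-call run length are illustrative, not measurements.

```python
def all_calls_succeed(per_call_error: float, calls: int) -> float:
    """Probability that every call in a run succeeds, assuming failures are independent."""
    return (1.0 - per_call_error) ** calls


# Illustrative rates, not measurements: how a description fix that cuts errors pays off over 50 calls.
for rate in (0.02, 0.005):
    print(f"per-call error {rate:.1%}: P(all 50 calls succeed) = {all_calls_succeed(rate, 50):.1%}")
```

Under these assumptions, cutting the per-call error rate from 2% to 0.5% raises the chance of a clean 50-call run from about 36% to about 78%.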

✗ FAILS WHEN

Tool usage is one-shot, leaving no room to observe failures and iterate
The model's error patterns are unpredictable or vary widely across invocations
The tool interface is already unambiguous and self-documenting
Description length is constrained and you can't fit the necessary detail
You're building for multiple model families with different failure modes

Stage

build

Source

Anthropic Engineering

From

January 2025