How Anthropic Lets Agents Rewrite Their Own Tool Descriptions
TRIGGER
MCP servers expose agents to unseen tools of wildly varying description quality, and a bad tool description sends an agent down the wrong path entirely. Manually testing and rewriting descriptions for every tool doesn't scale.
APPROACH
Anthropic created a tool-testing agent that, when given a flawed MCP tool, attempts to use the tool dozens of times and then rewrites the tool description to help future agents avoid the same failures. The agent diagnoses why failures occur and folds the discovered nuances and bugs into its suggested improvements. Input: tool with original description + failure examples. Output: improved tool description. Results: a 40% decrease in task completion time for future agents using the rewritten descriptions.
PATTERN
“Tools fail agents at the description layer, not the API layer. A working endpoint with a vague description is a broken tool. Let agents discover description gaps through usage; they catch what humans miss reading specs.”
✓ WORKS WHEN
- Tools are used by agents autonomously without human tool selection
- Tool descriptions are the primary interface for agent decision-making
- Tools have non-obvious behaviors, edge cases, or failure modes
- You have enough agent usage volume to surface description gaps
- Improved descriptions can be propagated to future agent sessions
✗ FAILS WHEN
- Humans select tools and agents just execute (description quality less critical)
- Tool behavior is simple and obvious from the name
- Tool failures are in the API itself, not agent misuse
- Testing iterations are too expensive or slow to run dozens of times
- Tool descriptions are externally controlled and can't be modified