How Anthropic Lets Agents Rewrite Their Own Tool Descriptions
TRIGGER
MCP servers expose agents to unseen tools of wildly varying description quality, and a bad tool description sends an agent down the wrong path entirely. Manually testing and rewriting descriptions for every tool doesn't scale.
APPROACH
Anthropic created a tool-testing agent that, when given a flawed MCP tool, attempts to use the tool dozens of times and then rewrites the tool description to help future agents avoid the same failures. The agent diagnoses why failures occur and folds the discovered nuances and bugs into its suggested improvements. Input: tool with original description + failure examples. Output: improved tool description. Results: a 40% decrease in task completion time for future agents using the rewritten descriptions.
PATTERN
“Tools fail agents at the description layer, not the API layer. A working endpoint with a vague description is a broken tool. Let agents discover description gaps through usage; they catch what humans miss reading specs.”
✓ WORKS WHEN
- Tools are used by agents autonomously without human tool selection
- Tool descriptions are the primary interface for agent decision-making
- Tools have non-obvious behaviors, edge cases, or failure modes
- You have enough agent usage volume to surface description gaps
- Improved descriptions can be propagated to future agent sessions
✗ FAILS WHEN
- Humans select tools and agents just execute (description quality less critical)
- Tool behavior is simple and obvious from the name
- Tool failures are in the API itself, not agent misuse
- Testing iterations are too expensive or slow to run dozens of times
- Tool descriptions are externally controlled and can't be modified