How Anthropic Reduced Context Pollution with Code-Routed Tool Results
TRIGGER
Multi-step agent workflows were polluting context with intermediate results the model didn't need: fetching 2,000+ expense line items to answer 'who exceeded budget?' pushed all the raw data into context even though only 2-3 names mattered for the final answer. Each tool call also required a full inference pass, compounding latency.
APPROACH
Anthropic implemented Programmatic Tool Calling, where Claude writes Python orchestration code instead of requesting tools one call at a time. Tools marked with `allowed_callers: ['code_execution']` execute in a sandboxed environment; their results go to the script, not Claude's context. Only the script's final output (stdout) enters context.
- Input: Claude generates code like `expenses = await asyncio.gather(*[get_expenses(m['id']) for m in team])` plus filtering logic.
- Output: just the computed result (e.g., 1KB of budget violations instead of 200KB of raw expense data).
- Results: 37% token reduction (43,588 → 27,297 tokens on complex research tasks); knowledge retrieval improved 25.6% → 28.5%; GIA benchmarks improved 46.5% → 51.2%.
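A minimal sketch of what opting a tool into code-routed execution might look like. The `allowed_callers: ['code_execution']` field is from the source; the surrounding tool-definition and request shape follows the standard Messages API tool format, and the model name and prompt here are illustrative assumptions, not the exact configuration Anthropic used:

```python
# Sketch: a tool whose results are routed to the orchestration script,
# not to the model's context. Only `allowed_callers` is from the source;
# treat the rest of the request shape as an illustrative assumption.
get_expenses_tool = {
    "name": "get_expenses",
    "description": "Return expense line items for one team member.",
    "input_schema": {
        "type": "object",
        "properties": {"member_id": {"type": "string"}},
        "required": ["member_id"],
    },
    # Marks this tool as callable only from sandboxed code, so its raw
    # output stays inside the script rather than entering context.
    "allowed_callers": ["code_execution"],
}

request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 2048,
    "tools": [get_expenses_tool],
    "messages": [
        {"role": "user", "content": "Who exceeded their travel budget?"}
    ],
}
```

With this definition, the model is expected to emit an orchestration script that calls `get_expenses` itself; only that script's stdout comes back into context.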
PATTERN
“200KB of expense records entering context when you only need three names. Route tool results through code that filters to conclusions. The model should see conclusions, not evidence.”
✓ WORKS WHEN
- Processing datasets where only aggregates or summaries matter (not raw records)
- Multi-step workflows with 3+ dependent tool calls
- Intermediate results shouldn't influence reasoning (e.g., raw logs, bulk records)
- Operations can run in parallel across many items (checking 50 endpoints, fetching N user records)
- Tool outputs are large but final answer is small (200KB → 1KB reduction pattern)
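The fan-out-and-filter cases above can be sketched as the kind of orchestration script Claude would generate. Here `get_expenses` is a local stub and the budget threshold, member IDs, and amounts are invented for illustration; in the real flow the per-member results never enter the model's context, only the printed names do:

```python
import asyncio

BUDGET = 5_000  # hypothetical per-member budget


# Stub standing in for the sandboxed expense tool; amounts are made up.
async def get_expenses(member_id: str) -> list[dict]:
    fake = {
        "a1": [{"amount": 3_200}, {"amount": 2_400}],  # total 5,600: over
        "b2": [{"amount": 1_100}],                     # total 1,100: under
        "c3": [{"amount": 4_900}, {"amount": 900}],    # total 5,800: over
    }
    return fake[member_id]


async def main() -> list[str]:
    team = [
        {"id": "a1", "name": "Ana"},
        {"id": "b2", "name": "Ben"},
        {"id": "c3", "name": "Cyd"},
    ]
    # Parallel fan-out across members, mirroring the asyncio.gather call
    # quoted in APPROACH above.
    per_member = await asyncio.gather(*[get_expenses(m["id"]) for m in team])
    # Filter to conclusions: only violators' names survive to stdout.
    return [
        m["name"]
        for m, items in zip(team, per_member)
        if sum(e["amount"] for e in items) > BUDGET
    ]


if __name__ == "__main__":
    print(asyncio.run(main()))  # → ['Ana', 'Cyd']
```

Only the short printed list would enter context; the raw line items stay inside the script, which is the 200KB → 1KB reduction pattern in miniature.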
✗ FAILS WHEN
- Claude should see and reason about all intermediate results (debugging, auditing)
- Simple single-tool invocations where code overhead exceeds benefit
- Quick lookups with small responses (<1K tokens)
- Tool results require subjective interpretation rather than programmatic filtering
- Orchestration logic is too complex to express reliably in generated code