
Anthropic's Test-Driven Iteration Loop for Agentic Coding

TRIGGER

AI code generation produces plausible-looking code that may not actually work—without a feedback signal, the agent can't distinguish between code that compiles and code that behaves correctly, leading to subtle bugs that surface later.

APPROACH

Anthropic's internal workflow:

  1. Ask Claude to write tests based on expected input/output pairs, explicitly stating you're doing TDD so it avoids creating mock implementations.
  2. Tell Claude to run the tests and confirm they fail; explicitly forbid implementation code at this stage.
  3. Commit the tests when satisfied.
  4. Ask Claude to write code that passes the tests without modifying them, instructing it to keep going until all tests pass. It iterates: write code → run tests → adjust code → run tests again.
  5. Optionally use independent subagents to verify the implementation isn't overfitting to the tests.
  6. Commit the code when satisfied.
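A minimal sketch of steps 1 and 4 in one file. The function `slugify` and its spec are illustrative assumptions, not from Anthropic's write-up; in the real workflow the test block is written and committed first, and fails until the implementation below it exists.

```python
# Hypothetical example: the `slugify` spec is invented for illustration.
import re

# --- Step 1: tests derived from expected input/output pairs. ---
# Committed before any implementation exists (step 3); at that point
# each test fails, which is exactly the signal the loop needs.
def test_lowercases():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"

def test_collapses_whitespace():
    assert slugify("  a   b  ") == "a-b"

# --- Step 4: implementation the agent revises until all tests pass. ---
def slugify(text: str) -> str:
    """Lowercase, keep alphanumeric runs, join them with single hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

if __name__ == "__main__":
    for test in (test_lowercases, test_strips_punctuation,
                 test_collapses_whitespace):
        test()
    print("all tests pass")
```

The point of writing the assertions first is that they pin down concrete behavior ("Rock & Roll!" → "rock-roll") rather than letting the implementation define its own success criteria.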

PATTERN

The agent has no internal signal for "this works" versus "this compiles." Write failing tests first, then instruct the agent to iterate until green. Binary pass/fail turns code generation from "get it right once" into "search until verified."
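The "search until verified" loop above can be sketched as a small harness. This is an assumption-laden illustration, not Anthropic's tooling: `revise` stands in for a call to the coding agent, and `run_pytest` shows one concrete way to get the binary pass/fail signal.

```python
# Sketch of the outer iteration loop: the only signal the agent gets is
# a binary pass/fail from the test runner. `revise` is a hypothetical
# stand-in for sending failure output back to the coding agent.
import subprocess

def run_pytest():
    """Run the committed (frozen) test suite; return (passed, output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def iterate_until_green(run_tests, revise, max_rounds: int = 10) -> bool:
    """Loop: run tests -> if red, ask the agent to adjust the code -> repeat.
    The tests themselves are never modified inside this loop."""
    for _ in range(max_rounds):
        passed, output = run_tests()
        if passed:          # binary signal: all tests green
            return True
        revise(output)      # agent edits implementation only
    return False            # still red after max_rounds: escalate to a human
```

A round cap matters in practice: without it, an agent stuck overfitting to one failing test can burn iterations indefinitely instead of surfacing the problem.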

WORKS WHEN

  • Behavior is easily verifiable with unit, integration, or end-to-end tests
  • Expected input/output pairs can be defined upfront before implementation
  • Test execution is fast enough to support multiple iteration cycles
  • The problem domain has deterministic expected outputs (not subjective quality)

FAILS WHEN

  • Correct behavior is subjective or requires human judgment (UX, copy, design)
  • Test setup requires extensive mocking of systems the agent can't access
  • Expected outputs aren't known upfront and emerge during implementation
  • Tests themselves are the uncertain part (unclear requirements, exploratory work)

Stage

build

From

April 2025
