
Anthropic's Reproduce-First Approach to Code Agent Debugging

TRIGGER

Coding agents attempting to fix bugs would jump directly to modifying source code based on issue descriptions. Without a way to verify that a fix actually resolved the problem, they submitted incorrect solutions or kept making changes with no signal about whether they were making progress.

APPROACH

Anthropic's SWE-bench agent prompt explicitly instructs the model to create a reproduction script before attempting fixes: 'Create a script to reproduce the error and execute it with python <filename.py>... to confirm the error.' After making code changes, the model reruns the reproduction script to verify the fix. In the example shown, the model created reproduce_error.py, confirmed the TypeError, made the fix, then re-verified.
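The kind of script the prompt asks for can be very small. The sketch below is illustrative, not the actual SWE-bench artifact: `buggy_add` is a hypothetical stand-in for the project function named in the issue, and the `TypeError` mirrors the one the issue reports.

```python
# Hypothetical reproduce_error.py, in the spirit of the one the agent wrote.
# buggy_add is a stand-in for the real function under repair.

def buggy_add(a, b):
    # Simplified stand-in: fails when b is None, as the issue describes.
    return a + b

try:
    buggy_add(1, None)  # input taken from the issue report
    print("NO ERROR: could not reproduce the issue")
except TypeError as exc:
    print(f"REPRODUCED: TypeError: {exc}")
```

Printing an unambiguous REPRODUCED/NO ERROR marker matters: the agent reruns the same script after its fix and greps for that marker rather than re-interpreting a traceback.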

PATTERN

Without a reproduction script, your agent cannot tell "I fixed it" from "I think I fixed it." Require the agent to create and run a reproduction script before any code changes. The script is ground truth; everything else is the agent guessing.
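The requirement above can be enforced mechanically. This is a minimal sketch of the verify-before-and-after loop, assuming the reproduction script signals the bug through its exit code (0 = clean, nonzero = bug present); `run_script`, `fix_verified`, and `apply_fix` are illustrative names, not part of Anthropic's actual harness.

```python
import subprocess
import sys

def run_script(path: str) -> int:
    """Run the reproduction script; its exit code is the ground truth."""
    return subprocess.run([sys.executable, path]).returncode

def fix_verified(repro_script: str, apply_fix) -> bool:
    # 1. Confirm the bug actually reproduces before touching any code.
    if run_script(repro_script) == 0:
        return False  # can't verify a fix for a bug we never observed
    # 2. Let the agent modify the source.
    apply_fix()
    # 3. Rerun the same script: only a clean exit counts as "fixed".
    return run_script(repro_script) == 0
```

Step 1 is the part agents skip: a script that never failed in the first place cannot distinguish a real fix from a no-op.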

WORKS WHEN

  • The issue description contains reproducible steps or error conditions
  • The error manifests in a way that can be checked programmatically
  • The execution environment allows running arbitrary test scripts
  • Bugs are behavioral (wrong output, exceptions) rather than subtle (race conditions, memory leaks)
  • The reproduction script runs quickly enough to iterate on (<30 seconds)

FAILS WHEN

  • Issues are about code quality, style, or architecture rather than behavior
  • Reproduction requires complex environment setup the agent can't automate
  • The bug is intermittent or timing-dependent and hard to trigger reliably
  • Issue descriptions are vague and don't specify expected vs actual behavior
  • Verification requires human judgment (UI appearance, UX quality)

Stage

build

From

January 2025
