How Hugging Face Turned Failed Training Runs into Accuracy Gains
TRIGGER
RL training for code generation tasks wastes failed rollouts—when the model produces incorrect code, it receives zero reward and learns nothing from the specific failure, even though the verifier's error message contains actionable debugging information.
APPROACH
Kimina-Prover stores failed rollouts (prompt, response, and Lean feedback) and builds new training samples in which the model is prompted to revise its previous reasoning and code based on the error. Only one error-fix turn is allowed, and error messages are capped at a fixed token limit. At each training step, half the samples are error-correction samples. Results: Pass@32 on MiniF2F improved from 72.95% to 76.23% for the 1.7B model, and applying error correction at inference time added another 1.64% (to 77.87%).
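A minimal sketch of the sample-construction step: take a failed rollout, cap the verifier feedback, and wrap it into a single revision turn. The names (`make_correction_sample`, `truncate_tokens`, `MAX_ERROR_TOKENS`) and the prompt wording are illustrative assumptions, not taken from the Kimina-Prover codebase; a real pipeline would also truncate with the model's own tokenizer rather than by whitespace.

```python
# Sketch: turn a failed rollout into an error-correction training sample.
# All names and the prompt template are hypothetical.

MAX_ERROR_TOKENS = 256  # assumed cap on verifier feedback length


def truncate_tokens(text: str, limit: int) -> str:
    """Crudely cap text by whitespace tokens (a real pipeline would use
    the model tokenizer)."""
    tokens = text.split()
    return " ".join(tokens[:limit])


def make_correction_sample(prompt: str, failed_response: str,
                           lean_feedback: str) -> dict:
    """Build a single error-fix turn: original prompt, the model's failed
    attempt, and the capped Lean error, asking for a revision."""
    error = truncate_tokens(lean_feedback, MAX_ERROR_TOKENS)
    revision_prompt = (
        f"{prompt}\n\n"
        f"Your previous attempt:\n{failed_response}\n\n"
        f"Lean reported the following error:\n{error}\n\n"
        "Revise your reasoning and code to fix this error."
    )
    return {"prompt": revision_prompt, "is_correction": True}
```

Because only one error-fix turn is allowed, there is no loop here: a correction sample that itself fails is simply discarded rather than extended into a longer repair dialogue.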
PATTERN
“Verifier error messages paired with failed generations become supervised examples for self-correction. Half your training samples can be error-fix turns when the failure rate is high enough.”
✓ WORKS WHEN
- Verifier produces structured, actionable error messages (type errors, tactic failures, assertion violations)
- Error messages are concise enough to fit in context without dominating the prompt
- Failure modes are recoverable—the error points to a fixable issue rather than fundamental approach problems
- Training infrastructure supports storing and replaying failed rollouts as new samples
- Task has high initial failure rate (>50%) providing abundant error correction examples
✗ FAILS WHEN
- Verifier only provides pass/fail without diagnostic information (black-box evaluation)
- Error messages are too verbose or noisy to extract signal (>500 tokens of stack traces)
- Most failures stem from wrong approach rather than fixable bugs—model needs to restart, not patch
- Training compute is constrained and error correction samples double the effective batch size
- Task has low failure rate (<10%) leaving insufficient error examples to learn from