Failure Handling
When a gate fails, a guard triggers, or a review rejects the code, Gump needs to know what to do. The on_failure field defines the recovery strategy.
Without on_failure
If a step has no on_failure, any failure is fatal: the run stops immediately.
Basic on_failure
- retry: maximum number of additional attempts
- strategy: what to do on each attempt, in order
Strategy options
same
Retry with the same agent. The agent receives the error context ({error} and {diff}) from the failed attempt.
same: N (shorthand)
same: 3 is equivalent to [same, same, same].
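The shorthand can be pictured as a simple list expansion. The sketch below is only a model of that idea, not Gump's parser: the function name expand_strategy and the representation of entries as Python strings and dicts are assumptions made for illustration.

```python
def expand_strategy(entries):
    """Expand `same: N` shorthand entries into N plain `same` entries.

    Hypothetical model: a strategy is a list whose items are either the
    string "same" or a one-key dict such as {"same": 3} or {"escalate": "x"}.
    """
    expanded = []
    for entry in entries:
        if isinstance(entry, dict) and "same" in entry:
            # `same: N` stands for N consecutive `same` attempts.
            expanded.extend(["same"] * int(entry["same"]))
        else:
            # Plain `same` and any other entry pass through unchanged.
            expanded.append(entry)
    return expanded


print(expand_strategy([{"same": 3}]))
```

Entries other than the shorthand are left as they are, so the expansion can be applied to any strategy list before the attempts are walked in order.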
escalate: agent
Switch to a more powerful (usually more expensive) agent.
Combining strategies
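One way to read a combined strategy is as a plan of which agent runs on each retry. The sketch below models that reading. The agent names are placeholders, and it assumes that a same entry after an escalation keeps the escalated agent, which the text above does not state.

```python
def agents_per_retry(initial_agent, strategy):
    """Return the agent that would run on each retry, in strategy order.

    Hypothetical model: entries are the string "same" or a dict
    {"escalate": "<agent name>"}.
    """
    current = initial_agent
    plan = []
    for entry in strategy:
        if isinstance(entry, dict) and "escalate" in entry:
            # Escalation switches to the named agent.
            current = entry["escalate"]
        # "same" keeps whichever agent ran the previous attempt (assumption).
        plan.append(current)
    return plan


# Two retries with the original agent, then one with a stronger agent.
print(agents_per_retry("small-agent", ["same", "same", {"escalate": "large-agent"}]))
```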
restart_from
Restart from an earlier step in the same group instead of retrying the current step. For example, restart_from: tests goes back to the tests step. The worktree is reset to the pre-tests state and the state bag is cleaned (previous outputs moved to “prev”, not destroyed). Each restart consumes one attempt from the retry budget.
The reasoning: if the agent can’t implement what the tests demand, maybe the tests are poorly designed.
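The state-bag cleanup and the budget accounting can be modeled in a few lines. This is a sketch under assumptions: the state bag is represented as a plain dict keyed by step name, and the helper name and signature are invented here. Gump's actual storage layout is not described in this section.

```python
def restart_from(state_bag, step_order, target, attempts_left):
    """Model a restart: archive outputs from `target` onward, spend one attempt.

    Hypothetical layout: state_bag maps step names to outputs, and archived
    outputs live under the "prev" key.
    """
    if attempts_left <= 0:
        raise RuntimeError("retry budget exhausted")
    start = step_order.index(target)
    prev = state_bag.setdefault("prev", {})
    for step in step_order[start:]:
        if step in state_bag:
            # Move the old output aside instead of destroying it.
            prev[step] = state_bag.pop(step)
    return attempts_left - 1


bag = {"tests": "tests v1", "impl": "impl v1"}
remaining = restart_from(bag, ["tests", "impl"], "tests", attempts_left=2)
print(remaining, bag)
```

Keeping the old outputs under "prev" means a restarted step can still inspect what the previous attempt produced.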
Conditional on_failure
Route failure handling differently depending on what failed:
- gate_fail: a gate (compile, test, lint) failed
- guard_fail: a guard (max_turns, max_budget, no_write) killed the agent
- review_fail: a review step returned pass: false
A failure type that isn’t listed falls back to gate_fail. If gate_fail isn’t listed either, the failure is fatal.
The simple form and the conditional form are mutually exclusive.
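The routing rules above can be sketched as a small lookup. This is an illustrative model, not Gump's code: the function name, the dict representation of on_failure, and the choice to apply the simple form to every failure type are assumptions.

```python
FAILURE_KINDS = {"gate_fail", "guard_fail", "review_fail"}
SIMPLE_KEYS = {"retry", "strategy"}


def resolve_handler(on_failure, kind):
    """Return the handler for a failure kind, or None when the failure is fatal."""
    if on_failure is None:
        return None  # no on_failure: any failure is fatal
    has_simple = bool(SIMPLE_KEYS & on_failure.keys())
    has_conditional = bool(FAILURE_KINDS & on_failure.keys())
    if has_simple and has_conditional:
        raise ValueError("simple and conditional forms are mutually exclusive")
    if has_simple:
        return on_failure  # assumption: the simple form covers every kind
    if kind in on_failure:
        return on_failure[kind]
    # Unlisted kinds fall back to gate_fail; None here means fatal.
    return on_failure.get("gate_fail")


conditional = {"gate_fail": {"retry": 2}, "review_fail": {"retry": 1}}
print(resolve_handler(conditional, "guard_fail"))
```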
What happens on retry
- The worktree is reset to the pre-step commit (git reset --hard)
- The session is fresh (unless session: reuse-on-retry)
- {error} is injected with the gate’s stderr output (truncated to 2000 chars)
- {diff} is injected with the failed attempt’s diff (truncated to 3000 chars)
- The agent launches with the full error context
- If this attempt also fails, the next strategy entry is used
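The injection of {error} and {diff} can be modeled as a template fill with length limits. The sketch below assumes that truncation keeps the first N characters and that the prompt is a plain string template. Neither detail is confirmed above, and the function name is invented for illustration.

```python
import re

ERROR_LIMIT = 2000  # chars of gate stderr passed to the agent
DIFF_LIMIT = 3000   # chars of the failed attempt's diff


def build_retry_prompt(template, stderr, diff):
    """Fill {error} and {diff} placeholders with truncated failure context."""
    values = {
        "error": stderr[:ERROR_LIMIT],  # assumption: keep the head of the output
        "diff": diff[:DIFF_LIMIT],
    }
    # Substitute in a single pass so that text inside the error or diff
    # is never itself treated as a placeholder.
    return re.sub(r"\{(error|diff)\}", lambda m: values[m.group(1)], template)


prompt = build_retry_prompt(
    "Previous attempt failed.\nError:\n{error}\nDiff:\n{diff}",
    stderr="x" * 2500,
    diff="y" * 4000,
)
print(len(prompt))
```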
Circuit breaker
When all strategies are exhausted, the step is marked fatal. If the step is inside a group with its own on_failure, the group’s retry kicks in. Otherwise, the run stops.
The circuit breaker emits a circuit_breaker event in the ledger with the reason and the number of attempts.
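The exhaustion logic can be sketched as a loop over the strategy entries. The event field names, the list used as a ledger, and the choice to count the initial attempt in the total are assumptions made for this illustration, not Gump's actual ledger schema.

```python
def run_with_retries(attempt_fn, strategy, ledger):
    """Run the initial attempt, then one attempt per strategy entry.

    attempt_fn(entry) returns (ok, reason); entry is None for the first attempt.
    Returns True on success, False once every strategy entry has been used.
    """
    attempts = 0
    reason = None
    for entry in [None] + list(strategy):
        attempts += 1
        ok, reason = attempt_fn(entry)
        if ok:
            return True
    # All strategies exhausted: record the circuit breaker and mark the step fatal.
    ledger.append({"event": "circuit_breaker", "reason": reason, "attempts": attempts})
    return False


ledger = []
always_fails = lambda entry: (False, "gate_fail")
print(run_with_retries(always_fails, ["same", "same"], ledger), ledger)
```

A caller that sits inside a group would treat the False result as the signal to hand control to the group's own on_failure.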
Group-level on_failure
Groups (orchestration steps) can have their own on_failure:
The inner on_failure handles per-step retries. If the group gate fails after all inner retries, the outer on_failure retries the entire group from impl. The worktree is reset to the pre-group state.
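The interaction between inner and outer retries can be modeled as two nested levels. In the sketch below, run_step stands in for a step that already applies its own on_failure, and the step names other than impl are placeholders. The worktree reset is not modeled.

```python
def run_group(steps, run_step, group_gate, group_retry):
    """Run every step in order, then the group gate; retry the whole group on failure.

    run_step(name) returns True if the step succeeded after its inner retries.
    group_retry is the number of additional attempts allowed for the group.
    """
    for _ in range(1 + group_retry):
        # A fatal inner step or a failed group gate both trigger a group retry,
        # which starts again from the first step.
        if all(run_step(name) for name in steps) and group_gate():
            return True
    return False


calls = []
gate_results = iter([False, True])  # group gate fails once, then passes
ok = run_group(
    ["impl", "review"],
    run_step=lambda name: calls.append(name) or True,
    group_gate=lambda: next(gate_results),
    group_retry=1,
)
print(ok, calls)
```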