Stop babysitting your agents.
Engineer their execution.
Workflow runtime for coding agents. With stats.
brew install isomorphx/tap/gump
Open source · Agent-agnostic · Git-native
$ gump run tdd --spec spec.md
✓ Step 1: Decompose
✓ Step 2: Build (3 items)
✓ Step 3: Quality
• Build: 3 items passed (2 retries, 1 escalation)
• Quality: compile + lint + test
• Cost: $1.42 · 12 turns · 47s
Every run produces a full trace: what ran, what failed, what it cost.
BUILD YOUR OWN WORKFLOW
Define how agents work.
Each step follows three phases: GET the context, RUN the agent, GATE the result. Define it in YAML. Share it, version it, run it anywhere.
name: cheap2sota
description: Start with a cheap model, escalate to SOTA only on failure
max_budget: 8.00
steps:
  - name: decompose
    agent: claude-sonnet
    output: plan
    prompt: |
      Decompose {spec} into independent items.
      Each item must be implementable and testable in isolation.
    gate: [schema]
  - name: build
    foreach: decompose
    steps:
      - name: impl
        agent: qwen
        output: diff
        prompt: |
          Implement: {item.description}
          Files: {item.files}
        guard:
          max_turns: 60
        gate: [compile, test]
        on_failure:
          retry: 5
          strategy:
            - same
            - same
            - "escalate: claude-haiku"
            - "escalate: claude-sonnet"
            - "escalate: claude-opus"
  - name: quality
    gate: [compile, lint, test]
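To make the GET, RUN, GATE loop and the retry ladder concrete, here is a minimal Python sketch. It is not Gump's implementation: the function names and the callables passed in are hypothetical, and it assumes that "same" keeps the current agent while "escalate: <name>" switches to that agent for the retry it applies to and every retry after it.

```python
def agent_for_attempt(base_agent, strategy, attempt):
    """Return the agent used on a given attempt (0 = first run, 1.. = retries).

    Assumed semantics, not confirmed by the source: "same" keeps the
    current agent, "escalate: <name>" switches to <name> from then on.
    """
    agent = base_agent
    for entry in strategy[:attempt]:
        if entry.startswith("escalate:"):
            agent = entry.split(":", 1)[1].strip()
    return agent


def run_step(base_agent, strategy, get_context, run_agent, gate):
    """GET the context once, then RUN and GATE until a gate passes."""
    context = get_context()                      # GET
    for attempt in range(len(strategy) + 1):
        agent = agent_for_attempt(base_agent, strategy, attempt)
        result = run_agent(agent, context)       # RUN
        if gate(result):                         # GATE
            return agent, attempt
    return None, len(strategy)                   # every retry was used up


# The strategy list from the cheap2sota workflow above.
STRATEGY = [
    "same",
    "same",
    "escalate: claude-haiku",
    "escalate: claude-sonnet",
    "escalate: claude-opus",
]

ladder = [agent_for_attempt("qwen", STRATEGY, n) for n in range(len(STRATEGY) + 1)]
print(ladder)

# Hypothetical stand-ins: the fake agent only succeeds once claude-sonnet runs.
def fake_agent(agent, context):
    return {"agent": agent, "ok": agent == "claude-sonnet"}

winner = run_step("qwen", STRATEGY, lambda: "spec text", fake_agent, lambda r: r["ok"])
print(winner)
```

Under these assumptions the cheap model gets three tries (the first run plus two "same" retries) before any stronger model is billed.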
ORCHESTRATE ANY AGENT
6 adapters today. More coming.
Model-agnostic by design.
Match the right tool to the right task.
Don't lock your workflow into a single ecosystem. Balance cost, speed, and reasoning by mixing specialized models.
Decompose with Opus. Implement with Qwen. Review with Gemini.
Gump orchestrates them all seamlessly.
VALIDATE EVERY STEP
Gates verify. Guards protect. Retries fix.
Every step passes through deterministic gates: compile, test, lint, schema checks. No LLM in the loop. If a gate fails, Gump retries with the same agent, escalates to a stronger model, or restarts from an earlier step.
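One way to picture a deterministic gate is as a command whose exit code decides pass or fail. The sketch below assumes exactly that. The `run_gates` helper and the stand-in commands are hypothetical, not Gump's API, and real gates would call your own compiler, linter, and test runner.

```python
import subprocess
import sys


def run_gates(gates):
    """Run each (name, command) gate in order; stop at the first failure.

    A gate passes only if its command exits with status 0, so the verdict
    comes from the toolchain, not from a model's opinion.
    """
    for name, command in gates:
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            return False, name
    return True, None


# Stand-in commands so the sketch runs anywhere Python is installed.
PASS = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

print(run_gates([("compile", PASS), ("test", PASS)]))
print(run_gates([("compile", PASS), ("test", FAIL), ("lint", PASS)]))
```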
Live guards watch agents in real-time, cutting them off if they blow the budget or write where they shouldn't. Every run executes in an isolated Git worktree. Your main branch stays clean.
Resume a crashed run. Replay from any step.
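The guard checks described above can be sketched as a function evaluated on every agent event. This is an illustration under assumptions: `check_guards`, its parameters, and the guard names it returns are hypothetical. Only `max_budget` and `max_turns` appear in the workflow YAML, and how Gump actually enforces write boundaries is not stated here.

```python
from pathlib import Path


def check_guards(spent, max_budget, turns, max_turns, written_path, worktree):
    """Return the name of the first violated guard, or None if all hold."""
    if spent > max_budget:
        return "max_budget"
    if turns > max_turns:
        return "max_turns"
    # Resolve both paths so "../" tricks cannot escape the worktree.
    root = Path(worktree).resolve()
    target = (root / written_path).resolve()
    if target != root and root not in target.parents:
        return "write_outside_worktree"
    return None


print(check_guards(1.42, 8.00, 12, 60, "src/app.py", "/tmp/gump-worktree"))
print(check_guards(8.10, 8.00, 12, 60, "src/app.py", "/tmp/gump-worktree"))
print(check_guards(1.42, 8.00, 12, 60, "../main/app.py", "/tmp/gump-worktree"))
```

The first call passes every guard, the second trips the budget, and the third is rejected because the resolved path lands outside the worktree.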
Step 2: Build - item 1/3
  ✗ gate failed: test (claude-haiku)
  retry 2/5 (same)
  ✗ gate failed: test (claude-haiku)
  retry 3/5 (escalate: claude-sonnet)
  ✓ gate passed (claude-sonnet)
Step 3: Quality
  ✓ compile + lint + test
Run completed. $1.42 · 47s
KNOW WHAT YOUR AGENTS COST
Run: run_2026_03_23_1842
Workflow: tdd · Status: pass · Duration: 47s · Cost: $1.42
Step            Agent          Turns  Cost   Gate
────────────────────────────────────────────────────────────────────
decompose       claude-opus        3  $0.31  schema ✓
build/1/tests   claude-haiku       8  $0.12  compile+test ✓
build/1/impl    claude-haiku      14  $0.18  compile+test ✓
build/2/tests   claude-haiku       6  $0.09  compile+test ✓
build/2/impl    claude-sonnet     12  $0.44  compile+test ✓ (escalated)
quality         -                  -  -      compile+lint+test ✓
Structured execution data. Not reconstructed, tracked live.
Every run produces a full event ledger: cost per step, token usage, turns, retries, time-to-first-diff, context window usage. Tracked live from the agent stream, not reconstructed after the fact.
Gump is the only tool that knows which agent does which type of work, with which result, and how much it costs, down to the individual step. Know when to escalate. Know what you're spending.
Every metric is a byproduct of execution, not instrumentation.
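As an illustration of what per-step data makes possible, the sketch below rolls the itemised rows of the run table above up by agent. The record layout and field names are assumptions for the example, not Gump's ledger schema.

```python
from collections import defaultdict
from decimal import Decimal

# Rows copied from the run table; the dict keys are hypothetical.
EVENTS = [
    {"step": "decompose",     "agent": "claude-opus",   "turns": 3,  "cost": "0.31"},
    {"step": "build/1/tests", "agent": "claude-haiku",  "turns": 8,  "cost": "0.12"},
    {"step": "build/1/impl",  "agent": "claude-haiku",  "turns": 14, "cost": "0.18"},
    {"step": "build/2/tests", "agent": "claude-haiku",  "turns": 6,  "cost": "0.09"},
    {"step": "build/2/impl",  "agent": "claude-sonnet", "turns": 12, "cost": "0.44"},
]


def totals_by_agent(events):
    """Sum turns and cost per agent; Decimal avoids float rounding on money."""
    totals = defaultdict(lambda: {"turns": 0, "cost": Decimal("0")})
    for event in events:
        bucket = totals[event["agent"]]
        bucket["turns"] += event["turns"]
        bucket["cost"] += Decimal(event["cost"])
    return dict(totals)


for agent, total in totals_by_agent(EVENTS).items():
    print(f"{agent:15} {total['turns']:3d} turns  ${total['cost']}")
```

Grouping by agent is only one cut. The same records could be grouped by step to see where escalations cluster.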
Run it. Measure it. Ship it.
brew install isomorphx/tap/gump