Stop babysitting your agents.
Engineer their execution.

Workflow runtime for coding agents. With stats.

brew install isomorphx/tap/gump

Open source · Agent-agnostic · Git-native

🍿 gump run tdd --spec spec.md

→ Step 1: Decompose

→ Step 2: Build (3 items)

→ Step 3: Quality

✓ Build: 3 items passed (2 retries, 1 escalation)

✓ Quality: compile + lint + test

✓ Cost: $1.42 · 12 turns · 47s

Every run produces a full trace — what ran, what failed, what it cost.

BUILD YOUR OWN WORKFLOW

Define how agents work.

Each step follows three phases: GET the context, RUN the agent, GATE the result. Define it in YAML. Share it, version it, run it anywhere.
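The GET / RUN / GATE cycle can be pictured as three functions called in order. Below is a minimal Python sketch. `run_step`, `StepResult`, and the callables passed in are hypothetical names chosen for this illustration, not Gump's internals. The agent is stubbed with a lambda so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StepResult:
    output: str
    gates: Dict[str, bool]

    @property
    def passed(self) -> bool:
        # A step passes only if every gate passed.
        return all(self.gates.values())


def run_step(
    get_context: Callable[[], str],
    run_agent: Callable[[str], str],
    gates: Dict[str, Callable[[str], bool]],
) -> StepResult:
    context = get_context()      # GET: gather the inputs for the prompt
    output = run_agent(context)  # RUN: hand the context to an agent
    # GATE: every check sees the same output and returns True or False
    results = {name: check(output) for name, check in gates.items()}
    return StepResult(output=output, gates=results)


result = run_step(
    get_context=lambda: "Decompose spec.md into independent items.",
    run_agent=lambda ctx: '{"items": []}',  # stub standing in for a real agent
    gates={"schema": lambda out: out.startswith("{")},
)
print(result.passed)  # True
```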

name: cheap2sota
description: Start with a cheap model, escalate to SOTA only on failure
max_budget: 8.00

steps:
  - name: decompose
    agent: claude-sonnet
    output: plan
    prompt: |
      Decompose {spec} into independent items.
      Each item must be implementable and testable in isolation.
    gate: [schema]

  - name: build
    foreach: decompose
    steps:
      - name: impl
        agent: qwen
        output: diff
        prompt: |
          Implement: {item.description}
          Files: {item.files}
        guard:
          max_turns: 60
        gate: [compile, test]
        on_failure:
          retry: 5
          strategy:
            - same
            - same
            - "escalate: claude-haiku"
            - "escalate: claude-sonnet"
            - "escalate: claude-opus"

  - name: quality
    gate: [compile, lint, test]
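The `on_failure` strategy in `cheap2sota` works like a schedule that decides which agent handles each attempt. The sketch below relies on semantics inferred from the example rather than taken from Gump's documentation. It assumes the first attempt uses the step's `agent` and each retry consumes the next strategy entry in order. It also assumes `same` reuses the previous attempt's agent. `agents_for_attempts` is a hypothetical helper.

```python
from typing import List


def agents_for_attempts(initial: str, strategy: List[str]) -> List[str]:
    """Agent used on each attempt: the first try, then one entry per retry."""
    agents = [initial]
    for entry in strategy:
        if entry == "same":
            agents.append(agents[-1])  # retry with the previous attempt's agent
        elif entry.startswith("escalate:"):
            # "escalate: claude-haiku" -> "claude-haiku"
            agents.append(entry.split(":", 1)[1].strip())
        else:
            raise ValueError(f"unknown strategy entry: {entry!r}")
    return agents


strategy = [
    "same",
    "same",
    "escalate: claude-haiku",
    "escalate: claude-sonnet",
    "escalate: claude-opus",
]
print(agents_for_attempts("qwen", strategy))
# ['qwen', 'qwen', 'qwen', 'claude-haiku', 'claude-sonnet', 'claude-opus']
```

Under those assumptions, `retry: 5` allows up to six attempts in total, and the Claude models are only reached after qwen has failed three times.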

ORCHESTRATE ANY AGENT

Codex

Claude Code

Gemini CLI

Cursor

Qwen CLI

OpenCode

6 adapters today. More coming.

Model-agnostic by design.
Match the right tool to the right task.

Don't lock your workflow into a single ecosystem. Balance cost, speed, and reasoning by mixing specialized models.

Decompose with Opus. Implement with Qwen. Review with Gemini.

Gump orchestrates them all from a single workflow file.

VALIDATE EVERY STEP

Gates verify. Guards protect. Retries fix.

Every step passes through deterministic gates: compile, test, lint, and schema checks. No LLM judges the result. If a gate fails, Gump retries with the same agent, escalates to a stronger model, or restarts from an earlier step.
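A deterministic gate can be modeled as a command whose exit status decides pass or fail. The `run_gates` helper below is a hypothetical sketch, not Gump's gate implementation. In a real project the commands might be a compiler and a test runner, such as `go build ./...` and `go test ./...`. The demo uses the Python interpreter instead so it runs anywhere.

```python
import subprocess
import sys
from typing import Dict, List


def run_gates(commands: Dict[str, List[str]], cwd: str = ".") -> Dict[str, bool]:
    """Run each gate command in order; a gate passes iff it exits with status 0."""
    results = {}
    for name, argv in commands.items():
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


# Stand-in commands so the demo runs without a Go toolchain:
# one that exits 0 and one that exits 1.
demo = {
    "compile": [sys.executable, "-c", "pass"],
    "test": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
print(run_gates(demo))  # {'compile': True, 'test': False}
```

The same inputs always yield the same verdict, which is what makes a gate result safe to retry against.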

Live guards watch agents in real time, cutting them off if they blow the budget or write where they shouldn't. Every run executes in an isolated Git worktree, so your main branch stays clean.
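A live guard checks each event as it arrives and stops at the first violation. The sketch below is illustrative only. It assumes a hypothetical event shape (`Event` with `kind`, `cost`, `path`) and a simple path-prefix rule for allowed writes. Only `max_turns` appears in the workflow above, so the budget and path checks here are assumptions about what a guard might enforce.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    kind: str          # "turn" or "write" (hypothetical event kinds)
    cost: float = 0.0  # cost of a turn, in dollars
    path: str = ""     # file touched by a write


def check_guards(
    events: Iterable[Event],
    max_turns: int,
    max_cost: float,
    allowed_prefix: str,
) -> Optional[str]:
    """Consume events in order; return the first violation, or None."""
    turns = 0
    spent = 0.0
    for ev in events:
        if ev.kind == "turn":
            turns += 1
            spent += ev.cost
            if turns > max_turns:
                return f"max_turns exceeded ({turns} > {max_turns})"
            if spent > max_cost:
                return f"budget exceeded (${spent:.2f} > ${max_cost:.2f})"
        elif ev.kind == "write" and not ev.path.startswith(allowed_prefix):
            return f"write outside {allowed_prefix}: {ev.path}"
    return None


events = [
    Event("turn", cost=0.10),
    Event("write", path="src/app.go"),
    Event("turn", cost=0.10),
    Event("write", path=".github/workflows/ci.yml"),
]
print(check_guards(events, max_turns=60, max_cost=8.00, allowed_prefix="src/"))
# write outside src/: .github/workflows/ci.yml
```

Because the function returns on the first violation, it behaves the same whether `events` is a finished list or a live generator reading from an agent's stream.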

Resume a crashed run. Replay from any step.

Step 2: Build — item 1/3
✗ gate failed: test (claude-haiku)
  retry 2/5 (same)
✗ gate failed: test (claude-haiku)
  retry 3/5 (escalate: claude-sonnet)
✓ gate passed (claude-sonnet)

Step 3: Quality
✓ compile + lint + test

Run completed. $1.42 · 47s
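Resuming a crashed run comes down to finding the first step that never recorded a pass. Replaying means re-running from a chosen step onward. The sketch below assumes a hypothetical ledger of `(step, status)` pairs, and `resume_point` and `replay_from` are illustrative helpers. Gump's real ledger format may differ.

```python
from typing import List, Optional, Tuple


def resume_point(steps: List[str], ledger: List[Tuple[str, str]]) -> Optional[str]:
    """First step in workflow order with no 'pass' record, or None if all passed."""
    passed = {step for step, status in ledger if status == "pass"}
    for step in steps:
        if step not in passed:
            return step
    return None


def replay_from(steps: List[str], start: str) -> List[str]:
    """Steps to re-execute when replaying from `start` onward."""
    return steps[steps.index(start):]


steps = ["decompose", "build", "quality"]
ledger = [("decompose", "pass"), ("build", "fail"), ("build", "crash")]
print(resume_point(steps, ledger))      # build
print(replay_from(steps, "decompose"))  # ['decompose', 'build', 'quality']
```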

KNOW WHAT YOUR AGENTS COST

Run: run_2026_03_23_1842
Workflow: tdd · Status: pass · Duration: 47s · Cost: $1.42

Step            Agent           Turns  Cost    Gate
───────────────────────────────────────────────────────
decompose       claude-opus       3   $0.31   schema ✓
build/1/tests   claude-haiku      8   $0.12   compile+test ✓
build/1/impl    claude-haiku     14   $0.18   compile+test ✓
build/2/tests   claude-haiku      6   $0.09   compile+test ✓
build/2/impl    claude-sonnet    12   $0.44   compile+test ✓ (escalated)
quality         —                 —   —       compile+lint+test ✓

Structured execution data. Not reconstructed — tracked live.

Every run produces a full event ledger: cost per step, token usage, turns, retries, time-to-first-diff, and context-window usage, captured from the agent stream as it happens rather than reconstructed after the fact.
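Per-step metrics fall out of a single pass over raw ledger events. The sketch below uses a hypothetical row format, `(step, seconds_since_step_start, kind, cost_cents)`. It computes three of the metrics listed above: turns, cost, and time-to-first-diff. Costs are kept in integer cents to avoid floating-point rounding. `summarize` is an illustrative helper, not part of Gump.

```python
from typing import Dict, List, Tuple

# Hypothetical ledger rows: (step, seconds since step start, kind, cost in cents)
Row = Tuple[str, float, str, int]

ledger: List[Row] = [
    ("decompose", 2.0, "turn", 10),
    ("decompose", 5.5, "turn", 12),
    ("build/1/impl", 1.5, "turn", 2),
    ("build/1/impl", 4.0, "diff", 0),
    ("build/1/impl", 6.0, "turn", 3),
]


def summarize(rows: List[Row]) -> Dict[str, dict]:
    """Per-step turns, cost, and time-to-first-diff, in one pass over the ledger."""
    summary: Dict[str, dict] = {}
    for step, t, kind, cents in rows:
        s = summary.setdefault(step, {"turns": 0, "cost_cents": 0, "first_diff_s": None})
        s["cost_cents"] += cents
        if kind == "turn":
            s["turns"] += 1
        elif kind == "diff" and s["first_diff_s"] is None:
            s["first_diff_s"] = t  # only the earliest diff counts
    return summary


for step, s in summarize(ledger).items():
    print(step, s)
# decompose {'turns': 2, 'cost_cents': 22, 'first_diff_s': None}
# build/1/impl {'turns': 2, 'cost_cents': 5, 'first_diff_s': 4.0}
```

Since each row only updates its own step's totals, the same loop works incrementally as events stream in.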

Gump is the only tool that knows which agent did which kind of work, what result it got, and what it cost, down to the individual step. Know when to escalate. Know what you're spending.
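Grouping attempts by agent turns the ledger into an escalation signal. A cheap model that fails often may or may not still be cheaper per passing result. The records below are made-up numbers, and `per_agent` is a hypothetical helper sketching that aggregation.

```python
from typing import Dict, List, Tuple

# Made-up attempt records: (agent, passed all gates?, cost in cents)
attempts: List[Tuple[str, bool, int]] = [
    ("claude-haiku", False, 12),
    ("claude-haiku", False, 10),
    ("claude-sonnet", True, 44),
    ("claude-haiku", True, 18),
]


def per_agent(records: List[Tuple[str, bool, int]]) -> Dict[str, dict]:
    """Attempts, pass rate, and cost per passing attempt for each agent."""
    stats: Dict[str, dict] = {}
    for agent, passed, cents in records:
        s = stats.setdefault(agent, {"attempts": 0, "passes": 0, "cost_cents": 0})
        s["attempts"] += 1
        s["passes"] += int(passed)
        s["cost_cents"] += cents
    for s in stats.values():
        s["pass_rate"] = s["passes"] / s["attempts"]
        # Failed attempts still cost money, so they count toward the price of a pass.
        s["cents_per_pass"] = s["cost_cents"] / s["passes"] if s["passes"] else None
    return stats


for agent, s in per_agent(attempts).items():
    print(agent, s["attempts"], s["passes"], s["cents_per_pass"])
# claude-haiku 3 1 40.0
# claude-sonnet 1 1 44.0
```

In this made-up data, haiku's failures still leave it slightly cheaper per pass than sonnet. A few more failed attempts would flip that, which is the point at which escalating earlier pays off.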

Every metric is a byproduct of execution — not instrumentation.

Run it. Measure it. Ship it.