
How Gump Works

Gump is a workflow runtime for coding agents. You describe what you want in a spec, pick a workflow, and Gump handles the rest — decomposing the work, running agents, validating results, retrying on failure, and tracking every metric. Here’s the mental model in one diagram:
spec → workflow → engine → agents → gates → data
Six concepts. That’s it.

Spec

A spec is a markdown file describing what you want built. A feature request, a bug report, a refactoring plan. Gump doesn’t parse it — it passes it to agents as context.
gump run spec.md --workflow tdd

Workflow

A workflow is a YAML file that defines a sequence of steps. Each step runs an agent, validates the result, and decides what happens next. Workflows are declarative: you describe what should happen, and Gump figures out the execution.
name: cheap2sota
steps:
  - name: decompose
    agent: claude-sonnet
    output: plan
  - name: build
    foreach: decompose
    steps:
      - name: impl
        agent: qwen
        gate: [compile, test]
Gump ships with 8 built-in workflows. You can also write your own.
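To make the foreach semantics concrete, here’s a minimal Python sketch of how the cheap2sota workflow above could unroll into individual step runs. It assumes the YAML has already been loaded into plain dicts and that the decompose step’s plan is a list of items. `expand` and the `build[i]/impl` path format are hypothetical illustrations, not Gump’s API. In a real run the items only exist once decompose has finished, so an engine would unroll the foreach at that point rather than up front.

```python
from typing import Any

# The cheap2sota workflow above, as the plain data a YAML loader would return.
WORKFLOW: dict[str, Any] = {
    "name": "cheap2sota",
    "steps": [
        {"name": "decompose", "agent": "claude-sonnet", "output": "plan"},
        {
            "name": "build",
            "foreach": "decompose",
            "steps": [{"name": "impl", "agent": "qwen", "gate": ["compile", "test"]}],
        },
    ],
}


def expand(steps: list[dict[str, Any]], plans: dict[str, list[str]], prefix: str = "") -> list[str]:
    """Flatten steps into an ordered list of step paths, unrolling each foreach (hypothetical)."""
    order: list[str] = []
    for step in steps:
        path = f"{prefix}{step['name']}"
        if "foreach" in step:
            # One pass over the nested steps per item produced by the referenced plan step.
            for i, _item in enumerate(plans[step["foreach"]]):
                order += expand(step["steps"], plans, f"{path}[{i}]/")
        else:
            order.append(path)
    return order


# Usage: pretend decompose produced a two-item plan.
plans = {"decompose": ["add parser", "add CLI flag"]}
print(expand(WORKFLOW["steps"], plans))  # ['decompose', 'build[0]/impl', 'build[1]/impl']
```

The nested `steps` list is what lets a workflow stay declarative: the YAML names the plan step to iterate over, and the number of `impl` runs is decided by the data, not by the file.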

Engine

The engine reads the workflow, creates an isolated Git worktree, and executes each step in order. It manages the lifecycle: launching agents, collecting outputs, running gates, handling failures, and snapshotting progress after every step. The engine makes zero LLM calls. All intelligence comes from the agents it orchestrates, not from Gump itself.
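Here’s a minimal Python sketch of that lifecycle, assuming steps run sequentially and a gate failure simply retries the same agent. `Engine`, `Step`, `run_agent`, and `max_attempts` are hypothetical names, not Gump’s internals. Worktree creation is left out (with plain Git, `git worktree add` is one way to get an isolated checkout), and the snapshot is reduced to recording the step name. The engine itself never calls a model; all agent work goes through the injected `run_agent` callable.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical gate signature: inspects an agent's output, returns True on pass.
Gate = Callable[[str], bool]


@dataclass
class Step:
    name: str
    agent: str
    gates: list[str] = field(default_factory=list)


@dataclass
class Engine:
    # The only place a model is involved; the engine itself makes no LLM calls.
    run_agent: Callable[[str, str], str]  # (agent, step name) -> output
    gates: dict[str, Gate]
    max_attempts: int = 2
    snapshots: list[str] = field(default_factory=list)

    def run(self, steps: list[Step]) -> dict[str, str]:
        state: dict[str, str] = {}
        for step in steps:
            failed: list[str] = []
            for _ in range(self.max_attempts):
                output = self.run_agent(step.agent, step.name)
                failed = [g for g in step.gates if not self.gates[g](output)]
                if not failed:
                    break  # every gate passed
            else:
                raise RuntimeError(
                    f"{step.name}: gates {failed} still failing after {self.max_attempts} attempts"
                )
            state[step.name] = output
            # Stand-in for a real snapshot (for example, a commit in the run's worktree).
            self.snapshots.append(step.name)
        return state


# Usage: a fake agent whose first attempt fails the compile gate and whose second passes.
calls: list[str] = []


def fake_agent(agent: str, step: str) -> str:
    calls.append(agent)
    return "ok" if len(calls) > 1 else "broken"


engine = Engine(run_agent=fake_agent, gates={"compile": lambda out: out == "ok"})
print(engine.run([Step("impl", "qwen", ["compile"])]))  # {'impl': 'ok'}
```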

Agents

Agents are coding CLI tools: Claude Code, Codex, Gemini CLI, Qwen CLI, OpenCode, Cursor. Each runs in headless mode inside the worktree. Gump prepares their context (the prompt, project files, previous outputs), launches them, streams their activity, and collects the result. Gump doesn’t constrain what agents can do inside a step. It frames their work (preparing the input, validating the output, and killing the agent if necessary) without degrading their intelligence.
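The Python sketch below illustrates the two halves of that job, preparing context and launching headlessly, under stated assumptions. The prompt layout in `build_prompt` and the convention of passing the prompt as the final command-line argument are hypothetical. A tiny Python one-liner stands in for a real agent CLI, since each tool has its own headless-mode flags that aren’t modeled here.

```python
import subprocess
import sys


def build_prompt(spec: str, files: list[str], previous: dict[str, str]) -> str:
    """Assemble an agent's context: the spec, project files, and earlier step outputs."""
    parts = ["# Spec", spec, "# Project files", *files]
    for step, output in previous.items():
        parts += [f"# Output of step '{step}'", output]
    return "\n".join(parts)


def launch(cmd: list[str], prompt: str, worktree: str = ".") -> tuple[int, list[str]]:
    """Run an agent CLI inside the worktree and read its stdout line by line."""
    # Assumption: the prompt is passed as the last argument. Real CLIs each have
    # their own headless-mode flags, which this sketch does not model.
    proc = subprocess.Popen(cmd + [prompt], cwd=worktree, stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    lines = []
    for line in proc.stdout:  # read incrementally, as the agent produces output
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines


# Usage: a stand-in "agent" (a Python one-liner) instead of a real coding CLI.
fake_agent = [sys.executable, "-c", "import sys; print('read', len(sys.argv[1].splitlines()), 'lines'); print('done')"]
prompt = build_prompt("Add a --verbose flag.", ["main.py"], {"decompose": "1. parse the flag"})
code, output = launch(fake_agent, prompt)
print(code, output)  # 0 ['read 6 lines', 'done']
```

Collecting the lines as they arrive, rather than waiting for the process to exit, is what lets a runtime attach live checks to the stream.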

Gates and Guards

Gates are deterministic checks that run after a step completes: compile, test, lint, schema validation. No LLM in the loop. If a gate fails, Gump can retry with the same agent, escalate to a stronger model, or restart from an earlier step.

Guards are live breakers that watch the agent during execution. If an agent exceeds its turn budget, blows its cost limit, or writes files it shouldn’t, the guard kills it immediately.

Gates verify after. Guards protect during.
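A guard can be modeled as a small piece of state fed each event from the agent stream, and a gate failure as a choice of which agent to try next. In this Python sketch the event shapes (`{"type": "turn", "cost": ...}` and `{"type": "write", "path": ...}`), the `Guard` class, and the `next_agent` escalation ladder are assumptions for illustration; the budget is taken to be in dollars, and restarting from an earlier step is not modeled. The guard names mirror `max_turns`, `max_budget`, and `no_write` from the vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guard:
    """Live breaker: observes each agent event and returns a reason to kill, or None."""
    max_turns: int
    max_budget: float  # assumed to be in USD
    no_write: bool = False
    turns: int = 0
    cost: float = 0.0

    def observe(self, event: dict) -> Optional[str]:
        if event["type"] == "turn":
            self.turns += 1
            self.cost += event.get("cost", 0.0)
            if self.turns > self.max_turns:
                return f"max_turns exceeded ({self.turns} > {self.max_turns})"
            if self.cost > self.max_budget:
                return f"max_budget exceeded (${self.cost:.2f} > ${self.max_budget:.2f})"
        elif event["type"] == "write" and self.no_write:
            return f"no_write violated: {event['path']}"
        return None


def next_agent(attempt: int, ladder: list[str], retries_per_agent: int = 1) -> Optional[str]:
    """After a gate failure: retry the same agent, then escalate; None means give up."""
    index = attempt // (retries_per_agent + 1)
    return ladder[index] if index < len(ladder) else None


# Usage: three turns at $0.02 each trip a $0.05 budget on the third turn.
guard = Guard(max_turns=3, max_budget=0.05)
reason = None
for event in [{"type": "turn", "cost": 0.02}] * 5:
    reason = guard.observe(event)
    if reason:
        print("kill agent:", reason)  # a real runtime would terminate the agent process here
        break

print([next_agent(a, ["qwen", "claude-sonnet"]) for a in range(5)])
# ['qwen', 'qwen', 'claude-sonnet', 'claude-sonnet', None]
```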

Data

Every run produces structured execution data — an event ledger in NDJSON format. Cost per step, tokens, turns, retries, gate results, duration. Not reconstructed after the fact. Tracked live, from the agent stream. This data is the foundation for optimizing your workflows. Which agent fails on which type of task? When should you escalate? How much are you spending?
spec → step → agent → attempt → metrics
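As an illustration of rolling the ledger up along that chain, from attempts to steps, the Python sketch below parses NDJSON with the standard `json` module and sums cost, turns, and attempts per step. The event name `attempt_end` and the fields `step`, `agent`, `attempt`, `cost`, `turns`, and `gate` are hypothetical; check Gump’s actual ledger schema before relying on any of them.

```python
import json
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical ledger lines: field names are illustrative, not Gump's documented schema.
LEDGER = """\
{"event": "attempt_end", "step": "decompose", "agent": "claude-sonnet", "attempt": 1, "cost": 0.12, "turns": 4, "gate": "pass"}
{"event": "attempt_end", "step": "impl", "agent": "qwen", "attempt": 1, "cost": 0.01, "turns": 9, "gate": "fail"}
{"event": "attempt_end", "step": "impl", "agent": "qwen", "attempt": 2, "cost": 0.02, "turns": 7, "gate": "pass"}
"""


@dataclass
class StepTotals:
    cost: float = 0.0
    turns: int = 0
    attempts: int = 0


def summarize(ndjson: str) -> dict[str, StepTotals]:
    """Roll attempt-level events up into per-step totals."""
    totals: defaultdict[str, StepTotals] = defaultdict(StepTotals)
    for line in ndjson.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)  # NDJSON: one JSON object per line
        if event["event"] != "attempt_end":
            continue
        step = totals[event["step"]]
        step.cost += event["cost"]
        step.turns += event["turns"]
        step.attempts += 1
    return dict(totals)


for name, t in summarize(LEDGER).items():
    print(f"{name}: cost=${t.cost:.2f} turns={t.turns} retries={t.attempts - 1}")
# decompose: cost=$0.12 turns=4 retries=0
# impl: cost=$0.03 turns=16 retries=1
```

Because each line is a self-contained JSON object, the same loop works on a ledger that is still being written: append-only files can be read incrementally without reparsing the whole run.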

Vocabulary

These terms appear throughout the documentation:
  • Workflow — YAML file defining a sequence of steps.
  • Run — A complete execution of a workflow (gump run).
  • Step — A unit of work. Can be an agent step, a gate step, an orchestration step, or a workflow step.
  • Item — An element produced by a plan step. Feeds into a foreach.
  • Gate — Deterministic check after a step (compile, test, lint). No LLM.
  • Guard — Live breaker during agent execution (max_turns, max_budget, no_write).
  • State Bag — Key-value store where steps write their outputs. How steps communicate.
  • Ledger — NDJSON event log of a run. The audit trail.
  • Playbook — Collection of available workflows (built-in + project + user).
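To show how steps communicate through the State Bag, and how a plan step’s Items feed a foreach, here’s a minimal Python sketch. The `StateBag` class and its `step.key` naming scheme are hypothetical illustrations, not Gump’s actual storage format.

```python
from typing import Any


class StateBag:
    """Key-value store shared across a run: steps write outputs, later steps read them."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def write(self, step: str, key: str, value: Any) -> None:
        self._data[f"{step}.{key}"] = value  # hypothetical "step.key" naming scheme

    def read(self, ref: str) -> Any:
        if ref not in self._data:
            raise KeyError(f"no step has written {ref!r} yet")
        return self._data[ref]


bag = StateBag()
# A plan step writes its items...
bag.write("decompose", "plan", ["add parser", "add CLI flag"])
# ...and a foreach reads them back, running its nested steps once per item.
for i, item in enumerate(bag.read("decompose.plan")):
    bag.write(f"build[{i}].impl", "summary", f"implemented: {item}")
print(bag.read("build[1].impl.summary"))  # implemented: add CLI flag
```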