How Gump Works

Gump is a workflow runtime for coding agents — think GitHub Actions, but the jobs are AI agents instead of CI scripts. You describe what you want in a spec, pick a workflow, and Gump handles the rest: running agents, validating results, retrying on failure, and tracking every metric.
spec → workflow → GET → RUN → GATE → data
These are the building blocks. The rest of the docs explains each one.

Spec

A spec is a markdown file describing what you want built. A feature request, a bug report, a refactoring plan. Gump doesn’t parse it — it passes it to agents as context.
gump run cheap2sota --spec spec.md

Workflow

A workflow is a YAML file that defines a sequence of steps. Each step has a type that determines its contract — code writes code, split decomposes a spec into tasks, validate produces a verdict. Workflows are declarative: you describe what should happen, Gump figures out the execution.
name: cheap2sota
max_budget: 5.00

steps:
  - name: implement
    type: code
    get:
      prompt: "{spec}"
    run:
      agent: qwen
      guard:
        max_turns: 60
    gate: [compile, test]
    retry:
      - attempt: 3
        agent: claude-opus
      - exit: 5

  - name: quality
    gate: [compile, lint, test]
Start with Qwen (cheap), escalate to Opus if gates fail. Two steps, zero ceremony. Gump ships with 7 built-in workflows — you can also write your own.
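One way to read the retry block of the implement step, sketched in Python. The semantics here are an assumption, not something these docs confirm: `attempt: 3` is taken to mean "from the third attempt on, use this agent", and `exit: 5` to mean "do not start a fifth attempt". The function name `plan_attempt` is hypothetical and is not part of Gump.

```python
from typing import List, Optional

# Mirrors the retry block of the `implement` step in the workflow above.
RETRY_RULES = [
    {"attempt": 3, "agent": "claude-opus"},
    {"exit": 5},
]


def plan_attempt(attempt: int, default_agent: str, rules: List[dict]) -> Optional[str]:
    """Return the agent for a 1-based attempt number, or None to stop.

    Assumed semantics (illustrative only):
    - {"attempt": n, "agent": a}: from attempt n onward, use agent a.
    - {"exit": n}: do not start attempt n or any later attempt.
    """
    agent = default_agent
    for rule in rules:  # rules are evaluated in the order they are listed
        if "exit" in rule and attempt >= rule["exit"]:
            return None
        if "attempt" in rule and attempt >= rule["attempt"]:
            agent = rule.get("agent", agent)
    return agent


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, plan_attempt(n, "qwen", RETRY_RULES))
```

Under these assumptions the first two attempts run on the cheap agent, the next two on the stronger one, and the step gives up instead of starting a fifth.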

The execution model: GET → RUN → GATE

Every step follows three phases:
  • GET: Assemble everything the agent needs. Prompt, context files, worktree, session. Gump fetches and prepares; the agent receives.
  • RUN: Launch the agent and monitor it via guards (circuit breakers). The agent works. Gump observes and protects. The result enters the state.
  • GATE: Evaluate the result. Deterministic (compile, test, lint, bash script) or non-deterministic (a workflow validator backed by an agent). Produces a bool. Pass or fail.

If the gate passes, the step is done. If the gate fails, the retry rules determine what happens next: relaunch the step from GET with updated state and optional overrides.

The engine makes zero LLM calls. All intelligence comes from the agents it orchestrates, not from Gump itself.
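The three phases and the retry loop can be sketched as plain control flow. This is a minimal illustration, not Gump's engine: `run_step`, `StepResult`, and the state keys are hypothetical names, and the agent is any callable you pass in. Note that the loop itself contains no LLM call, which matches the claim that the intelligence lives in the agents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StepResult:
    passed: bool
    attempts: int
    state: Dict[str, str] = field(default_factory=dict)


def run_step(
    get: Callable[[Dict[str, str]], str],
    run: Callable[[str], str],
    gates: List[Callable[[str], bool]],
    max_attempts: int,
) -> StepResult:
    """Run GET -> RUN -> GATE, relaunching from GET while gates fail."""
    state: Dict[str, str] = {}
    for attempt in range(1, max_attempts + 1):
        state["attempt"] = str(attempt)
        context = get(state)        # GET: assemble what the agent needs
        output = run(context)       # RUN: the agent does the work
        state["output"] = output    # the result enters the state
        if all(gate(output) for gate in gates):  # GATE: pass or fail
            return StepResult(True, attempt, state)
        state["last_failure"] = output  # updated state for the next GET
    return StepResult(False, max_attempts, state)
```

To try it, pass a `get` that builds a prompt string from the state, a `run` that stands in for an agent, and a list of gate functions that each return a bool.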

Agents

Agents are coding CLI tools: Claude Code, Codex, Gemini CLI, Qwen CLI, OpenCode, Cursor. Each runs in headless mode inside an isolated worktree. Gump prepares their context, launches them, streams their activity, and collects the result.

Gump doesn’t constrain what agents can do inside a step. It frames their work (prepares input, validates output, kills if necessary) without degrading their intelligence.

Gates and Guards

Gates are checks that run after a step completes: compile, test, lint, schema validation, or a workflow validator (an agent that reviews the code). If a gate fails, Gump can retry with the same agent, escalate to a stronger model, or mark the step fatal.

Guards are live circuit breakers that watch the agent during execution. If an agent exceeds its turn budget, blows its cost limit, runs too long, or writes files it shouldn’t, the guard kills it immediately.

Gates verify after. Guards protect during.
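The guard idea can be illustrated with a small circuit breaker that consumes an agent's event stream and trips as soon as a limit is exceeded. This is a sketch under stated assumptions: the event shape (`type`, `cost`) and the names `Guard` and `watch` are hypothetical, and only two of the limits mentioned above (turns and cost) are modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Guard:
    max_turns: Optional[int] = None
    max_budget: Optional[float] = None  # same unit as the event "cost" field

    def violation(self, turns: int, cost: float) -> Optional[str]:
        """Return the name of the exceeded limit, or None if within limits."""
        if self.max_turns is not None and turns > self.max_turns:
            return "max_turns"
        if self.max_budget is not None and cost > self.max_budget:
            return "max_budget"
        return None


def watch(events: Iterable[dict], guard: Guard) -> Optional[str]:
    """Consume agent events; return the tripped guard name, or None."""
    turns, cost = 0, 0.0
    for event in events:
        if event.get("type") == "turn":  # hypothetical event shape
            turns += 1
        cost += event.get("cost", 0.0)
        tripped = guard.violation(turns, cost)
        if tripped:
            # A real runtime would kill the agent process at this point.
            return tripped
    return None
```

The check runs on every event rather than once at the end, which is the difference between a guard and a gate.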

Data

Every run produces structured execution data: an event ledger in NDJSON format. The ledger records cost per step, tokens, turns, retries, gate results, and duration, tracked live from the agent stream rather than reconstructed after the fact.

This data is the foundation for optimizing your workflows. Which agent fails on which type of task? When should you escalate? How much are you spending?
spec → step → agent → attempt → metrics
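Because NDJSON is one JSON object per line, a ledger can be analyzed with a few lines of standard-library Python. The sketch below sums cost per step. The field names `step` and `cost` are assumptions made for illustration; check a real ledger for the actual event schema before relying on them.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def cost_per_step(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the (assumed) "cost" field per (assumed) "step" field."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)  # each NDJSON line is a standalone JSON object
        if "step" in event and "cost" in event:
            totals[event["step"]] += event["cost"]
    return dict(totals)
```

Since an open file iterates line by line, the function can be called directly on a file handle, for example `cost_per_step(open(path))` with the path to a ledger file.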

Vocabulary

These terms appear throughout the documentation:
  • Workflow — YAML file defining a sequence of steps.
  • Run — A complete execution of a workflow (gump run).
  • Step — A unit of work with a typed contract (code, split, validate).
  • Type — Determines what the step outputs: code → diff, split → tasks, validate → bool.
  • Task — An element produced by a split step. Feeds into an each block.
  • Gate — Post-execution check. Deterministic (compile, test) or workflow validator (agent review).
  • Guard — Live circuit breaker during agent execution (max_turns, max_budget, max_time, no_write).
  • State — Key-value store where steps write their outputs. How steps communicate.
  • Retry — Ordered list of rules for what to do when a gate fails.
  • Ledger — NDJSON event log of a run. The audit trail.
  • Playbook — Collection of available workflows (built-in + project + user).
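The typed contract from the Type entry above (code → diff, split → tasks, validate → bool) can be pictured as a check on what a step hands back. This is only an illustration: representing a diff as a string and tasks as a list is an assumption, and `check_contract` is a hypothetical helper, not a Gump API.

```python
from typing import Any

# Output kind per step type, following the vocabulary list above.
# The Python types chosen here are assumptions for illustration.
EXPECTED_OUTPUT = {
    "code": str,       # a diff, represented as text
    "split": list,     # a list of tasks
    "validate": bool,  # a verdict
}


def check_contract(step_type: str, output: Any) -> bool:
    """Return True if the output matches the kind the step type promises."""
    expected = EXPECTED_OUTPUT.get(step_type)
    if expected is None:
        raise ValueError(f"unknown step type: {step_type}")
    return isinstance(output, expected)
```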