Reports & Metrics

Gump tracks structured metrics for every run. Use gump report to see them.

Basic report

gump report
Run: run_2026_03_24_1042
Workflow: tdd · Status: pass · Duration: 2m13s · Cost: $2.41

Step            Agent           Turns  Cost    Gate
───────────────────────────────────────────────────────
decompose       claude-opus       3   $0.31   schema ✓
build/1/tests   claude-haiku      8   $0.12   compile+test ✓
build/1/impl    claude-haiku     14   $0.18   compile+test ✓
build/2/tests   claude-haiku      6   $0.09   compile+test ✓
build/2/impl    claude-sonnet    12   $0.44   compile+test ✓ (escalated)
quality         —                 —   —       compile+lint+test ✓

Detailed step view

Drill into a specific step:
gump report --detail build/2/impl
This shows the step’s attempts (including failed ones), state-bag entries, files changed, a token breakdown, and guard activity.

Comparing runs

gump report --last 5
Shows a summary table of your last 5 runs — workflow, status, duration, cost, retries. Useful for spotting trends: is a particular workflow getting more expensive? Are retries increasing?
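Trend-spotting over those summaries is simple once the numbers are in hand. A minimal sketch, using hypothetical run dicts rather than gump's real output format:

```python
# Sketch: spotting a cost trend across recent runs.
# The run summaries below are hypothetical, not gump's real output format.
runs = [
    {"run": "run_2026_03_20_0910", "cost": 1.80},
    {"run": "run_2026_03_21_1133", "cost": 2.05},
    {"run": "run_2026_03_22_0847", "cost": 2.10},
    {"run": "run_2026_03_23_1502", "cost": 2.30},
    {"run": "run_2026_03_24_1042", "cost": 2.41},
]

costs = [r["cost"] for r in runs]
deltas = [b - a for a, b in zip(costs, costs[1:])]  # run-over-run change
if all(d > 0 for d in deltas):
    print(f"cost rising: ${costs[0]:.2f} -> ${costs[-1]:.2f}")
```

The same pattern applies to duration or retry counts.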

What Gump measures

High confidence (100%)

These are deterministic and exact:
  • Git diff (files changed, lines added/removed)
  • Gate exit codes (pass/fail)
  • Wall-clock duration per step and per run
  • Number of retries and escalations
  • Guard triggers

Medium confidence (~80%)

Tracked live from the agent stream, but subject to provider accuracy:
  • Tokens (input, output, cache read, cache write)
  • Cost (estimated from token counts and provider pricing)
  • Timestamps from the agent’s native events
  • Context window usage per step

Reconstructed (~50%)

Inferred from heuristics, not directly reported by all providers:
  • Turn count (cognitive cycles)
  • Turn classification (coding, execution, exploration, planning)
  • Shell command classification for Codex

Key metrics

Cost

Tracked per step, per item, per run. Aggregated from agent token reports. The estimate is based on published provider pricing — actual bills may differ slightly.
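The estimation itself is straightforward: tokens times rate. A minimal sketch, where the pricing table and token counts are illustrative values, not real provider rates:

```python
# Sketch: estimating cost from token counts and per-million-token pricing.
# These rates are illustrative, not actual provider pricing.
PRICING = {  # USD per 1M tokens: (input, output)
    "claude-haiku": (0.80, 4.00),
    "claude-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

cost = estimate_cost("claude-sonnet", 120_000, 6_000)
print(f"${cost:.2f}")  # $0.45
```

This is why cost sits at medium confidence: the token counts come from the provider stream, and the rates can lag published pricing.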

TTFD (Time To First Diff)

The time between agent launch and the first file modification. A long TTFD suggests the agent spent time exploring or planning before writing code. A useful signal for prompt optimization.
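Conceptually, TTFD is the gap between two timestamps in the event stream. A sketch with a hypothetical event schema (not gump's real one):

```python
# Sketch: computing TTFD from an event stream.
# The event shapes here are hypothetical, not gump's real event schema.
events = [
    {"ts": 0.0, "type": "agent_launch"},
    {"ts": 4.2, "type": "tool_call", "name": "read_file"},
    {"ts": 9.8, "type": "tool_call", "name": "read_file"},
    {"ts": 31.5, "type": "file_modified", "path": "src/lib.rs"},
]

launch = next(e["ts"] for e in events if e["type"] == "agent_launch")
first_diff = next(e["ts"] for e in events if e["type"] == "file_modified")
ttfd = first_diff - launch
print(f"TTFD: {ttfd:.1f}s")  # 31.5s of reading before the first edit
```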

Stall detection

Gump detects agents that spin in circles:
  • tool_error_count — repeated tool failures
  • correction_loops — edit → test → fail → edit cycles
  • fatal_loops — the agent hits the same error repeatedly
  • repeated_action_loops — identical tool calls in sequence
These counters appear in gump report --detail and help diagnose why a step needed many turns or failed.
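To make one of these counters concrete, here is a sketch of a repeated-action heuristic: counting runs of identical consecutive tool calls. The call-log format and threshold are assumptions for illustration, not gump's actual implementation:

```python
# Sketch: counting runs of identical consecutive (tool, args) calls,
# in the spirit of repeated_action_loops. Format/threshold are assumptions.
def repeated_action_loops(calls: list[tuple[str, str]], threshold: int = 3) -> int:
    """Count runs of >= threshold identical tool calls in sequence."""
    loops, run_len = 0, 1
    for prev, cur in zip(calls, calls[1:]):
        run_len = run_len + 1 if cur == prev else 1
        if run_len == threshold:  # count each run once, when it hits threshold
            loops += 1
    return loops

calls = [("grep", "foo"), ("grep", "foo"), ("grep", "foo"), ("edit", "a.py")]
print(repeated_action_loops(calls))  # 1
```

The other counters follow the same shape: a pass over the event stream looking for a repeating pattern.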

Context window usage

How much of the agent’s context window was consumed per step. High usage suggests the prompt or accumulated context is too large, which can degrade agent performance.
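The metric itself is just a ratio. A trivial sketch with illustrative numbers (the window size and token count are made up):

```python
# Sketch: context-window usage as a fraction of the model's limit.
# Both numbers below are illustrative values.
def context_usage(tokens_in_context: int, window_size: int) -> float:
    return tokens_in_context / window_size

usage = context_usage(tokens_in_context=164_000, window_size=200_000)
print(f"{usage:.0%}")  # 82%
```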

Analytics model

Gump organizes data at four levels: Event → Turn → Step → Run. Each event carries a timestamp and semantic labels. Turns are classified by their dominant activity (coding, exploration, execution). Steps aggregate turns. Runs aggregate steps. You can analyze at any level.
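The hierarchy can be sketched as plain dataclasses; the field names here are assumptions for illustration, not gump's real schema:

```python
# Sketch of the four-level hierarchy (Event -> Turn -> Step -> Run).
# Field names are assumptions, not gump's real schema.
from dataclasses import dataclass, field

@dataclass
class Event:
    ts: float
    label: str          # semantic label, e.g. "tool_call"

@dataclass
class Turn:
    events: list[Event]
    kind: str           # dominant activity: coding / exploration / execution

@dataclass
class Step:
    name: str
    turns: list[Turn] = field(default_factory=list)

@dataclass
class Run:
    steps: list[Step] = field(default_factory=list)

    def turn_count(self) -> int:  # aggregate upward through the levels
        return sum(len(s.turns) for s in self.steps)

run = Run(steps=[Step("build/1/tests", [Turn([], "coding"), Turn([], "execution")])])
print(run.turn_count())  # 2
```

Each level aggregates the one below it, which is what lets a single event stream answer questions at the turn, step, or run granularity.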