Choosing the Right Agent
Different agents have different strengths. Picking the right one for each step is the main lever for optimizing cost and quality.
The trade-off
More capable agents cost more and are slower, but succeed more often on the first try. Cheaper agents are fast and affordable, but may need retries or escalation. Gump’s retry and escalation system lets you start cheap and pay more only when needed.
Rules of thumb
Planning (output: plan)
Use a strong agent. The plan determines everything that follows: a bad decomposition wastes all downstream agent work. Claude Opus or Claude Sonnet are good choices. The cost of planning is small relative to the total run.
Implementation (output: diff)
Start cheap. Qwen, Claude Haiku, or OpenCode handle most implementations if the plan is good and the blast radius is clear. Use escalation to handle the cases where they can’t.
Review (output: review)
Use a strong agent with a fresh session. The reviewer should be at least as capable as the implementer; otherwise it can’t catch the implementer’s mistakes. Using a different provider (e.g., Gemini for review when Claude implemented) reduces shared blind spots.
Artifact (output: artifact)
Depends on the task. For arbitration between reviews, use a strong agent (Claude Opus). For simple text generation, a cheaper agent works.
Cost profiles
Rough ordering from cheapest to most expensive per token (subject to change with provider pricing):
- OpenCode (cheapest)
- Qwen
- Claude Haiku
- Gemini Flash
- Claude Sonnet / Gemini Pro
- Claude Opus
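The start-cheap-and-escalate trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: the per-attempt costs and success rates are made-up numbers, not real provider prices or measured results, and expected_cost is a hypothetical helper, not part of Gump. It assumes each agent in the ladder gets a single attempt and that the next agent runs only if the previous one failed.

```python
def expected_cost(ladder):
    """Expected spend for an escalation ladder.

    ladder: list of (cost_per_attempt, success_probability) tuples,
    ordered from the first agent tried to the last. Each agent gets
    one attempt; the next one runs only if the previous one failed.
    """
    total = 0.0
    p_reached = 1.0  # probability that the run gets as far as this agent
    for cost, p_success in ladder:
        total += p_reached * cost
        p_reached *= 1.0 - p_success
    return total


# Hypothetical numbers (assumptions for illustration, not real pricing).
cheap = (0.02, 0.70)   # (cost per attempt, first-try success rate)
strong = (0.30, 0.95)

start_cheap = expected_cost([cheap, strong])   # cheap first, escalate on failure
start_strong = expected_cost([strong])         # strong agent from the start

print(f"start cheap:  {start_cheap:.2f}")
print(f"start strong: {start_strong:.2f}")
```

With these assumed numbers, the cheap-first ladder costs less on average because the strong agent is only paid for on the 30% of steps where the cheap one fails. The balance shifts as the cheap agent's success rate drops, which is why measured success rates matter more than the price list alone.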
Mixing agents
Gump is agent-agnostic by design. You can mix agents freely within a workflow, with one constraint: session: reuse requires both steps to use the same agent, because session IDs are provider-specific.
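The session-reuse constraint can be checked mechanically before a run. The following is a minimal sketch under stated assumptions: the Step class, its field names, and the agent identifiers are hypothetical and do not reflect Gump's actual workflow format. It only illustrates the rule that a step may reuse a session from an earlier step only if both use the same agent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    # Hypothetical step model; not Gump's real schema.
    name: str
    agent: str
    reuses_session_of: Optional[str] = None  # earlier step name, or None for a fresh session


def session_reuse_errors(steps: List[Step]) -> List[str]:
    """Return one message per step that reuses a session it cannot use."""
    agent_by_name = {}
    errors = []
    for step in steps:
        source = step.reuses_session_of
        if source is not None:
            if source not in agent_by_name:
                errors.append(f"{step.name}: no earlier step named {source!r}")
            elif agent_by_name[source] != step.agent:
                errors.append(
                    f"{step.name}: agent {step.agent!r} cannot reuse the session "
                    f"of {source!r} (agent {agent_by_name[source]!r})"
                )
        agent_by_name[step.name] = step.agent
    return errors


workflow = [
    Step("plan", "claude-opus"),
    Step("implement", "qwen"),
    Step("fix", "qwen", reuses_session_of="implement"),           # same agent: allowed
    Step("review", "gemini-pro", reuses_session_of="implement"),  # different agent: flagged
]

errors = session_reuse_errors(workflow)
for message in errors:
    print(message)
```

In this example only the review step is flagged, which fits the earlier advice that a reviewer from a different provider should start with a fresh session anyway.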
Let the data decide
After a few runs, use gump report to see which agents succeed on which types of tasks. If Qwen fails 40% of the time on steps with a blast radius larger than 5 files, start with Claude Haiku for those cases. If Claude Sonnet always passes on the first try for your test-writing steps, you might not need escalation there.
This is the feedback loop: run → measure → adjust → run.
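The "measure" step of this loop can be sketched as a small aggregation. The example below does not parse real gump report output, whose format is not shown here; the record fields (agent, files_touched, passed_first_try) and the sample data are hypothetical and hand-written for illustration. The 5-file threshold and 40% failure rate mirror the example figures above.

```python
from collections import defaultdict


def failure_rates(records, radius_threshold=5):
    """First-try failure rate per (agent, large_blast_radius) group."""
    counts = defaultdict(lambda: [0, 0])  # key -> [failures, total]
    for rec in records:
        key = (rec["agent"], rec["files_touched"] > radius_threshold)
        counts[key][1] += 1
        if not rec["passed_first_try"]:
            counts[key][0] += 1
    return {key: failures / total for key, (failures, total) in counts.items()}


# Hypothetical run history (assumed shape, not real report data).
records = [
    {"agent": "qwen", "files_touched": 8, "passed_first_try": False},
    {"agent": "qwen", "files_touched": 7, "passed_first_try": True},
    {"agent": "qwen", "files_touched": 2, "passed_first_try": True},
    {"agent": "qwen", "files_touched": 3, "passed_first_try": True},
    {"agent": "claude-haiku", "files_touched": 9, "passed_first_try": True},
]

rates = failure_rates(records)

# Groups that fail often enough to justify a stronger starting agent.
needs_stronger_start = sorted(key for key, rate in rates.items() if rate >= 0.4)
print(needs_stronger_start)
```

A handful of records like this is far too small to act on; the point is only the shape of the analysis. With a real run history, the same grouping shows where a cheap agent holds up and where it should be replaced as the starting agent.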