Post
Parallel Subagent Orchestration: From Monolithic Agents to Coordinated Workflows
Agentic AI is moving from monolithic loops to orchestrated multi-agent systems, and the shift is visible in production tooling.
Enjoying the field notes? Subscribe for each new deep dive.Subscribe →
Parallel Subagent Orchestration: From Monolithic Agents to Coordinated Workflows
Agentic AI is moving from monolithic loops to orchestrated multi-agent systems, and the shift is visible in production tooling.
Claude Code's Ultra mode—released as part of Anthropic's dynamic workflows feature in late May/early June 2026—combines extended reasoning (Claude thinking longer and harder) with automatic orchestration of parallel subagent workflows. Swyx (Smol.ai founder) called it "scarily good at burning tokens" but noted the critical requirement: "you need to set up your repo to parallelize properly to make use of the fanout that I think subagents are best at" (per @swyx on X, June 15, 2026).
The architectural pattern: instead of a single agent iterating through a long task, a primary agent acts as orchestrator, decomposes the problem into sub-tasks, spawns subagents with isolated contexts, and synthesizes their outputs. The payoff is parallelism, isolation, and determinism. The cost is token usage—subagent-heavy workflows can consume approximately 7x the tokens of single-thread sessions (per Nimbalyst)—and the requirement for deliberate workflow design.
What Are Subagents?
A subagent is a scoped AI worker spawned by the main Claude Code session to handle a specific task. It runs in its own context window with its own system prompt, tool access, and permissions (per Nimbalyst). Three properties define effective subagents:
- Isolation: Separate context window and system prompt.
- Single purpose: One job, finish, return, discard context.
- Parallelism: Multiple subagents can run concurrently.
Configuration is via markdown files with YAML frontmatter in ~/.claude/agents/ (user scope) or .claude/agents/ (project scope). Minimal example (per Nimbalyst):
---
name: code-reviewer
description: Reviews a recent diff and reports issues by severity.
tools: Read, Grep, Glob, Bash
---
You are a code reviewer. When invoked, read the most recent diff in the repo,
check for obvious bugs, security issues, and style problems, and return a
prioritized list with severity and file/line references. Be specific.
The name field is how the main session refers to this subagent. The description is critical: it's what the main agent reads when deciding whether to delegate. "Vague descriptions ('helps with code') get the subagent invoked at random. Specific descriptions ('reviews a recent diff and returns issues by severity') get it invoked exactly when appropriate" (per Nimbalyst).
When to Use Subagents
Best Use Cases (per Nimbalyst)
1. Parallel exploration (the single best use case) Understand a codebase from multiple angles simultaneously. Example: spawn four subagents to check config loading, test runs, build pipeline, and recent file activity. Run concurrently and consume summaries. This is where fanout shines.
2. Heavy-context tasks When a subtask would pull huge amounts of file content into the main session, use a subagent to read 50 files, find 3 relevant ones, and return only a summary. Main session stays sharp.
3. Specialized roles Named subagents acting as roles: "code reviewer," "test writer," "security auditor." Each has a focused prompt and tool list. Main session acts as coordinator.
4. Disposable work Debugging, exploration, dead-end investigations. Work likely to be discarded should run in a subagent. Discarding subagent context costs nothing.
When NOT to Use Subagents (per Nimbalyst)
1. Tightly coupled work: If two tasks need to share state, don't split them. Subagents cannot see each other's contexts.
2. Trivial subtasks: Subagents have startup cost. One-shot tool calls or trivial computations should stay on main thread.
3. Token-budget-sensitive sessions: "Subagents multiply token use. If you are running on a tight budget or near a rate limit, single-thread sessions are cheaper. Anthropic notes that subagent-heavy workflows can consume around 7x the tokens of a single-thread session" (per Nimbalyst).
Ultra Code Mode: Extended Reasoning + Automatic Orchestration
Ultra Code is Claude Code's highest-effort setting. It combines extended thinking (Claude reasons through problems step-by-step before producing output) with automatic multi-agent coordination (per MindStudio).
Extended Thinking in Practice
Extended thinking allows Claude to "reason through problems step-by-step before producing output. This internal reasoning process considers multiple implementation approaches before committing, identifies edge cases that quick passes would miss, reasons about cross-codebase dependencies, and plans action sequences that avoid dead ends" (per MindStudio).
Example: Database ORM migration. Standard mode starts making changes immediately, hits conflicts partway through, requires intervention. Ultra mode analyzes full migration scope first, identifies tightly coupled vs. easily swappable modules, plans sequenced migration path minimizing risk, and flags potential breaking changes before touching files (per MindStudio).
Dynamic Workflow Orchestration
The second defining feature of Ultra Code is automatic multi-agent coordination (per MindStudio):
- Primary agent acts as orchestrator: Breaks large tasks into sub-tasks.
- Sub-agents run in parallel: Each works on separate pieces simultaneously.
- Automatic decision-making: Claude determines when to spin up sub-agents and what to assign them.
- Context-aware task splitting: Assignments based on codebase structure (e.g., one agent for API layer, another for database, third for tests).
- Result synthesis: Orchestrator reviews outputs, resolves conflicts, integrates pieces.
| Feature | Standard/High Mode | Ultra Code Mode |
|---|---|---|
| Multi-agent orchestration | Available but requires manual setup | Automatic and dynamic |
| Parallelism | Manual planning needed | Automatic detection and setup |
| Task coordination | User-managed | AI-managed with conflict resolution |
Benefits: speed at scale, maintained coherence (reduces "too many cooks" integration problems), automatic coordination of shared context and state across sub-agents (per MindStudio).
When to Use Ultra Code (per MindStudio)
- Large-scale refactors touching dozens of files across multiple modules
- Complex architectural changes (database migrations, framework switches, API redesigns)
- Codebase-wide analysis (security vulnerabilities, performance bottlenecks, technical debt scanning)
- Multi-dependency feature implementation (changes spanning backend, frontend, database schema, tests)
- Unfamiliar codebases requiring deep understanding before action
Mental model: "If a task is the kind of thing a senior engineer would spend an hour planning before touching code, Ultra Code mode is probably the right tool" (per MindStudio).
Two Design Philosophies: Dynamic vs. Deterministic
Ultra mode's launch sparked a bifurcation in workflow design philosophy.
Dynamic Workflows
Claude generates the orchestration on-the-fly. Prompt: /deep-research on [topic], and Claude writes the workflow custom-built for the task at hand. This is the approach highlighted in most launch coverage (a contrast drawn by Reza Rezvani in "Claude Code Workflows: Build Deterministic Agent Runs," Medium, June 1, 2026).
Deterministic Workflows
Developer authors the orchestration as code—JavaScript scripts in .claude/workflows/—written once, committed to git, readable, validatable, and repeatable. The framing here—"a workflow in Claude Code is a JavaScript script that orchestrates subagents," where the loops, branches, and fan-out are ordinary code and only the leaf agent() calls spend tokens—is made explicit by Reza Rezvani in "Claude Code Workflows: Build Deterministic Agent Runs" (Medium, June 1, 2026).
The reported emphasis of this camp is on hand-authored scripts for repeatable, auditable multi-agent orchestration that teams can trust in production.
This is an architectural choice: dynamic workflows prioritize adaptability; deterministic workflows prioritize repeatability and trust.
Cost Structure & Optimization
Per CloudZero's analysis, agent costs scale roughly with the number of concurrent sessions:
- Solo dev, normal workflow: ~$13/day
- Dev with 3 parallel agents: ~$30-$40/day
- Dev running 5-10 agents: ~$50-$130/day
Rate limits apply: background sessions draw down subscription usage the same as interactive sessions, so running ten agents in parallel uses quota ten times faster (per CloudZero).
Five cost optimization strategies (per CloudZero):
- Tier agents by model: Opus for orchestrator (complex reasoning), Sonnet for workers (most tasks), Haiku for formatting/simple tasks. Impact: ~40% cost reduction vs. all-Opus.
- Stop agents when tasks complete: Idle sessions with live processes aren't free.
/stopin session or Ctrl+X in agent view. - Codify model limits in subagent YAML: Prevents developers from defaulting to most expensive model. Enforced by configuration, not willpower.
- Clear context between dispatches: Long-running agents accumulate stale history that gets resent on every turn. Use
/clearbetween tasks to drop accumulated context (CloudZero recommends this as a per-message token-cost lever; we were unable to verify a specific percentage figure for the savings). - Track per-agent spend: Organizational visibility prevents surprises.

Academic Foundations: Decentralized Multi-Agent Systems
Claude Code's subagent pattern connects to research on decentralized multi-agent systems (MAS). A June 2026 arXiv paper, "Decentralized Multi-Agent Systems with Shared Context" (arXiv:2606.10662v1), proposes DeLM (Decentralized Language Models)—agents coordinate asynchronously through a shared, verified context rather than relying on a central controller (per Mao & Mirhoseini, Stanford).

Reported results (per arXiv:2606.10662v1): - On SWE-bench Verified, DeLM is best across Avg.@1, Pass@2, and Pass@4, with gains of up to 10.5 percentage points over the strongest baseline and a cost-per-task reduction of approximately 50%. - On LongBench-v2 Multi-Doc QA, DeLM achieves the highest average accuracy across four frontier model families, improving over the strongest baseline by up to 5.7 percentage points.
These results are reported across multiple frontier models (the SWE-bench experiments use Gemini 3 Flash and Claude Opus 4.6).
Three mechanisms explain DeLM's success (per arXiv:2606.10662v1):
- Parallel agents complement each other by sharing failures: Failed hypotheses become reusable state rather than private dead ends. Example: Thread t0's negative result prevents t1 from repeating the same detour, improving search efficiency.
- Admitted constraints remain binding shared state: Constraints discovered by agents are preserved as shared state rather than being softened or reopened by a central controller.
- Compact patch summaries carry discoveries: Sharing compressed versions of useful discoveries reduces cost while preserving reusable results.
The shared verified context acts as a common communication substrate—once an update is admitted, it becomes visible to all agents as reusable problem state. This differs from scatter-gather patterns where a central orchestrator collects, merges, and rebroadcasts results, creating a communication bottleneck as subtask count grows.
Practical Subagent Patterns
Code review with parallel subagents: A common fanout pattern is to spawn several subagents, each focused on a specific aspect of code quality (security, performance, style, test coverage, error handling, documentation, type safety, API design, database queries), run them concurrently, and aggregate findings.
Parallel exploration workflow (per Nimbalyst):
Spawn three subagents in parallel: 1. Find every place where Database.connect is called. 2. Find every place where the connection pool is configured. 3. Find every place where errors from those calls are handled. Return when all three finish.Trade-offs: token cost is real, wall-clock saving on exploration tasks is large. Tight token budget → explore sequentially. Tight time budget → explore in parallel (per Nimbalyst).
Where This Goes
The shift from monolithic agent loops to orchestrated subagent systems represents a maturation of agentic AI:
- Parallelism becomes architectural, not optional.
- Orchestration logic becomes code, not just prompts.
- Determinism and repeatability become requirements for production.
- Cost management requires workflow design, not just model choice.
Swyx's observation about needing to "set up your repo to parallelize properly" is the signal: effective agentic systems require deliberate design of task decomposition, context isolation, and result synthesis. The tools now support it. The question is whether teams will architect for it.
Product Rescue lens: AI products fail when they treat the agent as a black box that gets smarter with better prompts. Shippable products treat the agent as an orchestrator in a multi-component system where task boundaries, context flow, and failure modes are explicitly designed.
Sources & further reading
- MindStudio: "What Is the Ultra Code Mode in Claude Code?": https://www.mindstudio.ai/blog/what-is-ultra-code-mode-claude-code
- CloudZero: "Claude Code Agents In 2026: Views, Subagents, Teams & Costs": https://www.cloudzero.com/blog/claude-code-agents/
- Nimbalyst: "Claude Code Subagents: A Practical 2026 Guide": https://nimbalyst.com/blog/claude-code-subagents-guide/
- Reza Rezvani (Medium): "Claude Code Workflows: Build Deterministic Agent Runs": https://alirezarezvani.medium.com/claude-code-workflows-build-deterministic-agent-runs-eaf2c6ac52d5
- Swyx on X (June 15, 2026): https://x.com/swyx/status/2066415484149633329
- "Decentralized Multi-Agent Systems with Shared Context" (arXiv:2606.10662v1): https://arxiv.org/html/2606.10662v1
Get the next deep dive in your inbox
Field notes on shipping agentic AI — no spam, unsubscribe anytime.