# Multi-Agent Orchestration Patterns

> Vendor-neutral reference covering when multi-agent systems pay off and nine named patterns — from single-agent baseline through hierarchical and blackboard architectures — with tradeoffs, cross-cutting concerns, and a decision guide.

Category: Guide · Updated: 2026-06-15 · Tags: agents, multi-agent, orchestration, architecture, patterns, design
Canonical: https://changegamer.ai/resources/multi-agent-orchestration-patterns

Multi-agent systems add real cost, latency, and operational complexity. The decision to use one should be driven by a concrete failure or scaling need in a single-agent design, not by the appeal of the architecture. This reference covers when to go multi-agent, the nine canonical patterns (with tradeoffs), and the cross-cutting concerns every builder encounters.

## When to use multi-agent vs single-agent

Anthropic's "Building Effective Agents" (Schluntz & Zhang) frames agentic systems as either *workflows* — where LLMs and tools are orchestrated through predefined code paths — or *agents* — where the LLM dynamically directs its own process and tool use. Multi-agent adds a third dimension: multiple LLM-driven actors cooperating.

Add multi-agent only when one of these conditions holds:

- **Separable subtasks**: the problem cleanly decomposes into independent work units that do not require tight shared state. If subtasks are tightly coupled, coordination overhead outweighs the benefit.
- **Parallelism pays**: subtasks can run concurrently and the wall-clock gain justifies the added token fan-out cost. Anthropic's internal multi-agent research system outperformed a single Claude Opus 4 agent by 90.2% on breadth-first research queries where parallel exploration across many independent directions was the key differentiator.
- **Specialization**: different subtasks demand different system prompts, tool sets, or even models (e.g., a cheap fast model for routing, a large model for synthesis).
- **Context-window limits**: a task genuinely exceeds one context window and cannot be handled by summarization or retrieval alone.

The default should be: start single-agent with tools. Add structure only when a real failure or scaling need demands it.

## The nine patterns

### 1. Single-agent-with-tools (baseline)

One LLM with access to a toolset in a loop. The baseline against which all multi-agent patterns should be benchmarked. When to use: all tasks where the scope fits one context window and does not require parallel execution. Tradeoff: limited by context window; no parallelism.

### 2. Prompt chaining / sequential pipeline

Output of step N becomes input to step N+1; each step uses a focused prompt. Named in Anthropic's "Building Effective Agents" as *prompt chaining*. When to use: tasks that decompose naturally into ordered stages (draft → critique → refine; extract → classify → summarize). Tradeoff: errors propagate forward; latency is additive; no parallelism.

### 3. Routing (classifier dispatch)

A classifier step reads the input and routes it to the appropriate specialist agent or prompt. Named in Anthropic's "Building Effective Agents" as *routing*. When to use: handling diverse input types that each require different handling (customer service triage, language detection, intent classification). Tradeoff: classification errors send tasks to the wrong handler; requires maintaining multiple specialist configurations.

### 4. Parallelization (sectioning + voting)

Multiple agents work on the same problem simultaneously. Two sub-variants from Anthropic's "Building Effective Agents": *sectioning* (divide a task into parallel independent chunks) and *voting* (multiple agents independently solve the same task; majority or best answer wins). When to use: long documents that can be chunked, independent research threads, or high-stakes decisions where redundancy reduces error rate. Tradeoff: token fan-out — cost multiplies with the number of parallel agents; requires aggregation logic.

### 5. Orchestrator-workers

A lead orchestrator agent dynamically spawns, delegates to, and aggregates results from worker subagents. Named in Anthropic's "Building Effective Agents" as *orchestrator-workers* and demonstrated in their multi-agent research system (where the orchestrator plans the research strategy and spawns parallel search subagents). When to use: tasks with dynamic scope — the number and type of subtasks is not known in advance. Tradeoff: orchestrator becomes a single point of failure; inter-agent communication cost; harder to debug.

### 6. Evaluator-optimizer (generator + critic loop)

One agent generates a candidate output; a second evaluates it against a rubric and returns feedback; the generator revises. Loop repeats until the evaluator is satisfied or a termination condition is met. Named in Anthropic's "Building Effective Agents" as *evaluator-optimizer*. When to use: tasks with a verifiable quality criterion (code that must pass tests, text that must meet a rubric). Tradeoff: requires a reliable evaluator — a weak critic produces useless loops; loop count must be bounded (termination guard mandatory).

### 7. Hierarchical / manager-of-managers

A top-level orchestrator delegates to sub-orchestrators, each of which manages their own worker pool. Extends orchestrator-workers to multiple tiers. When to use: very large decomposable tasks where a single orchestrator would exceed context or coordination limits. Tradeoff: coordination overhead grows with depth; error propagation is harder to trace; observability becomes critical (see /resources/agent-observability).

### 8. Group chat / debate

Multiple agents participate in a shared conversation, each contributing from its own perspective or role. A moderator (human or LLM) synthesizes or selects the final output. Sometimes called *multi-agent debate*. When to use: tasks benefiting from adversarial review, brainstorming, or simulated stakeholder perspectives. Tradeoff: verbose; expensive; convergence is not guaranteed without a strong moderator or termination criterion.

### 9. Blackboard / shared state

Agents read from and write to a shared structured artifact (the "blackboard") — a document, database, or structured object — rather than passing messages directly. Each agent acts when its triggering conditions are met. When to use: long-running tasks where agents work asynchronously and on different parts of the same artifact (co-authoring, iterative document refinement). Tradeoff: write conflicts require locking or versioning; shared state is a single point of corruption if an agent writes bad data.

## Cross-cutting concerns

**Context and state sharing** — choose between shared memory (blackboard/database) and message passing. Shared memory enables tight coordination but requires conflict handling. Message passing is simpler to reason about but increases latency per hop.

**Handoffs vs delegation** — a handoff transfers full control (the calling agent stops); delegation keeps the orchestrator in control and aggregates results. Handoffs lose context; delegation multiplies context cost.

**Error propagation and partial failure** — in a multi-agent pipeline, a subagent failure can silently corrupt downstream results. Design explicit error contracts: subagents must return structured success/failure signals, not just text. The orchestrator must handle partial failure (retry, degrade, or surface the gap).

**Cost explosion (token fan-out)** — parallelization and orchestrator-workers multiply token spend. Model the cost before deploying: N parallel subagents at M tokens each costs N×M tokens. A 10-subagent orchestrator-workers pattern can be 10× more expensive than the single-agent baseline for the same task.

**Termination and loop guards** — evaluator-optimizer and group-chat patterns can loop indefinitely without a hard stop condition. Always set a maximum iteration count; prefer an evaluator that returns a structured `{pass: bool, feedback: string}` output so the loop can terminate deterministically.

**Observability** — multi-agent runs require a shared `trace_id` propagated across all subagent calls. Without it, cross-agent debugging is impossible. See /resources/agent-observability for the OpenTelemetry GenAI semantic conventions and tooling.

**Inter-agent trust and security** — subagents are not implicitly trusted. An agent receiving instructions from an orchestrator should apply the same prompt-injection and tool-abuse mitigations as it would for user input. For A2A delegation protocols and token audience binding across agents, see /resources/mcp-vs-a2a. For the full security checklist, see /resources/agentic-security-checklist.

## Decision guide

1. **Start single-agent.** Build a single LLM with the minimum toolset that could theoretically solve the task. Measure cost, latency, and success rate.
2. **Identify the concrete failure.** Is it a context-window limit? A parallelism gap? A quality problem that needs a critic? Identify one specific failure before adding structure.
3. **Apply the minimum pattern.** Prompt chaining before orchestrator-workers. Evaluator-optimizer before group chat. Each added tier multiplies complexity and cost.
4. **Add observability first.** Before scaling to multi-agent, instrument your single-agent run with traces. You will need those signals to debug the multi-agent version.

For which frameworks implement which patterns, see /resources/agent-frameworks-compared.

## Verified sources

- Anthropic — Building Effective Agents (Schluntz & Zhang): https://www.anthropic.com/research/building-effective-agents
- Anthropic — How we built our multi-agent research system: https://www.anthropic.com/engineering/multi-agent-research-system