Chapter 49 of 75

Orchestration — Multi-Agent Systems

Single agents hit hard limits: the context window fills, one configuration cannot be tuned for every domain, and one reasoning chain yields one perspective. Multi-agent orchestration addresses these limits, but it introduces failure modes that do not exist in single-agent systems.

Part V — Agentic Systems

A single agent operating within a single context window has hard limits: the context window fills, the agent cannot simultaneously specialize in multiple domains, and a single reasoning chain produces a single perspective. Multi-agent systems address these limits by decomposing complex tasks across multiple specialized agents, each operating within its competence domain, coordinated by an orchestrator that manages their collaboration. Orchestration is the discipline that makes this coordination reliable.

49.1 Why Multi-Agent Systems

Context window limits. A complex research and synthesis task may require more information than fits in any single context window. Multi-agent systems decompose the task so each agent operates within its context limit: one agent retrieves and analyzes source A, another analyzes source B, an orchestrator synthesizes the results.

Specialization. Different tasks benefit from different system prompt configurations, tool access patterns, and reasoning approaches. A specialized research agent, configured for web search and source evaluation, performs better at research tasks than a generalist agent configured for everything. Multi-agent systems allow each component to be optimized for its specific function.

Parallelism. Independent sub-tasks can be executed by different agents simultaneously rather than sequentially. A task that requires ten independent research queries can complete in roughly the time of the slowest single query, plus coordination overhead, if ten agents work in parallel.
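The parallel fan-out described above can be sketched with Python's asyncio. This is a minimal illustration, not a prescribed implementation: `run_query` is a hypothetical stand-in for a worker agent, and the fixed delay only simulates the latency of a model or search call.

```python
import asyncio
import time


async def run_query(query: str, delay: float = 0.1) -> str:
    # Hypothetical worker agent: a real one would call a model or a search tool.
    await asyncio.sleep(delay)
    return f"result for {query}"


async def run_parallel(queries: list[str]) -> list[str]:
    # gather schedules all coroutines concurrently and returns results
    # in the same order as the inputs.
    return list(await asyncio.gather(*(run_query(q) for q in queries)))


def timed_run(queries: list[str]) -> tuple[list[str], float]:
    # Returns the results together with the wall-clock time taken.
    start = time.perf_counter()
    results = asyncio.run(run_parallel(queries))
    return results, time.perf_counter() - start
```

Calling `timed_run` with ten queries should take close to the duration of one simulated query rather than ten, because the waits overlap.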

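Independent verification. Critical outputs can be verified by a second agent that independently reasons about the same problem. Two agents reaching the same conclusion provide more confidence than one, provided the second agent does not simply inherit the first agent's context and, with it, the first agent's errors.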
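One way to picture independent verification is a small agreement check. The sketch below assumes two hypothetical callables, `primary` and `verifier`, that each answer the same question without seeing the other's output; the exact-match comparison after normalization is a simplifying assumption, and real systems may need a looser notion of agreement.

```python
from typing import Callable, Optional


def verified_answer(
    question: str,
    primary: Callable[[str], str],
    verifier: Callable[[str], str],
    normalize: Callable[[str], str] = str.strip,
) -> dict:
    # Each agent receives only the question, never the other agent's answer.
    first = primary(question)
    second = verifier(question)
    agreed = normalize(first) == normalize(second)
    answer: Optional[str] = first if agreed else None
    # On disagreement, return both candidates so a caller can escalate.
    return {"answer": answer, "agreed": agreed, "candidates": [first, second]}
```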

Separation of concerns. Decomposing a complex system into specialized agents with defined interfaces is a software engineering principle — the same principle that motivates microservices. Multi-agent systems apply this principle to AI: each agent has a defined role, defined inputs, and defined outputs.

49.2 Orchestration Patterns

Centralized orchestration. A single orchestrator agent controls all other agents. The orchestrator receives the goal, decomposes it into sub-tasks, assigns sub-tasks to worker agents, receives results, and synthesizes the final output. The orchestrator has a global view of the task state and makes all routing decisions.

Centralized orchestration is the simplest pattern to understand and debug — all coordination logic is in one place. It is appropriate when task decomposition is the primary challenge and when the orchestrator's context window is sufficient to hold the overall task state.
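The centralized pattern can be sketched in a few lines. Everything here is illustrative: `decompose`, `workers`, and `synthesize` are hypothetical callables supplied by the caller, and the sketch runs sub-tasks sequentially to keep the control flow visible.

```python
from typing import Callable

Worker = Callable[[str], str]


def orchestrate(
    goal: str,
    decompose: Callable[[str], list[tuple[str, str]]],
    workers: dict[str, Worker],
    synthesize: Callable[[str, list[tuple[str, str]]], str],
) -> str:
    # The orchestrator owns all routing decisions and all task state.
    results: list[tuple[str, str]] = []
    for role, subtask in decompose(goal):
        if role not in workers:
            # Fail loudly instead of silently dropping a sub-task.
            raise KeyError(f"no worker registered for role {role!r}")
        results.append((role, workers[role](subtask)))
    return synthesize(goal, results)
```

Because every assignment and every result passes through `orchestrate`, a single log statement inside the loop is enough to trace the whole task, which is the debuggability advantage noted above.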

Decentralized orchestration. Agents communicate directly with each other without a central coordinator. Each agent knows which other agents it can invoke and under what conditions. The system converges on a final output through the chain of agent interactions.

Decentralized orchestration is more resilient because it has no single point of failure, but it is harder to debug because the task state is distributed across multiple agents. It is appropriate for tasks where different parts require genuinely different routing logic that cannot be centralized without producing a god-object orchestrator.

Hierarchical orchestration. Multiple levels of orchestration: a top-level orchestrator decomposes the goal into major phases; phase-level orchestrators decompose phases into tasks; task-level agents execute individual tasks. The hierarchy allows each orchestrator to operate within a manageable complexity scope.

Hierarchical orchestration scales to the most complex tasks but introduces communication overhead between levels. It is appropriate for enterprise AI systems that handle business processes with multiple distinct phases, each of which contains multiple distinct steps.

49.3 Agent Communication and State Management

Agents in a multi-agent system communicate through messages — structured data passed from one agent to another. Communication design affects system reliability, debuggability, and performance.

Message format: Define explicit message schemas for agent-to-agent communication. Unstructured natural language messages between agents are hard to validate and produce unpredictable behavior when one agent misinterprets another's output. Structured messages (JSON schemas with defined fields) enable validation and produce more reliable agent-to-agent communication.
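As an illustration of an explicit message schema, the sketch below validates an incoming JSON message before any agent acts on it. The field names (`sender`, `recipient`, `task_id`, `payload`) are assumptions chosen for the example, not a standard; a production system might use a JSON Schema validator instead of hand-written checks.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMessage:
    sender: str
    recipient: str
    task_id: str
    payload: dict


# Hypothetical schema: field name -> required Python type.
_FIELDS = {"sender": str, "recipient": str, "task_id": str, "payload": dict}


def parse_message(raw: str) -> AgentMessage:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("message must be a JSON object")
    for name, expected in _FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} must be {expected.__name__}")
    extra = set(data) - set(_FIELDS)
    if extra:
        # Reject unknown fields so schema drift is caught early.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return AgentMessage(**data)
```

A malformed message is rejected at the boundary with a specific error, rather than being misinterpreted by the receiving agent.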

Shared state: For tasks where multiple agents need to read and write common state (a research database, a document being collaboratively authored, a task completion tracker), shared state must be managed carefully. Race conditions — two agents attempting to modify the same state simultaneously — produce inconsistency. Implement locking or event sourcing for shared state in multi-agent systems.
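A minimal sketch of the locking option, assuming agents run as threads in one process. `TaskTracker` is a hypothetical name; agents spread across processes or machines would need a database transaction or a distributed lock instead.

```python
import threading


class TaskTracker:
    """Completion counter that is safe to update from several agent threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._completed = 0

    def mark_done(self) -> None:
        # The lock makes the read-modify-write sequence atomic.
        with self._lock:
            self._completed += 1

    @property
    def completed(self) -> int:
        with self._lock:
            return self._completed


def run_agents(tracker: TaskTracker, n_agents: int, tasks_each: int) -> int:
    def agent() -> None:
        for _ in range(tasks_each):
            tracker.mark_done()

    threads = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker.completed
```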

Task queues: Distribute work to worker agents via task queues rather than direct invocations. Task queues provide backpressure (preventing overload of any single agent), retry semantics (failed tasks are retried automatically), and visibility into task completion status.
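The queue properties listed above can be sketched with `asyncio.Queue`. This is an in-process illustration under stated assumptions: `handler` is a hypothetical coroutine standing in for a worker agent, retries happen inline with no delay between attempts, and a real deployment would more likely use a durable message broker.

```python
import asyncio
from typing import Awaitable, Callable


async def worker(queue: asyncio.Queue, handler, results: dict, max_attempts: int) -> None:
    while True:
        task = await queue.get()
        try:
            for attempt in range(1, max_attempts + 1):
                try:
                    results[task] = ("ok", await handler(task))
                    break
                except Exception as exc:
                    # Record the failure only after the final attempt.
                    if attempt == max_attempts:
                        results[task] = ("failed", repr(exc))
        finally:
            queue.task_done()


async def run_queue(
    tasks: list[str],
    handler: Callable[[str], Awaitable[str]],
    n_workers: int = 2,
    maxsize: int = 4,
    max_attempts: int = 3,
) -> dict:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    results: dict = {}
    workers = [
        asyncio.create_task(worker(queue, handler, results, max_attempts))
        for _ in range(n_workers)
    ]
    for task in tasks:
        # put() waits while the queue is full: this is the backpressure.
        await queue.put(task)
    await queue.join()  # Wait until every task has been processed.
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results  # Doubles as a view of per-task completion status.
```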

Result aggregation: The orchestrator must aggregate results from multiple worker agents into a coherent output. Define the aggregation logic explicitly — how are conflicting results from different agents resolved? How are partial results from failed workers handled? What constitutes a sufficient result set to proceed?
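The three questions above can be answered explicitly in code. The sketch below picks one possible policy, a quorum plus majority vote, purely as an assumption for illustration; `None` marks a worker that failed, and ties are reported as conflicts rather than resolved arbitrarily.

```python
from collections import Counter
from typing import Optional


def aggregate(results: dict[str, Optional[str]], min_results: int = 2) -> dict:
    # Partial results: drop failed workers (None) but keep the rest.
    available = [r for r in results.values() if r is not None]
    # Sufficiency: refuse to proceed below the quorum.
    if len(available) < min_results:
        return {"status": "insufficient", "answer": None}
    counts = Counter(available).most_common()
    top_answer, top_count = counts[0]
    # Conflict: a tie for first place has no clear winner.
    if len(counts) > 1 and counts[1][1] == top_count:
        return {"status": "conflict", "answer": None}
    return {"status": "ok", "answer": top_answer}
```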

49.4 Multi-Agent Failure Modes

Multi-agent systems fail in ways that single agents do not.

Cascading failures. A failure in one agent propagates to other agents that depend on its output. If a research agent fails to retrieve required information, all downstream agents that needed that information may also fail. Cascading failures can silently invalidate large portions of a multi-agent task.
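One way to keep a cascade from being silent is to record, for every step, whether it ran, failed, or was skipped because of an upstream failure. The sketch below assumes a hypothetical `steps` mapping of step name to (dependencies, function) and an `order` list that is already topologically sorted.

```python
from typing import Callable


def run_pipeline(
    steps: dict[str, tuple[list[str], Callable]],
    order: list[str],
) -> tuple[dict[str, str], dict[str, object]]:
    status: dict[str, str] = {}
    outputs: dict[str, object] = {}
    for name in order:
        deps, fn = steps[name]
        # A step is blocked if any dependency did not finish with "ok".
        blocked = [d for d in deps if status.get(d) != "ok"]
        if blocked:
            status[name] = f"skipped (upstream: {', '.join(blocked)})"
            continue
        try:
            outputs[name] = fn(*[outputs[d] for d in deps])
            status[name] = "ok"
        except Exception as exc:
            status[name] = f"failed: {exc}"
    return status, outputs
```

The returned `status` map shows exactly which portion of the task a single failure invalidated, while independent branches still complete.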

Coordination deadlocks. Agents waiting for each other — Agent A waiting for Agent B's result, Agent B waiting for Agent A's result — produce deadlocks that halt the system without error. Define timeout thresholds for every agent dependency and implement deadlock detection.
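Both mitigations can be sketched briefly. `await_dependency` wraps a wait in a timeout using `asyncio.wait_for`, and `find_cycle` runs a depth-first search over a hypothetical "waits on" graph that the orchestrator is assumed to maintain. The names and the graph representation are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Optional


class DependencyTimeout(Exception):
    pass


async def await_dependency(awaitable: Awaitable, name: str, timeout_s: float):
    # Turn an indefinite wait into an explicit, reportable error.
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        raise DependencyTimeout(f"{name} did not respond in {timeout_s}s") from None


def find_cycle(waits_on: dict[str, list[str]]) -> Optional[list[str]]:
    # Depth-first search; a node seen again while still on the current
    # path means the agents on that path are waiting on each other.
    in_progress: set[str] = set()
    done: set[str] = set()
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        in_progress.add(node)
        path.append(node)
        for nxt in waits_on.get(node, []):
            if nxt in in_progress:
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in waits_on:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```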

Context loss between agents. When an orchestrator passes a sub-task to a worker agent, it must pass sufficient context for the worker to complete the task correctly. Context loss — omitting key information that the worker needs — produces worker outputs that are technically correct for the information provided but wrong for the actual task.

Inconsistent state. When multiple agents modify shared state without coordination, the resulting state may be internally inconsistent. This is particularly problematic for agents that write to documents or databases: if two agents make concurrent modifications based on their own reasoning, the result may satisfy neither agent's intent.

Communication loop amplification. In decentralized systems, agents can initiate loops of communication — Agent A asks Agent B, which asks Agent C, which asks Agent A — that consume resources without making progress. Implement message routing guards that detect and break circular communication.
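A routing guard of this kind can be sketched by having each message carry a trace of the agents it has already visited. The `route` function, the `RoutingLoopError` name, and the hop limit are assumptions for illustration; systems in which an agent may legitimately be revisited would need a looser rule, such as a hop limit alone.

```python
class RoutingLoopError(Exception):
    pass


def route(trace: list[str], recipient: str, max_hops: int = 8) -> list[str]:
    # Reject a hop that would send the message back to an agent
    # that has already handled it.
    if recipient in trace:
        loop = " -> ".join(trace + [recipient])
        raise RoutingLoopError(f"circular route: {loop}")
    # A hop limit bounds resource use even when no exact cycle is found.
    if len(trace) >= max_hops:
        raise RoutingLoopError(f"hop limit of {max_hops} exceeded")
    # Return a new trace so the caller's list is not mutated.
    return trace + [recipient]
```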

Designing multi-agent systems requires anticipating these failure modes and building mitigation into the orchestration architecture before the system is deployed. Multi-agent systems that encounter these failures in production without pre-designed mitigations are much harder to stabilize than single-agent systems.