Multi-Agent Patterns Overview
Multi-agent systems let you decompose complex tasks across specialized agents, but every pattern adds coordination overhead. Choose the simplest topology that handles your task -- most problems do not need more than two or three agents.
| Pattern | Agents | Communication | Best For |
|---|---|---|---|
| Single agent | 1 | N/A | Simple tasks with few tools |
| Sequential pipeline | 2-5 | Output -> Input | Multi-step processing |
| Parallel fan-out | 2-10 | Same input, merge outputs | Independent subtasks |
| Router/dispatcher | 1 + N specialists | Router selects specialist | Domain-specific handling |
| Supervisor-worker | 1 + N workers | Supervisor delegates and reviews | Complex task decomposition |
| Hierarchical | N layers | Multi-level delegation | Enterprise workflows |
| Debate/consensus | 2-5 peers | Argue until agreement | High-stakes decisions |
| Swarm | N peers | Dynamic handoffs | Flexible, exploratory tasks |
Sequential Pipeline
The sequential pipeline is the most intuitive multi-agent pattern -- each agent completes its stage before handing off to the next. Use it when your task has clear, ordered phases that require different expertise.
  User Query
      │
      ▼
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Planner  │───>│Researcher│───>│  Writer  │───>│ Reviewer │
│  Agent   │    │  Agent   │    │  Agent   │    │  Agent   │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                                                     │
                                                     ▼
                                                Final Output
When to Use
- Tasks have clear sequential stages
- Each stage needs different skills/tools
- Output of one stage is input to the next
Implementation
async def sequential_pipeline(query: str) -> str:
    # planner, researcher, writer and reviewer are agent objects that
    # expose an async run(prompt: str) -> str method
    # Stage 1: Plan
    plan = await planner.run(f"Create a research plan for: {query}")
    # Stage 2: Research
    research = await researcher.run(
        f"Execute this research plan:\n{plan}\n\nGather relevant information."
    )
    # Stage 3: Write
    draft = await writer.run(
        f"Write a report based on this research:\n{research}"
    )
    # Stage 4: Review (the reviewer also sees the original query)
    final = await reviewer.run(
        f"Review and improve this draft:\n{draft}\n\n"
        f"Original query: {query}"
    )
    return final
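The four hard-coded stages generalize to any ordered list of agents. A minimal sketch, assuming each agent is simply an async callable that maps a prompt string to a string; `Stage`, `run_pipeline` and `stub_agent` are hypothetical names, and the stub stands in for a real LLM-backed agent. Each stage receives the original query as well as the previous output, which avoids the "missing context" pitfall.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical agent interface: any async function from prompt -> text
AgentFn = Callable[[str], Awaitable[str]]


@dataclass
class Stage:
    name: str
    instruction: str  # what this stage should do with the previous output
    agent: AgentFn


async def run_pipeline(query: str, stages: list[Stage]) -> str:
    """Run stages in order, feeding each one the previous stage's output."""
    previous = query
    for stage in stages:
        prompt = (
            f"Original query: {query}\n"
            f"{stage.instruction}\n\n"
            f"Input from previous stage:\n{previous}"
        )
        previous = await stage.agent(prompt)
    return previous


def stub_agent(tag: str) -> AgentFn:
    """Hypothetical stand-in for an LLM call: wraps its input in a tag."""
    async def run(prompt: str) -> str:
        return f"{tag}({prompt.splitlines()[-1]})"
    return run


stages = [
    Stage("plan", "Create a research plan.", stub_agent("plan")),
    Stage("research", "Execute the plan.", stub_agent("research")),
    Stage("write", "Write a report.", stub_agent("write")),
]
print(asyncio.run(run_pipeline("quantum error correction", stages)))
# write(research(plan(quantum error correction)))
```

Keeping stages as data rather than code makes it easy to insert a quality gate or swap a model for one stage without touching the others.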
Parallel Fan-Out
Fan-out runs multiple agents simultaneously on the same input, then merges their results. This is ideal when you need multiple independent perspectives or can decompose a task into non-overlapping subtasks.
        User Query
       /     |     \
     ▼       ▼       ▼
  ┌──────┐┌──────┐┌──────┐
  │Agent ││Agent ││Agent │
  │  A   ││  B   ││  C   │
  └──┬───┘└──┬───┘└──┬───┘
       \     |     /
         ▼   ▼   ▼
      ┌──────────────┐
      │  Aggregator  │
      └──────────────┘
Implementation
import asyncio
async def fan_out(query: str, agents: list["Agent"]) -> str:
    # Each agent and the aggregator expose an async run(prompt) -> str method
    # Run all agents in parallel; failures come back as exception objects
    tasks = [agent.run(query) for agent in agents]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Filter out failures
    successes = [r for r in results if not isinstance(r, BaseException)]
    if not successes:
        raise RuntimeError("Every fan-out agent failed")
    # Aggregate results
    combined = "\n\n---\n\n".join(successes)
    final = await aggregator.run(
        f"Synthesize these perspectives into a single answer:\n{combined}"
    )
    return final
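Because `asyncio.gather` waits for the slowest agent, one stalled agent stalls the whole fan-out. A sketch that bounds each agent with `asyncio.wait_for` and keeps whatever partial results arrive in time; `fan_out_with_timeout`, the stub agents, and the 0.1-second timeout are illustrative assumptions, not values from a real system.

```python
import asyncio
from typing import Awaitable, Callable

AgentFn = Callable[[str], Awaitable[str]]


async def fan_out_with_timeout(
    query: str, agents: dict[str, AgentFn], timeout_s: float = 0.1
) -> dict[str, str]:
    """Run all agents concurrently; drop any that fail or exceed timeout_s."""
    names = list(agents)
    results = await asyncio.gather(
        *(asyncio.wait_for(agents[n](query), timeout=timeout_s) for n in names),
        return_exceptions=True,  # timeouts and errors become result values
    )
    # Keep only successful results, keyed by agent name for the aggregator
    return {
        name: r for name, r in zip(names, results)
        if not isinstance(r, BaseException)
    }


async def fast(query: str) -> str:
    return f"fast view of {query}"


async def slow(query: str) -> str:
    await asyncio.sleep(1)  # simulates a stalled agent; cancelled by wait_for
    return "never returned"


async def broken(query: str) -> str:
    raise RuntimeError("tool failure")


partial = asyncio.run(
    fan_out_with_timeout("pricing", {"fast": fast, "slow": slow, "broken": broken})
)
print(partial)  # {'fast': 'fast view of pricing'}
```

Keying results by agent name lets the aggregator prompt say which perspective each answer came from, and which ones are missing.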
Supervisor-Worker
The supervisor-worker pattern gives one agent the authority to plan, delegate, and review, which makes it well suited to complex, multi-step tasks that need coordination and quality control.
         ┌──────────────┐
         │  Supervisor  │
         │    Agent     │
         └──────┬───────┘
                │ delegates + reviews
      ┌─────────┼─────────┐
      ▼         ▼         ▼
  ┌────────┐┌────────┐┌────────┐
  │Worker A││Worker B││Worker C│
  │(Search)││ (Code) ││(Write) │
  └────────┘└────────┘└────────┘
Supervisor Prompt
You are a project supervisor managing a team of specialist workers:
- researcher: Searches documents, web, and databases for information
- coder: Writes and debugs code
- writer: Creates and edits text content
- analyst: Performs data analysis and creates charts
For the given task:
1. Break it into subtasks
2. Assign each subtask to the best worker
3. Review each worker's output
4. Request revisions if quality is insufficient
5. Combine results into the final deliverable
Respond with a JSON plan:
{
"subtasks": [
{"worker": "researcher", "task": "...", "depends_on": []},
{"worker": "coder", "task": "...", "depends_on": [0]}
]
}
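The `depends_on` indices in the plan define a dependency graph over subtasks. A minimal sketch of how a supervisor loop might execute such a plan: in each pass it runs every subtask whose dependencies are finished, concurrently, and passes the dependencies' outputs into the worker prompt. `execute_plan` and the stub workers are hypothetical; a real supervisor would call LLM-backed workers and add the review and revision steps from the prompt.

```python
import asyncio
import json
from typing import Awaitable, Callable

WorkerFn = Callable[[str], Awaitable[str]]


async def execute_plan(plan_json: str, workers: dict[str, WorkerFn]) -> list[str]:
    """Execute subtasks in dependency order; independent ones run concurrently."""
    subtasks = json.loads(plan_json)["subtasks"]
    outputs: dict[int, str] = {}
    pending = set(range(len(subtasks)))
    while pending:
        # Subtasks whose dependencies have all produced output
        ready = [i for i in sorted(pending)
                 if all(d in outputs for d in subtasks[i]["depends_on"])]
        if not ready:
            raise ValueError("Plan has a dependency cycle or an invalid index")

        async def run(i: int) -> str:
            st = subtasks[i]
            context = "\n".join(outputs[d] for d in st["depends_on"])
            return await workers[st["worker"]](f"{st['task']}\nContext:\n{context}")

        results = await asyncio.gather(*(run(i) for i in ready))
        for i, result in zip(ready, results):
            outputs[i] = result
            pending.discard(i)
    return [outputs[i] for i in range(len(subtasks))]


def make_worker(name: str) -> WorkerFn:
    async def worker(prompt: str) -> str:
        return f"{name}: {prompt.splitlines()[0]}"  # hypothetical stub
    return worker


plan = json.dumps({"subtasks": [
    {"worker": "researcher", "task": "Find API limits", "depends_on": []},
    {"worker": "coder", "task": "Write a client", "depends_on": [0]},
]})
workers = {"researcher": make_worker("researcher"), "coder": make_worker("coder")}
print(asyncio.run(execute_plan(plan, workers)))
# ['researcher: Find API limits', 'coder: Write a client']
```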
Router/Dispatcher Pattern
A router classifies incoming requests and sends each one to the most appropriate specialist agent. This pattern fits customer-facing applications where queries span multiple domains.
SPECIALISTS = {
    "billing": Agent(system="You are a billing specialist...", tools=[...]),
    "technical": Agent(system="You are a technical support engineer...", tools=[...]),
    "sales": Agent(system="You are a sales representative...", tools=[...]),
    "general": Agent(system="You are a helpful assistant...", tools=[...]),
}
async def route_request(query: str) -> str:
    # Lightweight classification (classifier can be a small, cheap model)
    classification = await classifier.run(
        f"Classify this request into one of: billing, technical, sales, general\n"
        f"Request: {query}\n"
        f"Category:"
    )
    # Normalize the label; anything unexpected falls back to the general agent
    category = classification.strip().lower().rstrip(".")
    agent = SPECIALISTS.get(category, SPECIALISTS["general"])
    return await agent.run(query)
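Classifier output is free text, so a reply such as "Category: Billing." misses the exact-match lookup and silently falls back to `general`. A small normalization step helps; `parse_category` below is a hypothetical helper that returns the first known label in the reply and otherwise falls back to `general`.

```python
import re

CATEGORIES = ("billing", "technical", "sales", "general")


def parse_category(raw: str, default: str = "general") -> str:
    """Return the first known category word found in raw classifier output."""
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in CATEGORIES:
            return word
    return default


print(parse_category("Category: Billing."))          # billing
print(parse_category("I think this is TECHNICAL"))   # technical
print(parse_category("refund request"))              # general
```

Because it takes the first match, a verbose answer that mentions several labels can still misroute; asking the classifier for a single word remains the first line of defense.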
Debate/Consensus Pattern
The debate pattern forces agents to challenge each other's reasoning, producing more robust answers for high-stakes decisions. A judge agent synthesizes the strongest arguments from both sides into a final answer.
Round 1: Agent A argues a position -> Agent B critiques it
Round 2: Agent A responds to the critique -> Agent B critiques again
Final: Judge agent synthesizes the best answer
Implementation
async def debate(question: str, rounds: int = 2) -> str:
    # agent_a, agent_b and judge expose an async run(prompt) -> str method
    agent_a_history = []
    agent_b_history = []
    for _ in range(rounds):
        # Agent A argues, replying to B's latest counter-argument after round 1
        a_input = f"Question: {question}\n"
        if agent_b_history:
            a_input += f"Opponent's last argument:\n{agent_b_history[-1]}\n"
        a_input += "Present your argument:"
        a_response = await agent_a.run(a_input)
        agent_a_history.append(a_response)
        # Agent B counters A's latest argument
        b_input = f"Question: {question}\n"
        b_input += f"Opponent's argument:\n{a_response}\n"
        b_input += "Present your counter-argument:"
        b_response = await agent_b.run(b_input)
        agent_b_history.append(b_response)
    # Judge synthesizes
    side_a = "\n".join(agent_a_history)
    side_b = "\n".join(agent_b_history)
    judge_input = (
        f"Question: {question}\n\n"
        f"Arguments from side A:\n{side_a}\n\n"
        f"Arguments from side B:\n{side_b}\n\n"
        "Synthesize the best answer, incorporating the strongest points from both sides."
    )
    return await judge.run(judge_input)
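The overview table describes debate as arguing until agreement, while the implementation above always runs a fixed number of rounds. A sketch of an early-exit variant, assuming a hypothetical convention that an agent begins its reply with `AGREE` when it accepts the opponent's position; `debate_until_agreement` and the two stub agents are illustrative, not part of any framework.

```python
import asyncio
from typing import Awaitable, Callable

AgentFn = Callable[[str], Awaitable[str]]


async def debate_until_agreement(
    question: str, agent_a: AgentFn, agent_b: AgentFn, max_rounds: int = 3
) -> tuple[list[str], bool]:
    """Alternate arguments; stop early if either side replies with AGREE."""
    transcript: list[str] = []
    last = ""
    for _ in range(max_rounds):
        for speaker in (agent_a, agent_b):
            reply = await speaker(f"Question: {question}\nOpponent said:\n{last}")
            transcript.append(reply)
            if reply.strip().upper().startswith("AGREE"):
                return transcript, True
            last = reply
    return transcript, False  # round limit reached: hand off to a judge


async def stubborn(prompt: str) -> str:
    return "Use PostgreSQL."


async def persuadable(prompt: str) -> str:
    return "AGREE: PostgreSQL is fine." if "PostgreSQL" in prompt else "Use SQLite."


transcript, agreed = asyncio.run(
    debate_until_agreement("Which database?", stubborn, persuadable)
)
print(agreed, len(transcript))  # True 2
```

The `max_rounds` cap matters as much as the early exit: without it, two agents that never concede would loop indefinitely.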
Shared Memory
Without shared memory, each agent operates in isolation and cannot benefit from what other agents have already discovered. The right shared state architecture is what turns a collection of agents into a cohesive team.
Memory Architecture
| Memory Type | Scope | Persistence | Use Case |
|---|---|---|---|
| Task context | Single task | In-memory | Passing data between agents |
| Shared scratchpad | All agents in workflow | Session-scoped | Collaborative work |
| Long-term memory | Cross-session | Database | Learning, preferences |
| Knowledge graph | Cross-session | Database | Structured facts |
Shared State Implementation
from dataclasses import dataclass, field
@dataclass
class SharedState:
    """Shared memory accessible by all agents in the workflow."""
    task: str = ""
    plan: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)
    artifacts: dict[str, str] = field(default_factory=dict)
    messages: list[dict] = field(default_factory=list)
    status: str = "in_progress"
    def add_finding(self, agent: str, key: str, value: str):
        self.findings[key] = value
        self.messages.append({
            "agent": agent, "action": "finding",
            "key": key, "summary": value[:200]
        })
    def get_context_for_agent(self, agent_name: str) -> str:
        """Summarize the shared state for an agent.
        agent_name is unused here; filter messages by it to give each agent a tailored view."""
        recent = self.messages[-10:]  # Last 10 messages
        activity = "\n".join(
            f"- [{m['agent']}] {m['action']}: {m['summary']}" for m in recent
        )
        return (
            f"Task: {self.task}\n"
            f"Plan: {self.plan}\n"
            f"Recent activity:\n{activity}"
        )
Quality Gates
Without quality gates, a bad output from one agent cascades through the entire pipeline and corrupts the final result. Gates at each stage catch errors early when they are cheapest to fix.
| Gate | When | Check | Action on Fail |
|---|---|---|---|
| Input validation | Before processing | Format, length, scope | Reject with message |
| Plan review | After planning | Completeness, feasibility | Re-plan |
| Subtask output | After each worker | Quality score, relevance | Retry or reassign |
| Aggregation check | After combining | Consistency, completeness | Revise |
| Final review | Before delivery | All criteria | Revise or escalate |
Quality Gate Implementation
import json
async def quality_gate(output: str, criteria: str, threshold: float = 0.7):
    """Evaluate output quality. Returns (passed, score, feedback)."""
    # evaluator exposes an async run(prompt) -> str method
    evaluation = await evaluator.run(
        f"Rate this output on a scale of 0-1 for: {criteria}\n\n"
        f"Output:\n{output}\n\n"
        f"Respond as JSON: {{\"score\": 0.X, \"feedback\": \"...\"}}"
    )
    try:
        result = json.loads(evaluation)
        score = float(result["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Treat unparseable evaluator output as a failed gate
        return False, 0.0, "Evaluator did not return valid JSON"
    return score >= threshold, score, result.get("feedback", "")
# Usage in pipeline (inside an async function)
async def write_with_gate(task: str) -> str:
    draft = await writer.run(task)
    passed, score, feedback = await quality_gate(draft, "accuracy and completeness")
    if not passed:
        draft = await writer.run(f"Revise based on feedback: {feedback}\n\nOriginal:\n{draft}")
    return draft
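The usage above revises at most once and never re-checks the revision. A sketch of a bounded loop that gates every revision and stops after `max_revisions` attempts (2-3 matches the retry limit suggested in the pitfalls table). The writer and gate are passed in as async callables; `write_until_passing` and the stubs are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

WriterFn = Callable[[str], Awaitable[str]]
GateFn = Callable[[str], Awaitable[tuple[bool, float, str]]]


async def write_until_passing(
    task: str, writer: WriterFn, gate: GateFn, max_revisions: int = 2
) -> tuple[str, bool]:
    """Return (draft, passed). Revises at most max_revisions times."""
    draft = await writer(task)
    for _ in range(max_revisions):
        passed, _score, feedback = await gate(draft)
        if passed:
            return draft, True
        draft = await writer(
            f"Revise based on feedback: {feedback}\n\nOriginal:\n{draft}"
        )
    passed, _score, _feedback = await gate(draft)  # check the final revision
    return draft, passed


async def stub_writer(prompt: str) -> str:
    # Each revision appends a marker so the stub gate can see progress
    return prompt.split("Original:\n")[-1] + " +rev" if "Revise" in prompt else "v0"


async def stub_gate(draft: str) -> tuple[bool, float, str]:
    score = min(1.0, 0.4 + 0.2 * draft.count("+rev"))
    return score >= 0.7, score, "add citations"


print(asyncio.run(write_until_passing("report", stub_writer, stub_gate)))
# ('v0 +rev +rev', True)
```

Returning the `passed` flag instead of raising lets the caller decide whether a draft that never cleared the gate should be escalated to a human or shipped with a warning.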
Human-in-the-Loop
Fully autonomous multi-agent systems will eventually encounter tasks they cannot handle or decisions they should not make alone. Designing explicit human touchpoints prevents agents from taking irreversible bad actions.
Interaction Points
| Point | Trigger | Human Action |
|---|---|---|
| Approval gate | Before destructive action | Approve / reject / modify |
| Escalation | Agent confidence below threshold | Take over or guide |
| Review | After draft/plan generation | Edit, approve, or request revision |
| Disambiguation | Ambiguous user request | Clarify intent |
| Exception handling | Agent encounters unknown scenario | Provide instructions |
Escalation Pattern
async def agent_with_escalation(query: str, confidence_threshold: float = 0.6) -> str:
    response = await agent.run(query)
    # Confidence estimated by a separate evaluator agent
    raw_confidence = await evaluator.run(
        f"Rate your confidence that this response is correct, from 0 to 1. "
        f"Reply with the number only:\n{response}"
    )
    try:
        confidence = float(raw_confidence.strip())
    except ValueError:
        confidence = 0.0  # Unparseable rating: treat as low confidence
    if confidence < confidence_threshold:
        # Escalate to human (request_human_review is an application-specific hook)
        human_input = await request_human_review(
            query=query,
            agent_response=response,
            confidence=confidence,
            reason="Low confidence - please review or provide guidance"
        )
        if human_input.action == "approve":
            return response
        elif human_input.action == "override":
            return human_input.response
        elif human_input.action == "guide":
            return await agent.run(
                f"Original query: {query}\n"
                f"Human guidance: {human_input.guidance}\n"
                f"Please revise your response."
            )
    return response
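Escalation covers low confidence; the approval gate in the interaction-points table covers destructive actions regardless of confidence. A sketch that wraps a tool so it runs only after an approver callback decides to approve, reject, or modify its arguments. `requires_approval`, `Decision`, and the stub approver are hypothetical; in production the approver would be a UI, chat, or ticketing hook that waits for a person.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Decision:
    action: str  # "approve", "reject", or "modify"
    new_args: dict[str, Any] = field(default_factory=dict)


Approver = Callable[[str, dict[str, Any]], Awaitable[Decision]]


def requires_approval(tool: Callable[..., Awaitable[str]], approver: Approver):
    """Wrap a destructive tool so a human decides before it executes."""
    async def gated(**kwargs: Any) -> str:
        decision = await approver(tool.__name__, kwargs)
        if decision.action == "modify":
            kwargs = {**kwargs, **decision.new_args}
        elif decision.action != "approve":
            # Anything other than an explicit approval blocks the action
            return f"{tool.__name__} blocked by reviewer"
        return await tool(**kwargs)
    return gated


async def delete_records(table: str, where: str) -> str:
    return f"would delete from {table} where {where}"  # stub: no real database


async def cautious_approver(name: str, args: dict[str, Any]) -> Decision:
    # Stub reviewer: narrows any delete that is not already scoped by id
    if "id =" not in args["where"]:
        return Decision("modify", {"where": "id = 42"})
    return Decision("approve")


safe_delete = requires_approval(delete_records, cautious_approver)
print(asyncio.run(safe_delete(table="users", where="active = false")))
# would delete from users where id = 42
```

Failing closed on unknown decisions keeps a typo or a timed-out reviewer from turning into an unapproved destructive action.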
Framework Comparison for Multi-Agent
Not every framework supports every pattern. Choose based on the specific patterns you need, your language preference, and whether built-in state management and human-in-the-loop support matter to you.
| Framework | Pattern Support | State Management | Human-in-Loop |
|---|---|---|---|
| LangGraph | All (graph-based) | Built-in checkpointing | Yes |
| CrewAI | Sequential, Hierarchical | Shared memory | Yes |
| AutoGen | Conversation, Debate | Chat history | Yes |
| Agents SDK (OpenAI) | Handoffs, Router | Thread-based | Via tool |
| Mastra | Workflow-based | Built-in | Yes |
When to Use Multi-Agent
The biggest mistake in multi-agent design is reaching for it when a single agent would suffice. Multi-agent systems add latency, cost, and debugging complexity -- only use them when the task genuinely demands it.
Use Multi-Agent When
- Task requires genuinely different expertise areas
- Subtasks can be parallelized for speed
- You need checks and balances (reviewer, critic)
- Single agent context window is insufficient
- Different stages need different tools or models
Do NOT Use Multi-Agent When
- A single well-prompted agent can handle the task
- The overhead of coordination exceeds the benefit
- Tasks are tightly coupled and hard to decompose
- Latency is critical (every handoff between agents adds latency)
- Debugging simplicity is more important than capability
Common Pitfalls
Multi-agent systems inherit every failure mode of a single agent and add new ones from the interactions between agents. The pitfalls below are among the most common.
| Pitfall | Problem | Fix |
|---|---|---|
| Over-engineering | Simple task + 5 agents = slow and expensive | Start with single agent, add more only when needed |
| No quality gates | Bad output from one agent cascades | Add review steps between agents |
| Shared state conflicts | Agents overwrite each other | Use structured state with atomic updates |
| No max iteration limit | Infinite revision loops | Set hard limits on retries (2-3) |
| Missing context | Agent lacks info from prior stages | Pass relevant shared state, not just last output |
| Ignoring cost | Multi-agent multiplies LLM calls | Track total cost, use cheaper models for simple tasks |
| No observability | Cannot debug agent interactions | Log every agent call, state transition |
| No fallback | Multi-agent system fails completely | Graceful degradation to simpler agent or human |
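Several of these fixes (logging every call, tracking total cost, graceful fallback) can live in one thin wrapper around each agent. A sketch, assuming agents are async prompt-to-text callables; `TrackedAgent`, `Usage`, and the flat per-call cost are hypothetical stand-ins for real token-based pricing and your own fallback policy.

```python
import asyncio
import logging
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agents")

AgentFn = Callable[[str], Awaitable[str]]


@dataclass
class Usage:
    calls: int = 0
    cost_usd: float = 0.0


class TrackedAgent:
    """Logs each call, accumulates cost, and falls back on failure."""

    def __init__(self, name: str, fn: AgentFn, cost_per_call: float,
                 usage: Usage, fallback: Optional[AgentFn] = None):
        self.name, self.fn, self.cost = name, fn, cost_per_call
        self.usage, self.fallback = usage, fallback

    async def run(self, prompt: str) -> str:
        start = time.perf_counter()
        self.usage.calls += 1
        self.usage.cost_usd += self.cost  # failed calls are still billed
        try:
            result = await self.fn(prompt)
            log.info("%s ok in %.3fs", self.name, time.perf_counter() - start)
            return result
        except Exception:
            log.exception("%s failed", self.name)
            if self.fallback is None:
                raise
            # Degrade to a simpler agent; pass another TrackedAgent's .run
            # as the fallback so its cost is tracked too
            return await self.fallback(prompt)


async def flaky(prompt: str) -> str:
    raise TimeoutError("model timed out")


async def simple(prompt: str) -> str:
    return f"short answer to: {prompt}"


usage = Usage()
backup = TrackedAgent("simple", simple, 0.001, usage)
primary = TrackedAgent("primary", flaky, 0.01, usage, fallback=backup.run)
answer = asyncio.run(primary.run("summarize Q3"))
print(answer, usage)
```

Sharing one `Usage` object across all agents in a workflow gives a per-request total, which is the number to watch when deciding whether a multi-agent design pays for itself.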