GenAI Architecture 09

Multi-Agent Systems

Orchestrate multiple specialized AI agents that collaborate to solve complex, multi-step problems. Each agent has a defined role, tool access, and communication protocol — coordinated by an orchestrator that plans, delegates, synthesizes results, and handles failures gracefully.

8 min readAgent SpecializationCommunication PatternsHandoff ProtocolsCost ControlOpen in Colab

1. Architecture Overview

The Multi-Agent System architecture deploys multiple specialized AI agents, each with a focused role, distinct tool access, and dedicated system prompts. An orchestrator agent plans the overall task, delegates sub-tasks to specialist agents, collects their results, and synthesizes a final response. This pattern excels when no single agent can efficiently handle the full breadth of a complex workflow.

When to Use

  • Tasks require multiple distinct capabilities (research, coding, analysis, review)
  • A single agent's context window cannot hold all necessary information simultaneously
  • Different sub-tasks benefit from different models, prompts, or temperature settings
  • You need separation of concerns for security (e.g., only one agent accesses production databases)
  • Complex workflows need iterative refinement with feedback loops between specialists

Complexity Level

High. Multi-agent systems multiply the complexity of single-agent architectures. Only use this pattern when you have proven that a single agent with tools cannot meet your quality or capability requirements. Start with 2 agents before scaling to more.

Tip: The most common mistake is creating too many agents. Each additional agent adds latency, cost, and failure modes. Start with the minimum viable agent count and add specialists only when you have clear evidence of improvement.

2. Architecture Diagram

Diagram 1

Architecture diagram — Multi-Agent: orchestrator fans out to specialized agents with tools, collects results, and synthesizes a unified response

3. Components Deep Dive

ComponentDescription
🎯 Orchestrator AgentThe central coordinator that decomposes tasks, assigns work to specialists, tracks progress, handles failures, and merges results. Uses a planner prompt that understands all available agents and their capabilities.
🔍 Researcher AgentSpecializes in information gathering using search tools, document retrieval, and knowledge bases. Returns structured findings with source citations. Optimized for breadth and accuracy over speed.
💻 Coder AgentWrites, debugs, and executes code in a sandboxed environment. Has access to a code executor (REPL, Docker container) and can install packages, run tests, and iterate on errors autonomously.
🔎 Reviewer AgentReviews outputs from other agents for correctness, quality, and safety. Runs linting, type checking, and test suites. Can request revisions from the coder agent before approving.
💬 Message BusStructured communication layer between agents. Messages include sender, recipient, action type (request, response, error), payload, and metadata. Enables logging and replay for debugging.
🛡 Safety ControlsMax iteration limits prevent infinite loops. Token budgets cap total cost. Human-in-the-loop escalation triggers when confidence is low or stakes are high. Circuit breakers halt on repeated failures.

Communication Patterns Comparison

PatternHow It WorksProsConsBest For
HierarchicalOrchestrator delegates to agents; agents only talk to orchestratorClear control flow, easy to debugBottleneck at orchestrator, more round tripsMost production systems
Peer-to-PeerAgents communicate directly with each otherLower latency, parallel workHard to track state, potential loopsTightly coupled tasks (code + review)
BlackboardShared workspace; agents read/write to common stateFlexible, decoupled agentsRace conditions, complex state managementCollaborative document editing

Framework Comparison

FrameworkArchitectureStrengthsConsiderations
LangGraphState machine / graph-basedExplicit control flow, persistence, human-in-the-loopSteeper learning curve, LangChain ecosystem
CrewAIRole-based crewSimple API, role/goal/backstory abstractionsLess control over execution order
AutogenConversational agentsNatural multi-turn dialogue, code execution built inCan be chatty, harder to constrain

4. Implementation

Three-Agent System with Orchestrator

import anthropic
import json
from dataclasses import dataclass, field
from typing import Optional

client = anthropic.Anthropic()

# ---------- Agent definitions ----------

@dataclass
class Agent:
    name: str
    role: str
    system_prompt: str
    tools: list = field(default_factory=list)
    model: str = "claude-sonnet-4-20250514"
    temperature: float = 0.3
    max_tokens: int = 2048

# Define specialist agents
researcher = Agent(
    name="researcher",
    role="Information gathering and analysis",
    system_prompt="""You are a research specialist. Your job is to:
1. Search for relevant information on the given topic
2. Summarize findings with source citations
3. Identify key facts, statistics, and expert opinions
Return structured JSON with keys: findings, sources, confidence.""",
    tools=[{
        "name": "web_search",
        "description": "Search the web for information",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }]
)

coder = Agent(
    name="coder",
    role="Code writing and execution",
    system_prompt="""You are a coding specialist. Your job is to:
1. Write clean, well-documented code for the given task
2. Execute code and verify it works correctly
3. Fix any errors and iterate until tests pass
Return the final working code with explanation.""",
    tools=[{
        "name": "execute_code",
        "description": "Execute Python code in a sandbox",
        "input_schema": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"]
        }
    }]
)

reviewer = Agent(
    name="reviewer",
    role="Quality assurance and review",
    system_prompt="""You are a code reviewer. Your job is to:
1. Review code for correctness, security, and best practices
2. Run linting and type checking
3. Identify bugs, edge cases, and improvements
Return JSON: {approved: bool, issues: [...], suggestions: [...]}""",
    tools=[{
        "name": "run_tests",
        "description": "Run pytest, ruff, and mypy on code",
        "input_schema": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"]
        }
    }]
)

AGENTS = {"researcher": researcher, "coder": coder, "reviewer": reviewer}

Orchestrator with Function Calling

# ---------- Orchestrator ----------

ORCHESTRATOR_SYSTEM = """You are an orchestrator managing a team of specialist agents.

Available agents:
- researcher: Searches and analyzes information
- coder: Writes and executes code
- reviewer: Reviews code quality and runs tests

For each user request:
1. Break the task into sub-tasks
2. Use the delegate_to_agent tool to assign work
3. Collect results and iterate if needed
4. Synthesize a final response

Rules:
- Max 10 total delegations per request
- Always have the reviewer check code before finalizing
- If an agent fails twice, escalate to the user"""

orchestrator_tools = [{
    "name": "delegate_to_agent",
    "description": "Delegate a sub-task to a specialist agent",
    "input_schema": {
        "type": "object",
        "properties": {
            "agent_name": {
                "type": "string",
                "enum": ["researcher", "coder", "reviewer"]
            },
            "task": {"type": "string"},
            "context": {"type": "string", "default": ""}
        },
        "required": ["agent_name", "task"]
    }
}]

def call_agent(agent: Agent, task: str, context: str = "") -> str:
    """Call a specialist agent with a task."""
    message = f"Task: {task}"
    if context:
        message += f"\n\nContext from other agents:\n{context}"

    response = client.messages.create(
        model=agent.model,
        max_tokens=agent.max_tokens,
        system=agent.system_prompt,
        messages=[{"role": "user", "content": message}],
        temperature=agent.temperature,
    )
    return response.content[0].text

def run_multi_agent(user_request: str, max_delegations: int = 10) -> str:
    """Run the multi-agent orchestration loop."""
    messages = [{"role": "user", "content": user_request}]
    delegation_count = 0
    total_tokens = 0

    while delegation_count < max_delegations:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=ORCHESTRATOR_SYSTEM,
            messages=messages,
            tools=orchestrator_tools,
        )
        total_tokens += response.usage.input_tokens + response.usage.output_tokens

        # Check if orchestrator wants to delegate
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    agent = AGENTS[block.input["agent_name"]]
                    result = call_agent(
                        agent,
                        block.input["task"],
                        block.input.get("context", ""),
                    )
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                    delegation_count += 1
                    print(f"  [{delegation_count}] {agent.name}: {block.input['task'][:80]}")

            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Orchestrator has synthesized the final answer
            print(f"\nCompleted in {delegation_count} delegations, ~{total_tokens} tokens")
            return response.content[0].text

    return "Max delegations reached. Partial results returned."

# Usage
result = run_multi_agent(
    "Research best practices for rate limiting APIs, then write a Python "
    "implementation with tests, and have it reviewed for production readiness."
)

5. Data Flow

Here is the step-by-step flow through the Multi-Agent orchestration:

Data Flow

StepActionDetails
1User submits requestA complex task that requires multiple capabilities (research, coding, review)
2Orchestrator plansBreaks the task into sub-tasks, identifies which agents to involve, determines ordering
3Fan-out delegationOrchestrator calls delegate_to_agent for each sub-task; independent tasks can run in parallel
4Agents executeEach specialist processes its sub-task using its tools (search, code execution, testing)
5Results returnedAgent outputs flow back to the orchestrator as tool results
6Iterate if neededOrchestrator may request revisions, send coder output to reviewer, or ask researcher for more data
7Synthesize responseOrchestrator merges all agent outputs into a coherent final answer
8Return to userFinal response includes contributions from all agents, with provenance tracking

State Management: Shared context passes relevant results from one agent to another via the orchestrator. Isolated state keeps each agent's working memory separate, preventing context pollution. Most production systems use a hybrid: shared task context with isolated agent scratchpads.

6. Trade-offs & Considerations

AdvantageLimitation
Handles complex, multi-step tasks beyond single agent capabilitySignificantly higher latency (multiple LLM calls in sequence)
Separation of concerns: each agent is focused and testableCost multiplied by number of agents and iterations
Different models/configs per agent (cost vs quality optimization)Harder to debug: failures can cascade across agents
Built-in quality control via reviewer agentRisk of infinite loops or excessive iterations
Naturally supports human-in-the-loop at any stageComplex state management and error recovery

Cost Control Strategies

  • Token budgets: Set per-agent and per-request token limits; terminate early if exceeded
  • Iteration caps: Hard limit on total delegations (typically 5-15 per request)
  • Model tiering: Use cheaper models for simple agents (researcher: Haiku), expensive models only for the orchestrator
  • Caching: Cache agent results for repeated sub-tasks across requests
  • Early termination: Stop when the orchestrator has sufficient confidence, even if not all agents have been consulted

When to upgrade: When you need to run multi-agent systems at enterprise scale with shared infrastructure, model management, and observability, move to Architecture 10 (Production Platform).

7. Production Checklist

  • Max iteration limits per request (prevent infinite agent loops)
  • Per-agent and per-request token budgets with cost tracking
  • Structured logging of all inter-agent messages for debugging and replay
  • Timeout per agent call with graceful degradation
  • Human-in-the-loop escalation for low-confidence or high-stakes decisions
  • Sandboxed code execution (Docker containers, no network access)
  • Agent-level error recovery: retry with backoff, fallback to simpler approach
  • Circuit breaker: halt entire workflow if error rate exceeds threshold
  • Observability: distributed traces linking all agent calls in a single request
  • A/B test agent configurations (prompts, models, tools) independently
  • Rate limiting on tool calls to prevent abuse of external APIs
  • Replay capability: re-run any request with the same agent state for debugging