Module 09 of 17

Agents & Multi-Agent Systems

A practitioner's guide to LLM agents — ReAct patterns, tool definition best practices, when to use agents vs. chains, memory strategies, multi-agent architectures, human-in-the-loop gates, and production guardrails. Patterns that ship, not theory.

11 min readOpen in Colab

01. The ReAct Pattern

Diagram 1

Figure 1 — The ReAct loop: Reason, Act, Observe, repeating until the task is complete

The ReAct pattern (Reasoning + Acting) is the foundational framework for LLM agents. The model alternates between reasoning (thinking through the problem and planning the next step) and acting (executing a tool). After each action, the agent observes the result and uses that observation to inform its next reasoning step. This loop repeats until the task is solved.

Consider a user asking: "What was the weather in the city where Apple was founded, on the day the iPhone was announced?" A standard LLM guesses from memory and likely gets details wrong. A ReAct agent thinks: "I need to know where Apple was founded." It calls a search tool and learns "Cupertino." Then it searches for the iPhone announcement date, finds "January 9, 2007." Finally, it calls a weather API for Cupertino on that date. Each step builds on the previous result.

What makes ReAct powerful is its flexibility. Unlike traditional programming where you specify every step in advance, a ReAct agent figures out steps at runtime based on what it discovers. If the first search returns ambiguous results, it can rephrase and search again. If an API fails, it can try an alternative. This adaptive behavior makes agents robust for real-world tasks where the path to the answer is not known in advance.

Think of it like this: A ReAct agent works like a detective solving a case. They do not just think and announce the solution. They think ("the suspect was in Paris -- let me check flight records"), take an action (look up flights), observe the result ("no flight to Paris"), update their thinking ("then they could not have been there -- who else had motive?"), and repeat until the case is solved.

What This Means for Practitioners

When to use agents vs. simpler approaches:

PatternUse WhenLatencyCostComplexity
Single LLM callTask is straightforward, no external data neededLowestLowestNone
Chain (linear pipeline)Steps are known in advance, no branchingLowLowLow
ReAct agentPath to answer is unknown, requires dynamic tool use3-10x single call5-10x single callMedium
Multi-agent systemTask requires multiple specialties collaborating10-30x single call10-50x single callHigh

Build an agent only when you need dynamic decision-making. If your task is a linear pipeline (classify, retrieve, generate), use a chain. Agents are justified when the task requires a variable number of tool calls, iterative refinement, or runtime planning.

Minimal ReAct agent from scratch:

from openai import OpenAI
import json

client = OpenAI()

tools = [
    {"type": "function", "function": {
        "name": "web_search",
        "description": "Search the web for current information",
        "parameters": {"type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]}
    }},
    {"type": "function", "function": {
        "name": "calculator",
        "description": "Evaluate a mathematical expression",
        "parameters": {"type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"]}
    }}
]

def run_agent(user_message: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant with tools. "
                                      "Think step by step. Use tools when needed."},
        {"role": "user", "content": user_message}
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages,
            tools=tools, tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content

        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

    return "Max steps reached without completing the task."

Always cap iterations. Without a maximum, a model that repeatedly fails a tool call will spin indefinitely and burn tokens. 8-12 iterations is reasonable for most tasks.

02. Tool Definition Best Practices

Tools are the arms and legs of an LLM agent. The mechanism that enables them is function calling: the LLM generates structured JSON specifying which function to call and with what arguments. Your code executes the function and feeds the result back. Both OpenAI and Anthropic support this natively.

The quality of tool descriptions is the single biggest factor in agent reliability. The LLM decides which tool to use and constructs arguments based entirely on your description and schema.

What This Means for Practitioners

Tool description checklist:

ElementBad ExampleGood Example
What it does"Does stuff with data""Queries a PostgreSQL database with a SQL SELECT statement"
When to use it(missing)"Use when the user asks about customer data, orders, or inventory"
Parameter constraints(missing)"Only SELECT queries allowed. Max 100 rows returned."
Return format(missing)"Returns results as JSON array of objects"

Keep tool count between 3 and 10. Too few tools limit capability; too many confuse the model about which to choose. For many capabilities, organize tools into categories and have the agent first select a category, then a specific tool.

Anthropic tool calling example:

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "query_database",
        "description": "Execute a read-only SQL query against the application database. "
                       "Returns results as JSON array. Only SELECT queries allowed. "
                       "Use for: customer lookups, order history, product searches.",
        "input_schema": {
            "type": "object",
            "properties": {
                "sql": {"type": "string", "description": "SQL SELECT query"}
            },
            "required": ["sql"]
        }
    },
    {
        "name": "send_email",
        "description": "Send an email. Requires explicit user confirmation before sending.",
        "input_schema": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"}
            },
            "required": ["to", "subject", "body"]
        }
    }
]

def run_claude_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=4096,
            system="You are a helpful assistant. Use tools when needed.",
            tools=tools, messages=messages
        )
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            return "".join(b.text for b in response.content if hasattr(b, "text"))
    return "Max steps reached."

03. LangGraph Workflows

While the basic ReAct loop works for straightforward tasks, complex workflows need more structure. Consider a customer support agent that must: identify the customer, look up order history, determine issue type, route to the right resolution workflow (refund, replacement, support), and execute with approval. This is not a simple loop -- it is a directed graph with conditional branches, parallel paths, and approval gates.

LangGraph builds these workflows as state machines. You define nodes (functions), edges (connections), and conditional edges (branches based on state). Its key advantage over simple chains is cycles -- the ability to loop back to earlier nodes for retry or refinement.

Think of it like this: A simple chain is an assembly line -- parts go in one end and come out the other. LangGraph is a flowchart that actually executes -- each box is a function, arrows include conditions, and some arrows loop back to earlier boxes.

What This Means for Practitioners

When to use LangGraph vs. simpler options:

RequirementSimple chainReAct agentLangGraph
Linear sequence of stepsYesOverkillOverkill
Dynamic tool selectionNoYesYes
Conditional branchingNoLimitedYes
Retry/refinement loopsNoImplicitExplicit, controllable
Human approval gatesNoNoYes (interrupt_before)
State persistence across restartsNoNoYes (checkpointing)
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
from operator import add

class AgentState(TypedDict):
    task: str
    plan: list[str]
    results: Annotated[list[str], add]
    current_step: int
    final_answer: str

def planner(state: AgentState) -> AgentState:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
            "content": f"Break this into 2-4 research steps as JSON array: {state['task']}"}],
        response_format={"type": "json_object"}
    )
    steps = json.loads(response.choices[0].message.content)["steps"]
    return {"plan": steps, "current_step": 0}

def researcher(state: AgentState) -> AgentState:
    step = state["plan"][state["current_step"]]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Research: {step}"}]
    )
    return {"results": [response.choices[0].message.content],
            "current_step": state["current_step"] + 1}

def should_continue(state: AgentState) -> str:
    if state["current_step"] < len(state["plan"]):
        return "research"
    return "synthesize"

def synthesizer(state: AgentState) -> AgentState:
    all_results = "\n\n".join(state["results"])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
            "content": f"Synthesize findings for: {state['task']}\n\n{all_results}"}]
    )
    return {"final_answer": response.choices[0].message.content}

graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("researcher", researcher)
graph.add_node("synthesizer", synthesizer)
graph.set_entry_point("planner")
graph.add_edge("planner", "researcher")
graph.add_conditional_edges("researcher", should_continue,
    {"research": "researcher", "synthesize": "synthesizer"})
graph.add_edge("synthesizer", END)

app = graph.compile(checkpointer=MemorySaver())

04. Multi-Agent Systems

Diagram 2

Figure 2 — Supervisor multi-agent pattern: manager routes tasks to specialist agents

A single agent with many tools becomes unreliable as complexity grows. Multi-agent systems split responsibilities across specialized agents that each excel at one thing. Instead of one person doing sales, engineering, and accounting, you have specialists who collaborate.

What This Means for Practitioners

Multi-agent pattern comparison:

PatternHow It WorksBest ForTradeoff
SupervisorManager agent delegates to specialistsMost use cases (recommended start)Single point of control, easier to debug
Sequential (pipeline)Each agent's output feeds the nextContent creation, report generationClear stages, but no backtracking
Collaborative (debate)Agents discuss and refine togetherCode gen (coder + tester), analysisHigher quality but slower, more tokens

The supervisor pattern is the most practical starting point. A manager receives the request, breaks it down, and delegates to specialist agents (researcher, coder, writer, reviewer). Each specialist has its own tools and system prompt optimized for its domain.

Key design decision: granularity. Too many agents add coordination overhead and latency. Too few lose the specialization benefit. Start with 2-3 agents and add more only when a single agent is measurably struggling with its scope.

Memory strategies for multi-turn agents:

StrategyHow It WorksBest For
Full historyPass entire conversationShort conversations (<20 turns)
Sliding windowKeep last N messagesLong conversations, cost control
Summary memoryLLM summarizes older historyVery long sessions
Shared state (LangGraph)Structured state object passed between nodesMulti-agent collaboration

05. Human-in-the-Loop

Fully autonomous agents are powerful but risky. An agent that can send emails, modify databases, or deploy code could cause significant damage from a wrong decision. Human-in-the-loop (HITL) patterns add checkpoints where the agent pauses for human approval before taking high-impact actions.

What This Means for Practitioners

Approval gate decision guide:

Action TypeReversible?Blast RadiusGate Needed?
Reading data, search queriesN/ANoneNo gate
Drafting content (not sending)YesNoneNo gate or light review
Sending emails/messagesNoMediumYes -- always gate
Modifying production dataNoHighYes -- hard stop for review
Financial transactionsNoHighYes -- mandatory approval
Deploying codePartiallyHighYes -- mandatory approval

LangGraph supports HITL natively through its interrupt mechanism. The graph saves state, pauses, and waits for external approval:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict

class ApprovalState(TypedDict):
    task: str
    proposed_action: str
    approved: bool
    result: str

def plan_action(state: ApprovalState) -> ApprovalState:
    return {"proposed_action": f"Send email to customer about: {state['task']}"}

def execute_action(state: ApprovalState) -> ApprovalState:
    if state["approved"]:
        return {"result": f"Executed: {state['proposed_action']}"}
    return {"result": "Action was rejected by human reviewer."}

graph = StateGraph(ApprovalState)
graph.add_node("plan", plan_action)
graph.add_node("execute", execute_action)
graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
graph.add_edge("execute", END)

app = graph.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["execute"]  # Pause here for approval
)

# Run until interrupt
config = {"configurable": {"thread_id": "approval-1"}}
state = app.invoke(
    {"task": "refund order #12345", "approved": False,
     "proposed_action": "", "result": ""},
    config=config
)

# Human reviews and approves...
app.update_state(config, {"approved": True})
final = app.invoke(None, config=config)  # Resume from checkpoint

06. Production Guardrails

Agents introduce safety challenges that do not exist with simple LLM calls. A chatbot that generates incorrect text is annoying; an agent that executes incorrect actions can delete data, send embarrassing emails, or burn through your cloud budget.

What This Means for Practitioners

Production safety layer checklist:

Safety LayerWhat It PreventsImplementation
Max iterationsInfinite loopsCounter in agent loop (8-12 default)
Token budgetRunaway costsToken counter + hard limit per request
Tool timeoutsHung tool callsasyncio.timeout or threading timeout
Input validationInjection attacks, malformed argsPydantic schemas per tool
Output filteringPII leaks, policy violationsRegex + LLM guardrail check
Audit loggingUntrackable actionsStructured logs per step with trace IDs
Human approvalIrreversible actionsLangGraph interrupt_before
Least privilegeDamage from compromised agentsScoped API keys, read-only by default

Cost tracking wrapper:

class AgentBudget:
    def __init__(self, max_llm_calls: int = 20, max_tokens: int = 50_000):
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.llm_calls = 0
        self.total_tokens = 0

    def track(self, tokens_used: int):
        self.llm_calls += 1
        self.total_tokens += tokens_used
        if self.llm_calls > self.max_llm_calls:
            raise RuntimeError(f"Agent exceeded max LLM calls: {self.max_llm_calls}")
        if self.total_tokens > self.max_tokens:
            raise RuntimeError(f"Agent exceeded token budget: {self.max_tokens}")

Observability is non-negotiable. Every tool call, every reasoning step, every decision branch must be logged with timestamps, input/output data, and latency. When an agent produces a wrong answer or takes an incorrect action, you need to trace back through the entire execution. Tools like LangSmith, Phoenix (Arize), and custom structured logging provide this visibility.

Interview Ready

How to Explain This in 2 Minutes

Elevator Pitch: Agents are the leap from LLMs that answer questions to LLMs that take actions. A basic LLM generates one response and stops, but an agent follows the ReAct loop -- Reason about the task, Act by calling a tool, Observe the result, and repeat until the goal is met. Tool use is implemented through function calling: you declare tool schemas, the model returns structured JSON calls, your code executes them and feeds results back. For complex workflows, LangGraph models the agent as a stateful graph with conditional branching and cycles. Multi-agent architectures let specialized agents collaborate through a supervisor pattern. Human-in-the-loop checkpoints ensure high-stakes actions require approval. Safety comes from iteration budgets, tool sandboxing, and structured audit logging.

Likely Interview Questions

QuestionWhat They're Really Asking
Walk me through the ReAct pattern.Do you understand the Reason-Act-Observe loop and why iterative tool use beats one-pass generation?
How does function calling work at the API level?Can you explain the mechanics -- schemas, tool call objects, execution, result feeding?
When would you use LangGraph over a simple chain?Do you know when conditional branching, cycles, or approval gates are needed?
What safety mechanisms would you implement for an action-taking agent?Do you understand sandboxing, HITL, budgets, logging, and least privilege?

Common Mistakes

  • Building an agent when a simple chain would suffice. Agents add 3-5x latency and 5-10x cost. Use them only for dynamic decision-making and variable tool use.
  • Not setting iteration and token budgets. Without limits, agents can enter infinite loops. Always set max iterations, max tokens, and wall-clock timeouts.
  • Giving agents unrestricted tool access. Every irreversible action should require human approval in production. All tool executions should be logged for audit and rollback.

Previous

08 · Advanced RAG

Next

10 · Evaluation