AI Agents | LegacyForward.ai | LegacyForward.ai

Agent Anatomy: Observe-Think-Act Loop

Every AI agent, regardless of framework, follows this core loop. Understanding it is essential before you add complexity with multi-agent patterns or custom orchestration.

                    ┌─────────────┐
                    │   OBSERVE   │
                    │ Read input, │
                    │ tool output,│
                    │ environment │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │    THINK    │
                    │ Reason about│
                    │ next action │
                    │ (LLM call)  │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │     ACT     │
                    │ Call tool,  │
                    │ respond, or │
                    │ delegate    │
                    └──────┬──────┘
                           │
                    loops until done

Core Agent Components

An agent is more than an LLM with tools -- it needs memory, planning, and reflection to handle real-world tasks reliably. Missing any of these components leads to agents that work in demos but fail in production.

Component	Purpose	Implementation
System prompt	Identity, rules, capabilities	Static text + dynamic context
Tool definitions	Available actions	Function schemas (JSON)
Memory	Conversation context	Buffer, summary, or vector store
Planning	Task decomposition	CoT, ReAct, or explicit planner
Execution	Tool calling + result processing	Function dispatch + error handling
Reflection	Self-check, retry logic	Output validation, critic LLM

Tool Definition Patterns

Tool definitions are the contract between your agent and the outside world. Vague descriptions and sloppy schemas are the #1 cause of agents calling the wrong tool or passing bad arguments.

OpenAI-Compatible Tool Schema

{
  "type": "function",
  "function": {
    "name": "search_database",
    "description": "Search the product database by query. Returns top 5 results.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {
          "type": "string",
          "description": "Natural language search query"
        },
        "category": {
          "type": "string",
          "enum": ["electronics", "clothing", "books"],
          "description": "Optional category filter"
        },
        "max_results": {
          "type": "integer",
          "default": 5,
          "description": "Maximum number of results to return"
        }
      },
      "required": ["query"]
    }
  }
}

Tool Design Best Practices

Principle	Good	Bad
Specific name	`search_orders_by_email`	`search`
Clear description	"Finds orders by customer email. Returns order ID, date, total."	"Search stuff"
Typed parameters	`{"type": "integer", "minimum": 1}`	`{"type": "string"}` for a number
Required vs optional	Mark only truly required as required	Everything required
Return format	Documented, consistent structure	Unpredictable output
Error surface	Return error in result, don't throw	Silent failure

Memory Types

Without memory, every agent turn starts from scratch. The right memory architecture determines whether your agent can handle a 5-message chat or a 500-step workflow spanning multiple sessions.

Memory Type	How It Works	Capacity	Use Case
Buffer (sliding window)	Keep last N messages	Low-medium	Short conversations
Token buffer	Keep last N tokens of history	Medium	Token-budget aware
Summary	LLM summarizes older messages	High	Long conversations
Vector/semantic	Embed messages, retrieve relevant	Very high	Knowledge-heavy agents
Episodic	Store full episodes, retrieve by similarity	Very high	Learning from past tasks
Entity	Extract and track entity states	Medium	Customer service, CRM
Structured (KG)	Knowledge graph of facts	High	Complex domain reasoning

Memory Selection Guide

How long are typical conversations?
  < 10 turns    -> Buffer memory (simple, cheap)
  10-50 turns   -> Summary memory (compress old context)
  50+ turns     -> Vector memory (retrieve relevant only)

Does the agent need to learn across sessions?
  YES -> Episodic + Vector memory
  NO  -> Buffer or Summary is fine

Does the agent track many entities?
  YES -> Entity memory + structured storage
  NO  -> Standard memory is fine

Agent Frameworks Comparison

Choosing a framework is a build-vs-buy decision that affects your iteration speed and lock-in. Pick based on your language, complexity needs, and whether you need multi-agent support.

Framework	Language	Key Feature	Best For
LangGraph	Python/JS	Graph-based workflows	Complex stateful agents
CrewAI	Python	Role-based multi-agent	Team simulations
AutoGen	Python	Conversational agents	Research, debate patterns
Semantic Kernel	C#/Python	Enterprise integration	.NET ecosystems
Haystack	Python	Pipeline-based	RAG-heavy agents
Agents SDK (OpenAI)	Python	Handoffs, guardrails	OpenAI-centric apps
Claude Agent SDK	Python	MCP tools, model agnostic	Anthropic-centric apps
Mastra	TypeScript	Workflows, evals	TS/JS applications

ReAct Pattern

ReAct (Reason + Act) is the most widely used agent pattern because it forces the model to explain its reasoning before taking action. This makes agent behavior interpretable and debuggable.

Thought: I need to find the user's order status. I'll search by their email.
Action: search_orders(email="user@example.com")
Observation: Found order #1234, status: shipped, tracking: XYZ789
Thought: I have the info. I'll respond with the order status and tracking number.
Answer: Your order #1234 has been shipped. Tracking number: XYZ789.

ReAct Implementation

def react_loop(query, tools, max_steps=10):
    messages = [
        {"role": "system", "content": REACT_SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for step in range(max_steps):
        response = llm.chat(messages, tools=tools)

        if response.tool_calls:
            for call in response.tool_calls:
                result = execute_tool(call.name, call.args)
                messages.append({"role": "tool", "content": result,
                                 "tool_call_id": call.id})
        else:
            return response.content  # Final answer

    return "Max steps reached without resolution."

Orchestration Patterns

How you wire agents together determines your system's capability ceiling and failure modes. Start with the simplest pattern that works and only add complexity when you have evidence a single agent cannot handle the task.

Pattern	Description	Use Case	Complexity
Single agent	One LLM + tools	Simple tasks	Low
Sequential pipeline	Agent A output feeds Agent B	Multi-step processing	Medium
Parallel fan-out	Same input to N agents	Multiple perspectives	Medium
Router	Classifier routes to specialist	Domain-specific handling	Medium
Supervisor-worker	Supervisor delegates, reviews	Complex task decomposition	High
Hierarchical	Multi-level supervisors	Enterprise workflows	High
Debate/consensus	Agents argue, reach agreement	High-stakes decisions	High

Supervisor-Worker Pattern

# Supervisor decides which worker to call
supervisor_prompt = """
You are a supervisor managing these workers:
- researcher: Finds information from documents
- calculator: Performs mathematical computations
- writer: Drafts text content

Given the user request, decide which worker(s) to call and in what order.
Respond with a plan as JSON: {"steps": [{"worker": "...", "task": "..."}]}
"""

# Workers are specialized agents with focused tool sets
workers = {
    "researcher": Agent(tools=[search, retrieve]),
    "calculator": Agent(tools=[calculate, chart]),
    "writer": Agent(tools=[draft, edit]),
}

Planning Strategies

Planning determines whether your agent tackles a complex task methodically or stumbles through it. The right strategy balances structure against adaptability -- too rigid and the agent can't recover from surprises, too loose and it loses track of its goal.

Strategy	How It Works	Pros	Cons
No explicit plan	LLM decides step by step	Simple, flexible	May lose track
Upfront plan	Generate full plan first, then execute	Organized	Inflexible to new info
Adaptive plan	Plan, execute, re-plan after each step	Flexible, informed	Higher LLM cost
Plan-and-solve	Decompose into sub-tasks, solve each	Good for complex tasks	Overhead

Error Handling

Agents fail in ways that traditional software does not -- they hallucinate tool names, pass invalid arguments, and get stuck in infinite loops. Robust error handling is what separates a demo agent from a production one.

Error Type	Detection	Recovery
Tool not found	Invalid tool name in response	Re-prompt with available tools
Tool execution failure	Exception from tool	Return error to LLM, let it retry
Infinite loop	Step counter exceeds max	Force response or escalate
Hallucinated tool call	Tool name not in schema	Filter, re-prompt
Wrong arguments	Schema validation failure	Return validation error to LLM
Context overflow	Token count exceeded	Summarize history, trim old messages

Agent Evaluation

Agent evaluation goes beyond LLM output quality -- you also need to measure tool accuracy, step efficiency, and cost. Without these metrics, you are flying blind on whether your agent is actually improving.

Metric	What It Measures	How to Measure
Task completion	Did the agent finish the task?	Binary success/failure
Tool accuracy	Correct tool called with correct args?	Compare to gold standard
Step efficiency	Number of steps to complete	Count tool calls
Cost	Total tokens consumed	Sum input + output tokens
Latency	Time to complete task	Wall clock time
Safety	No harmful actions taken	Red-team testing
User satisfaction	Did the user get what they needed?	Thumbs up/down, CSAT

Common Pitfalls

Most agent failures come from architectural over-engineering or missing safety boundaries, not from the LLM itself. Check this list before adding another agent to your system.

Pitfall	Problem	Fix
Too many tools	LLM confused, wrong tool selection	Limit to 10-15 tools, use routing
Vague tool descriptions	Wrong tool calls	Write precise descriptions with examples
No max iteration limit	Infinite loops, cost explosion	Set hard limit (5-20 steps)
Full history in context	Token overflow, high cost	Use summary or vector memory
No tool result validation	Garbage in, garbage out	Validate tool outputs before passing to LLM
Single monolithic agent	Poor at specialized tasks	Split into specialist agents
No human escalation path	Agent stuck on hard cases	Add "escalate_to_human" tool
Ignoring tool errors	Agent continues with bad data	Surface errors clearly to the LLM