Chapter 12 of 20

Multi-Hop Reasoning

Who approved the vendor that supplies the component that failed the safety test? That is 4 hops. This chapter builds the full multi-hop reasoning pipeline: question decomposition, chain-of-traversal with intermediate results, single-query vs step-by-step tradeoffs, a complete reasoning agent with traceable output, and depth limit guidance by use case.

12 min read

Overview

Who approved the vendor that supplies the component that failed the safety test? That is 4 hops.

01. What Multi-Hop Means

Diagram 1

A hop is a single relationship traversal. "Who manages Alice?" is 1 hop. You follow one MANAGES relationship from Alice to her manager. "Who manages Alice's manager?" is 2 hops. "Who approved the vendor that supplies the component that failed the safety test?" is 4 hops:

SafetyTest --[FAILED]--> Component --[SUPPLIED_BY]--> Vendor
    --[APPROVED_BY]--> Approver

Multi-hop questions are the questions knowledge graphs were built to answer. They are also the questions that LLMs handle worst on their own. Each hop requires looking up specific data that no model has in its weights.

02. Why This Is Hard for LLMs Alone

LLMs without graph access attempt multi-hop reasoning in one of two ways. Both are unreliable.

Guessing the chain. The LLM tries to answer the full chain from training data. "Who approved the vendor that supplies the failed component?" It generates a plausible-sounding answer: "Based on typical procurement processes, the VP of Supply Chain likely approved..." Fabrication, not reasoning. It has no access to your specific approval records.

Partial retrieval. With standard RAG, the LLM retrieves chunks related to the question. It might find a chunk mentioning the failed component and another mentioning vendor approvals. But the chunks do not connect — they come from different documents, different time periods, different contexts. The connection exists in the graph structure, not in any single text passage.

The graph makes multi-hop reasoning deterministic. Each hop is a concrete traversal with a verifiable result:

Hop 1: MATCH (t:Event {name: "Safety Test ST-2024-019"})
           -[:CAUSED_BY]->(c:Component)
        → Component: "Valve X-200"

Hop 2: MATCH (c:Component {name: "Valve X-200"})
           <-[:SUPPLIES]-(v:Organization)
        → Vendor: "Acme Industrial"

Hop 3: MATCH (v:Organization {name: "Acme Industrial"})
           <-[:APPROVED_BY]-(d:Document)
        → Document: "Vendor Approval VA-2023-088"

Hop 4: MATCH (d:Document {name: "Vendor Approval VA-2023-088"})
           -[:APPROVED_BY]->(p:Person)
        → Person: "Sarah Chen, VP Procurement"

Each hop is verifiable. Each result is traceable. The final answer comes with a complete chain of evidence.

03. Decomposing Multi-Hop Questions

The first challenge is breaking a natural language question into individual hops. The LLM handles the language; the graph handles the data lookups.

import anthropic
import json

client = anthropic.Anthropic()

DECOMPOSITION_PROMPT = """You are a question decomposer. Given a
complex question, break it into a sequence of simple sub-questions
where each sub-question can be answered with a single graph
traversal.

GRAPH SCHEMA:
- Person (name, title, department)
- Organization (name, org_type, status)
- Component (name, part_number, category, status)
- Document (name, doc_type, effective_date)
- Event (name, date, event_type, severity)
- Regulation (name, jurisdiction)
- Location (name, city, state)

RELATIONSHIPS:
- WORKS_AT: Person -> Organization
- MANAGES: Person -> Person
- REPORTS_TO: Person -> Person
- SUPPLIES: Organization -> Component
- APPROVED_BY: Document/Component -> Person
- COMPLIES_WITH: Component -> Regulation
- LOCATED_IN: Organization/Person -> Location
- CAUSED_BY: Event -> Component/Event
- DEPENDS_ON: Component -> Component
- REFERENCES: Document -> Document

RULES:
1. Each sub-question should reference the result of a previous
   sub-question using {{step_N}} notation.
2. The final sub-question should produce the answer.
3. Include the expected entity type for each answer.

QUESTION: {question}

Return JSON:
{{
  "steps": [
    {{
      "step": 1,
      "question": "simple sub-question",
      "expected_type": "Person|Organization|Component|...",
      "cypher_hint": "relationship pattern to use"
    }}
  ]
}}"""


def decompose_question(question: str) -> list[dict]:
    """Break a multi-hop question into sub-questions."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": DECOMPOSITION_PROMPT.format(
                question=question
            )
        }]
    )

    text = response.content[0].text
    if "```json" in text:
        text = text.split("```json")[1].split("```")[0]
    elif "```" in text:
        text = text.split("```")[1].split("```")[0]

    result = json.loads(text)
    return result["steps"]

Example Decomposition

Question: "Which regulations apply to the components supplied by the vendor that was flagged in last month's audit?"

Decomposition:

StepSub-questionExpected TypeCypher Hint
1Which audit event from last month flagged a vendor?Event, Organization(e:Event)-[:AFFECTED]->(v:Organization)
2Which components does supply?Component(v:Organization)-[:SUPPLIES]->(c:Component)
3Which regulations do need to comply with?Regulation(c:Component)-[:COMPLIES_WITH]->(r:Regulation)

04. Chain-of-Traversal: Step-by-Step Graph Walking

Chain-of-traversal is the graph equivalent of chain-of-thought. Instead of reasoning through logic steps, the agent walks through graph steps — executing one traversal, observing the result, then executing the next traversal using that result.

from neo4j import GraphDatabase
from dataclasses import dataclass, field

driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)


@dataclass
class TraversalStep:
    """One step in a chain-of-traversal."""
    step_number: int
    question: str
    cypher_query: str
    results: list[dict] = field(default_factory=list)
    interpretation: str = ""


@dataclass
class TraversalChain:
    """A complete multi-hop reasoning chain."""
    original_question: str
    steps: list[TraversalStep] = field(default_factory=list)
    final_answer: str = ""


def generate_cypher_for_step(
    step: dict,
    previous_results: dict
) -> str:
    """Generate a Cypher query for a decomposed step,
    substituting results from previous steps."""
    # Build context from previous results
    context_parts = []
    for step_num, result in previous_results.items():
        entities = [r.get("name", str(r)) for r in result]
        context_parts.append(
            f"Step {step_num} found: {', '.join(entities)}"
        )

    context = "\n".join(context_parts) if context_parts else "None"

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Generate a Cypher query for this step.

Previous results:
{context}

Current step: {step['question']}
Expected result type: {step['expected_type']}
Hint: {step.get('cypher_hint', 'none')}

Return ONLY the Cypher query, no explanation.
Use the entity names from previous results where needed.
Always RETURN name and relevant properties.
Add LIMIT 20."""
        }]
    )

    query = response.content[0].text.strip()
    if query.startswith("```"):
        query = query.split("```")[1]
        if query.startswith("cypher"):
            query = query[6:]
        query = query.strip()
    return query


def execute_chain(question: str) -> TraversalChain:
    """Execute a full multi-hop reasoning chain."""
    chain = TraversalChain(original_question=question)

    # Step 1: Decompose the question
    steps = decompose_question(question)
    print(f"Decomposed into {len(steps)} steps\n")

    previous_results = {}

    for step_def in steps:
        step_num = step_def["step"]
        print(f"--- Step {step_num}: {step_def['question']} ---")

        # Generate Cypher for this step
        cypher = generate_cypher_for_step(
            step_def, previous_results
        )
        print(f"Cypher: {cypher}")

        # Execute
        step = TraversalStep(
            step_number=step_num,
            question=step_def["question"],
            cypher_query=cypher
        )

        try:
            with driver.session() as session:
                result = session.run(cypher)
                records = [dict(r) for r in result]
                step.results = records
                previous_results[step_num] = records
                print(f"Results: {len(records)} records")
                for r in records[:5]:
                    print(f"  {r}")
        except Exception as e:
            step.results = []
            step.interpretation = f"Query failed: {e}"
            print(f"Error: {e}")

        chain.steps.append(step)
        print()

    # Synthesize final answer
    chain.final_answer = synthesize_answer(chain)
    return chain


def synthesize_answer(chain: TraversalChain) -> str:
    """Synthesize a natural language answer from the
    traversal chain."""
    # Build a summary of all steps and results
    step_summaries = []
    for step in chain.steps:
        results_text = "\n".join(
            f"  - {r}" for r in step.results[:10]
        )
        step_summaries.append(
            f"Step {step.step_number}: {step.question}\n"
            f"Query: {step.cypher_query}\n"
            f"Results:\n{results_text}"
        )

    all_steps = "\n\n".join(step_summaries)

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Based on this chain of graph traversals,
provide a clear, complete answer to the original question.

Original question: {chain.original_question}

Traversal chain:
{all_steps}

Provide:
1. A direct answer to the question
2. The evidence chain (how each step connects to the next)
3. Any caveats or limitations in the data"""
        }]
    )

    return response.content[0].text

05. Combining Graph Paths with LLM Synthesis

Sometimes you do not need to decompose. When the path is predictable, you can traverse the full path in a single Cypher query and use the LLM to interpret the result.

def single_query_multi_hop(
    question: str
) -> str:
    """For questions where the full path can be queried at once."""
    # Ask the LLM to generate a single multi-hop Cypher query
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Generate a single Cypher query that
answers this question by traversing multiple relationships.

SCHEMA:
- Person, Organization, Component, Document, Event, Regulation,
  Location
- WORKS_AT, MANAGES, REPORTS_TO, SUPPLIES, APPROVED_BY,
  COMPLIES_WITH, LOCATED_IN, CAUSED_BY, DEPENDS_ON, REFERENCES

Question: {question}

Return ONLY the Cypher query.
Use variable-length paths where the depth is unknown.
LIMIT to 20 results."""
        }]
    )

    cypher = response.content[0].text.strip()
    if "```" in cypher:
        cypher = cypher.split("```")[1]
        if cypher.startswith("cypher"):
            cypher = cypher[6:]
        cypher = cypher.strip()

    print(f"Query: {cypher}\n")

    # Execute
    with driver.session() as session:
        try:
            result = session.run(cypher)
            records = [dict(r) for r in result]
        except Exception as e:
            return f"Query failed: {e}. Falling back to " \
                   f"step-by-step decomposition."

    if not records:
        return "No results found for this traversal."

    # Format results for LLM interpretation
    results_text = "\n".join(
        str(r) for r in records[:20]
    )

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Based on these graph traversal results,
answer the original question in plain language.

Question: {question}
Results: {results_text}

Provide a clear answer with the evidence chain."""
        }]
    )

    return response.content[0].text

When to Decompose vs. Single Query

ApproachWhen to UseAdvantageRisk
Single Cypher queryPath is predictable, schema is simpleFaster (one query)LLM may generate invalid Cypher
Step-by-step decompositionPath is unknown, logic branches, or depth variesMore reliable, traceableSlower (multiple queries + LLM calls)
HybridStart with single query, fall back to decomposition on failureBest of bothMore complex to implement

06. The Multi-Hop Reasoning Agent

A complete agent that handles multi-hop questions with a step-by-step trace:

from dataclasses import dataclass


@dataclass
class ReasoningTrace:
    """Complete trace of multi-hop reasoning."""
    question: str
    hop_count: int
    steps: list[dict]
    answer: str
    confidence: str


def multi_hop_agent(question: str) -> ReasoningTrace:
    """Full multi-hop reasoning agent with trace."""

    # Classify the question complexity
    complexity = classify_complexity(question)
    print(f"Question complexity: {complexity['hops']} hops "
          f"({complexity['strategy']})\n")

    if complexity["hops"] <= 1:
        # Simple lookup — no decomposition needed
        return simple_lookup(question)

    if complexity["strategy"] == "single_query":
        # Try single Cypher first
        answer = single_query_multi_hop(question)
        if "Query failed" not in answer:
            return ReasoningTrace(
                question=question,
                hop_count=complexity["hops"],
                steps=[{"type": "single_query",
                        "answer": answer}],
                answer=answer,
                confidence="high"
            )

    # Fall back to decomposition
    chain = execute_chain(question)

    return ReasoningTrace(
        question=question,
        hop_count=len(chain.steps),
        steps=[
            {
                "step": s.step_number,
                "question": s.question,
                "query": s.cypher_query,
                "results": s.results[:5],
                "result_count": len(s.results)
            }
            for s in chain.steps
        ],
        answer=chain.final_answer,
        confidence="high" if all(
            s.results for s in chain.steps
        ) else "low"
    )


def classify_complexity(question: str) -> dict:
    """Estimate question complexity and best strategy."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Estimate the graph traversal complexity
of this question.

Question: "{question}"

Return JSON:
{{
  "hops": <estimated number of relationship traversals>,
  "strategy": "simple_lookup|single_query|decompose",
  "reasoning": "<one sentence>"
}}"""
        }]
    )

    text = response.content[0].text
    if "```json" in text:
        text = text.split("```json")[1].split("```")[0]
    elif "```" in text:
        text = text.split("```")[1].split("```")[0]

    return json.loads(text)


def simple_lookup(question: str) -> ReasoningTrace:
    """Handle 1-hop questions with a direct query."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""Generate a simple Cypher query for:
"{question}"

Schema: Person, Organization, Component, Document, Event,
Regulation, Location with standard relationships.

Return ONLY the Cypher query."""
        }]
    )

    cypher = response.content[0].text.strip()
    if "```" in cypher:
        cypher = cypher.split("```")[1]
        if cypher.startswith("cypher"):
            cypher = cypher[6:]
        cypher = cypher.strip()

    with driver.session() as session:
        try:
            result = session.run(cypher)
            records = [dict(r) for r in result]
        except Exception as e:
            records = []

    answer = "\n".join(str(r) for r in records) if records \
        else "No results found."

    return ReasoningTrace(
        question=question,
        hop_count=1,
        steps=[{
            "step": 1,
            "query": cypher,
            "results": records[:10],
            "result_count": len(records)
        }],
        answer=answer,
        confidence="high" if records else "low"
    )

07. Performance Considerations

Multi-hop queries can be expensive. The number of paths explored grows exponentially with depth:

DepthAvg Degree 5Avg Degree 10Avg Degree 50
1 hop5 paths10 paths50 paths
2 hops25 paths100 paths2,500 paths
3 hops125 paths1,000 paths125,000 paths
4 hops625 paths10,000 paths6.25M paths
5 hops3,125 paths100,000 paths312M paths

Set hard depth limits based on your use case:

# Recommended depth limits by use case
DEPTH_LIMITS = {
    "org_hierarchy": 10,    # Org charts can be deep but narrow
    "supply_chain": 5,      # Supply chains fan out quickly
    "dependency_graph": 4,  # IT dependencies cascade fast
    "approval_chain": 6,    # Approval chains are linear
    "impact_analysis": 3,   # Impact fans out exponentially
    "general": 4            # Default for unknown patterns
}

Caching Intermediate Results

When the same entities appear in multiple questions, cache their neighborhoods:

from functools import lru_cache
from datetime import datetime, timedelta

# Simple in-memory cache with TTL
_cache = {}
_cache_ttl = timedelta(minutes=30)


def cached_neighborhood(
    entity_name: str,
    depth: int = 2
) -> list[dict]:
    """Cache the neighborhood of frequently queried entities."""
    cache_key = f"{entity_name}:{depth}"

    if cache_key in _cache:
        result, timestamp = _cache[cache_key]
        if datetime.now() - timestamp < _cache_ttl:
            return result

    with driver.session() as session:
        result = session.run("""
            MATCH (n)
            WHERE toLower(n.name) = toLower($name)
            WITH n LIMIT 1
            MATCH path = (n)-[*1..{depth}]-(connected)
            RETURN connected.name AS name,
                   labels(connected)[0] AS type,
                   min(length(path)) AS distance,
                   [r IN relationships(path) |
                    type(r)] AS via
            ORDER BY distance
            LIMIT 100
        """.format(depth=depth), name=entity_name)

        records = [dict(r) for r in result]

    _cache[cache_key] = (records, datetime.now())
    return records

Pruning Traversals

When traversing, restrict to relevant relationship types to narrow the search space:

// Instead of traversing ALL relationships:
MATCH path = (start)-[*1..4]-(end)

// Traverse only relevant relationship types:
MATCH path = (start)
  -[:SUPPLIES|DEPENDS_ON|CONTAINS*1..4]-
  (end)

// Or prune by node properties:
MATCH path = (start)-[*1..4]-(end)
WHERE ALL(n IN nodes(path) WHERE n.status <> 'inactive')

08. Hop Count by Use Case

HopsUse CaseExample QuestionTypical Latency
1Direct lookup"Who manages the NYC office?"< 100ms
2One-step chain"Which vendors supply parts for Product X?"< 200ms
2Peer query"Who else reports to Alice's manager?"< 200ms
3Supply trace"Where are the components in Building 7 manufactured?"< 500ms
3Approval chain"Who approved the contract for Vendor X?"< 500ms
4Root cause"Who approved the vendor that supplies the failed part?"< 1s
4Impact cascade"If Supplier A fails, which products are affected?"< 1s
5Compliance trace"Which regulations apply to the components in the product that Customer Y bought?"1-3s
6Full investigation"Trace the complete chain from safety incident to the person who approved the original vendor selection"2-5s
6+Cross-domain analysis"Which customers are at risk from the regulatory change that affects components from the supplier being investigated for fraud?"5-15s

09. Putting It All Together

# Example: Full trace from incident to responsible party
question = (
    "Who approved the vendor that supplies the component "
    "that caused safety incident SI-2024-003?"
)

trace = multi_hop_agent(question)

print(f"Question: {trace.question}")
print(f"Hops: {trace.hop_count}")
print(f"Confidence: {trace.confidence}")
print(f"\nReasoning chain:")
for step in trace.steps:
    print(f"\n  Step {step['step']}: {step.get('question', '')}")
    print(f"  Query: {step['query']}")
    print(f"  Found: {step['result_count']} results")
    for r in step.get('results', [])[:3]:
        print(f"    - {r}")

print(f"\nAnswer: {trace.answer}")

Expected output:

Question: Who approved the vendor that supplies the component
  that caused safety incident SI-2024-003?
Hops: 4
Confidence: high

Reasoning chain:

  Step 1: Which component caused safety incident SI-2024-003?
  Query: MATCH (e:Event {name: 'SI-2024-003'})-[:CAUSED_BY]->
         (c:Component) RETURN c.name AS name
  Found: 1 results
    - {'name': 'Valve X-200'}

  Step 2: Which vendor supplies the Valve X-200?
  Query: MATCH (v:Organization)-[:SUPPLIES]->
         (c:Component {name: 'Valve X-200'}) RETURN v.name AS name
  Found: 1 results
    - {'name': 'Acme Industrial'}

  Step 3: Which approval document covers Acme Industrial?
  Query: MATCH (d:Document)-[:REFERENCES]->
         (v:Organization {name: 'Acme Industrial'})
         WHERE d.doc_type = 'vendor_approval'
         RETURN d.name AS name
  Found: 1 results
    - {'name': 'Vendor Approval VA-2023-088'}

  Step 4: Who approved document VA-2023-088?
  Query: MATCH (d:Document {name: 'Vendor Approval VA-2023-088'})
         -[:APPROVED_BY]->(p:Person) RETURN p.name AS name,
         p.title AS title
  Found: 1 results
    - {'name': 'Sarah Chen', 'title': 'VP Procurement'}

Answer: Sarah Chen (VP Procurement) approved Acme Industrial
as a vendor (via Vendor Approval VA-2023-088). Acme Industrial
supplies the Valve X-200, which was the component that caused
safety incident SI-2024-003.

The complete evidence chain:
SI-2024-003 → Valve X-200 → Acme Industrial → VA-2023-088
→ Sarah Chen

10. Chapter Checklist

Before moving to the next chapter, make sure you can answer these questions:

  • Can you explain what multi-hop reasoning is and why LLMs cannot do it alone?
  • Can you decompose a multi-hop question into single-hop sub-questions?
  • Can you implement chain-of-traversal reasoning with intermediate results?
  • Can you decide when to use a single Cypher query vs. step-by-step decomposition?
  • Can you set appropriate depth limits for different use cases?
  • Can you build a complete multi-hop agent with a step-by-step trace?

Multi-hop reasoning is where knowledge graphs and LLMs complement each other most powerfully. The graph provides deterministic, traceable traversal through connected data. The LLM provides natural language understanding, question decomposition, and answer synthesis. Together they answer questions that neither could handle alone.