Chapter 12 of 20
Multi-Hop Reasoning
Who approved the vendor that supplies the component that failed the safety test? That is 4 hops. This chapter builds the full multi-hop reasoning pipeline: question decomposition, chain-of-traversal with intermediate results, single-query vs step-by-step tradeoffs, a complete reasoning agent with traceable output, and depth limit guidance by use case.
Overview
Who approved the vendor that supplies the component that failed the safety test? That is 4 hops.
01. What Multi-Hop Means
A hop is a single relationship traversal. "Who manages Alice?" is 1 hop. You follow one MANAGES relationship from Alice to her manager. "Who manages Alice's manager?" is 2 hops. "Who approved the vendor that supplies the component that failed the safety test?" is 4 hops:
SafetyTest --[FAILED]--> Component --[SUPPLIED_BY]--> Vendor
--[APPROVED_BY]--> Approver
Multi-hop questions are the questions knowledge graphs were built to answer. They are also the questions that LLMs handle worst on their own. Each hop requires looking up specific data that no model has in its weights.
02. Why This Is Hard for LLMs Alone
LLMs without graph access attempt multi-hop reasoning in one of two ways. Both are unreliable.
Guessing the chain. The LLM tries to answer the full chain from training data. "Who approved the vendor that supplies the failed component?" It generates a plausible-sounding answer: "Based on typical procurement processes, the VP of Supply Chain likely approved..." Fabrication, not reasoning. It has no access to your specific approval records.
Partial retrieval. With standard RAG, the LLM retrieves chunks related to the question. It might find a chunk mentioning the failed component and another mentioning vendor approvals. But the chunks do not connect — they come from different documents, different time periods, different contexts. The connection exists in the graph structure, not in any single text passage.
The graph makes multi-hop reasoning deterministic. Each hop is a concrete traversal with a verifiable result:
Hop 1: MATCH (t:Event {name: "Safety Test ST-2024-019"})
-[:CAUSED_BY]->(c:Component)
→ Component: "Valve X-200"
Hop 2: MATCH (c:Component {name: "Valve X-200"})
<-[:SUPPLIES]-(v:Organization)
→ Vendor: "Acme Industrial"
Hop 3: MATCH (v:Organization {name: "Acme Industrial"})
<-[:APPROVED_BY]-(d:Document)
→ Document: "Vendor Approval VA-2023-088"
Hop 4: MATCH (d:Document {name: "Vendor Approval VA-2023-088"})
-[:APPROVED_BY]->(p:Person)
→ Person: "Sarah Chen, VP Procurement"
Each hop is verifiable. Each result is traceable. The final answer comes with a complete chain of evidence.
03. Decomposing Multi-Hop Questions
The first challenge is breaking a natural language question into individual hops. The LLM handles the language; the graph handles the data lookups.
import anthropic
import json
client = anthropic.Anthropic()
DECOMPOSITION_PROMPT = """You are a question decomposer. Given a
complex question, break it into a sequence of simple sub-questions
where each sub-question can be answered with a single graph
traversal.
GRAPH SCHEMA:
- Person (name, title, department)
- Organization (name, org_type, status)
- Component (name, part_number, category, status)
- Document (name, doc_type, effective_date)
- Event (name, date, event_type, severity)
- Regulation (name, jurisdiction)
- Location (name, city, state)
RELATIONSHIPS:
- WORKS_AT: Person -> Organization
- MANAGES: Person -> Person
- REPORTS_TO: Person -> Person
- SUPPLIES: Organization -> Component
- APPROVED_BY: Document/Component -> Person
- COMPLIES_WITH: Component -> Regulation
- LOCATED_IN: Organization/Person -> Location
- CAUSED_BY: Event -> Component/Event
- DEPENDS_ON: Component -> Component
- REFERENCES: Document -> Document
RULES:
1. Each sub-question should reference the result of a previous
sub-question using {{step_N}} notation.
2. The final sub-question should produce the answer.
3. Include the expected entity type for each answer.
QUESTION: {question}
Return JSON:
{{
"steps": [
{{
"step": 1,
"question": "simple sub-question",
"expected_type": "Person|Organization|Component|...",
"cypher_hint": "relationship pattern to use"
}}
]
}}"""
def decompose_question(question: str) -> list[dict]:
"""Break a multi-hop question into sub-questions."""
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=2048,
messages=[{
"role": "user",
"content": DECOMPOSITION_PROMPT.format(
question=question
)
}]
)
text = response.content[0].text
if "```json" in text:
text = text.split("```json")[1].split("```")[0]
elif "```" in text:
text = text.split("```")[1].split("```")[0]
result = json.loads(text)
return result["steps"]
Example Decomposition
Question: "Which regulations apply to the components supplied by the vendor that was flagged in last month's audit?"
Decomposition:
| Step | Sub-question | Expected Type | Cypher Hint |
|---|---|---|---|
| 1 | Which audit event from last month flagged a vendor? | Event, Organization | (e:Event)-[:AFFECTED]->(v:Organization) |
| 2 | Which components does supply? | Component | (v:Organization)-[:SUPPLIES]->(c:Component) |
| 3 | Which regulations do need to comply with? | Regulation | (c:Component)-[:COMPLIES_WITH]->(r:Regulation) |
04. Chain-of-Traversal: Step-by-Step Graph Walking
Chain-of-traversal is the graph equivalent of chain-of-thought. Instead of reasoning through logic steps, the agent walks through graph steps — executing one traversal, observing the result, then executing the next traversal using that result.
from neo4j import GraphDatabase
from dataclasses import dataclass, field
driver = GraphDatabase.driver(
"bolt://localhost:7687",
auth=("neo4j", "password")
)
@dataclass
class TraversalStep:
"""One step in a chain-of-traversal."""
step_number: int
question: str
cypher_query: str
results: list[dict] = field(default_factory=list)
interpretation: str = ""
@dataclass
class TraversalChain:
"""A complete multi-hop reasoning chain."""
original_question: str
steps: list[TraversalStep] = field(default_factory=list)
final_answer: str = ""
def generate_cypher_for_step(
step: dict,
previous_results: dict
) -> str:
"""Generate a Cypher query for a decomposed step,
substituting results from previous steps."""
# Build context from previous results
context_parts = []
for step_num, result in previous_results.items():
entities = [r.get("name", str(r)) for r in result]
context_parts.append(
f"Step {step_num} found: {', '.join(entities)}"
)
context = "\n".join(context_parts) if context_parts else "None"
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=500,
messages=[{
"role": "user",
"content": f"""Generate a Cypher query for this step.
Previous results:
{context}
Current step: {step['question']}
Expected result type: {step['expected_type']}
Hint: {step.get('cypher_hint', 'none')}
Return ONLY the Cypher query, no explanation.
Use the entity names from previous results where needed.
Always RETURN name and relevant properties.
Add LIMIT 20."""
}]
)
query = response.content[0].text.strip()
if query.startswith("```"):
query = query.split("```")[1]
if query.startswith("cypher"):
query = query[6:]
query = query.strip()
return query
def execute_chain(question: str) -> TraversalChain:
"""Execute a full multi-hop reasoning chain."""
chain = TraversalChain(original_question=question)
# Step 1: Decompose the question
steps = decompose_question(question)
print(f"Decomposed into {len(steps)} steps\n")
previous_results = {}
for step_def in steps:
step_num = step_def["step"]
print(f"--- Step {step_num}: {step_def['question']} ---")
# Generate Cypher for this step
cypher = generate_cypher_for_step(
step_def, previous_results
)
print(f"Cypher: {cypher}")
# Execute
step = TraversalStep(
step_number=step_num,
question=step_def["question"],
cypher_query=cypher
)
try:
with driver.session() as session:
result = session.run(cypher)
records = [dict(r) for r in result]
step.results = records
previous_results[step_num] = records
print(f"Results: {len(records)} records")
for r in records[:5]:
print(f" {r}")
except Exception as e:
step.results = []
step.interpretation = f"Query failed: {e}"
print(f"Error: {e}")
chain.steps.append(step)
print()
# Synthesize final answer
chain.final_answer = synthesize_answer(chain)
return chain
def synthesize_answer(chain: TraversalChain) -> str:
"""Synthesize a natural language answer from the
traversal chain."""
# Build a summary of all steps and results
step_summaries = []
for step in chain.steps:
results_text = "\n".join(
f" - {r}" for r in step.results[:10]
)
step_summaries.append(
f"Step {step.step_number}: {step.question}\n"
f"Query: {step.cypher_query}\n"
f"Results:\n{results_text}"
)
all_steps = "\n\n".join(step_summaries)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"""Based on this chain of graph traversals,
provide a clear, complete answer to the original question.
Original question: {chain.original_question}
Traversal chain:
{all_steps}
Provide:
1. A direct answer to the question
2. The evidence chain (how each step connects to the next)
3. Any caveats or limitations in the data"""
}]
)
return response.content[0].text
05. Combining Graph Paths with LLM Synthesis
Sometimes you do not need to decompose. When the path is predictable, you can traverse the full path in a single Cypher query and use the LLM to interpret the result.
def single_query_multi_hop(
question: str
) -> str:
"""For questions where the full path can be queried at once."""
# Ask the LLM to generate a single multi-hop Cypher query
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=500,
messages=[{
"role": "user",
"content": f"""Generate a single Cypher query that
answers this question by traversing multiple relationships.
SCHEMA:
- Person, Organization, Component, Document, Event, Regulation,
Location
- WORKS_AT, MANAGES, REPORTS_TO, SUPPLIES, APPROVED_BY,
COMPLIES_WITH, LOCATED_IN, CAUSED_BY, DEPENDS_ON, REFERENCES
Question: {question}
Return ONLY the Cypher query.
Use variable-length paths where the depth is unknown.
LIMIT to 20 results."""
}]
)
cypher = response.content[0].text.strip()
if "```" in cypher:
cypher = cypher.split("```")[1]
if cypher.startswith("cypher"):
cypher = cypher[6:]
cypher = cypher.strip()
print(f"Query: {cypher}\n")
# Execute
with driver.session() as session:
try:
result = session.run(cypher)
records = [dict(r) for r in result]
except Exception as e:
return f"Query failed: {e}. Falling back to " \
f"step-by-step decomposition."
if not records:
return "No results found for this traversal."
# Format results for LLM interpretation
results_text = "\n".join(
str(r) for r in records[:20]
)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"""Based on these graph traversal results,
answer the original question in plain language.
Question: {question}
Results: {results_text}
Provide a clear answer with the evidence chain."""
}]
)
return response.content[0].text
When to Decompose vs. Single Query
| Approach | When to Use | Advantage | Risk |
|---|---|---|---|
| Single Cypher query | Path is predictable, schema is simple | Faster (one query) | LLM may generate invalid Cypher |
| Step-by-step decomposition | Path is unknown, logic branches, or depth varies | More reliable, traceable | Slower (multiple queries + LLM calls) |
| Hybrid | Start with single query, fall back to decomposition on failure | Best of both | More complex to implement |
06. The Multi-Hop Reasoning Agent
A complete agent that handles multi-hop questions with a step-by-step trace:
from dataclasses import dataclass
@dataclass
class ReasoningTrace:
"""Complete trace of multi-hop reasoning."""
question: str
hop_count: int
steps: list[dict]
answer: str
confidence: str
def multi_hop_agent(question: str) -> ReasoningTrace:
"""Full multi-hop reasoning agent with trace."""
# Classify the question complexity
complexity = classify_complexity(question)
print(f"Question complexity: {complexity['hops']} hops "
f"({complexity['strategy']})\n")
if complexity["hops"] <= 1:
# Simple lookup — no decomposition needed
return simple_lookup(question)
if complexity["strategy"] == "single_query":
# Try single Cypher first
answer = single_query_multi_hop(question)
if "Query failed" not in answer:
return ReasoningTrace(
question=question,
hop_count=complexity["hops"],
steps=[{"type": "single_query",
"answer": answer}],
answer=answer,
confidence="high"
)
# Fall back to decomposition
chain = execute_chain(question)
return ReasoningTrace(
question=question,
hop_count=len(chain.steps),
steps=[
{
"step": s.step_number,
"question": s.question,
"query": s.cypher_query,
"results": s.results[:5],
"result_count": len(s.results)
}
for s in chain.steps
],
answer=chain.final_answer,
confidence="high" if all(
s.results for s in chain.steps
) else "low"
)
def classify_complexity(question: str) -> dict:
"""Estimate question complexity and best strategy."""
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=200,
messages=[{
"role": "user",
"content": f"""Estimate the graph traversal complexity
of this question.
Question: "{question}"
Return JSON:
{{
"hops": <estimated number of relationship traversals>,
"strategy": "simple_lookup|single_query|decompose",
"reasoning": "<one sentence>"
}}"""
}]
)
text = response.content[0].text
if "```json" in text:
text = text.split("```json")[1].split("```")[0]
elif "```" in text:
text = text.split("```")[1].split("```")[0]
return json.loads(text)
def simple_lookup(question: str) -> ReasoningTrace:
"""Handle 1-hop questions with a direct query."""
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=300,
messages=[{
"role": "user",
"content": f"""Generate a simple Cypher query for:
"{question}"
Schema: Person, Organization, Component, Document, Event,
Regulation, Location with standard relationships.
Return ONLY the Cypher query."""
}]
)
cypher = response.content[0].text.strip()
if "```" in cypher:
cypher = cypher.split("```")[1]
if cypher.startswith("cypher"):
cypher = cypher[6:]
cypher = cypher.strip()
with driver.session() as session:
try:
result = session.run(cypher)
records = [dict(r) for r in result]
except Exception as e:
records = []
answer = "\n".join(str(r) for r in records) if records \
else "No results found."
return ReasoningTrace(
question=question,
hop_count=1,
steps=[{
"step": 1,
"query": cypher,
"results": records[:10],
"result_count": len(records)
}],
answer=answer,
confidence="high" if records else "low"
)
07. Performance Considerations
Multi-hop queries can be expensive. The number of paths explored grows exponentially with depth:
| Depth | Avg Degree 5 | Avg Degree 10 | Avg Degree 50 |
|---|---|---|---|
| 1 hop | 5 paths | 10 paths | 50 paths |
| 2 hops | 25 paths | 100 paths | 2,500 paths |
| 3 hops | 125 paths | 1,000 paths | 125,000 paths |
| 4 hops | 625 paths | 10,000 paths | 6.25M paths |
| 5 hops | 3,125 paths | 100,000 paths | 312M paths |
Set hard depth limits based on your use case:
# Recommended depth limits by use case
DEPTH_LIMITS = {
"org_hierarchy": 10, # Org charts can be deep but narrow
"supply_chain": 5, # Supply chains fan out quickly
"dependency_graph": 4, # IT dependencies cascade fast
"approval_chain": 6, # Approval chains are linear
"impact_analysis": 3, # Impact fans out exponentially
"general": 4 # Default for unknown patterns
}
Caching Intermediate Results
When the same entities appear in multiple questions, cache their neighborhoods:
from functools import lru_cache
from datetime import datetime, timedelta
# Simple in-memory cache with TTL
_cache = {}
_cache_ttl = timedelta(minutes=30)
def cached_neighborhood(
entity_name: str,
depth: int = 2
) -> list[dict]:
"""Cache the neighborhood of frequently queried entities."""
cache_key = f"{entity_name}:{depth}"
if cache_key in _cache:
result, timestamp = _cache[cache_key]
if datetime.now() - timestamp < _cache_ttl:
return result
with driver.session() as session:
result = session.run("""
MATCH (n)
WHERE toLower(n.name) = toLower($name)
WITH n LIMIT 1
MATCH path = (n)-[*1..{depth}]-(connected)
RETURN connected.name AS name,
labels(connected)[0] AS type,
min(length(path)) AS distance,
[r IN relationships(path) |
type(r)] AS via
ORDER BY distance
LIMIT 100
""".format(depth=depth), name=entity_name)
records = [dict(r) for r in result]
_cache[cache_key] = (records, datetime.now())
return records
Pruning Traversals
When traversing, restrict to relevant relationship types to narrow the search space:
// Instead of traversing ALL relationships:
MATCH path = (start)-[*1..4]-(end)
// Traverse only relevant relationship types:
MATCH path = (start)
-[:SUPPLIES|DEPENDS_ON|CONTAINS*1..4]-
(end)
// Or prune by node properties:
MATCH path = (start)-[*1..4]-(end)
WHERE ALL(n IN nodes(path) WHERE n.status <> 'inactive')
08. Hop Count by Use Case
| Hops | Use Case | Example Question | Typical Latency |
|---|---|---|---|
| 1 | Direct lookup | "Who manages the NYC office?" | < 100ms |
| 2 | One-step chain | "Which vendors supply parts for Product X?" | < 200ms |
| 2 | Peer query | "Who else reports to Alice's manager?" | < 200ms |
| 3 | Supply trace | "Where are the components in Building 7 manufactured?" | < 500ms |
| 3 | Approval chain | "Who approved the contract for Vendor X?" | < 500ms |
| 4 | Root cause | "Who approved the vendor that supplies the failed part?" | < 1s |
| 4 | Impact cascade | "If Supplier A fails, which products are affected?" | < 1s |
| 5 | Compliance trace | "Which regulations apply to the components in the product that Customer Y bought?" | 1-3s |
| 6 | Full investigation | "Trace the complete chain from safety incident to the person who approved the original vendor selection" | 2-5s |
| 6+ | Cross-domain analysis | "Which customers are at risk from the regulatory change that affects components from the supplier being investigated for fraud?" | 5-15s |
09. Putting It All Together
# Example: Full trace from incident to responsible party
question = (
"Who approved the vendor that supplies the component "
"that caused safety incident SI-2024-003?"
)
trace = multi_hop_agent(question)
print(f"Question: {trace.question}")
print(f"Hops: {trace.hop_count}")
print(f"Confidence: {trace.confidence}")
print(f"\nReasoning chain:")
for step in trace.steps:
print(f"\n Step {step['step']}: {step.get('question', '')}")
print(f" Query: {step['query']}")
print(f" Found: {step['result_count']} results")
for r in step.get('results', [])[:3]:
print(f" - {r}")
print(f"\nAnswer: {trace.answer}")
Expected output:
Question: Who approved the vendor that supplies the component
that caused safety incident SI-2024-003?
Hops: 4
Confidence: high
Reasoning chain:
Step 1: Which component caused safety incident SI-2024-003?
Query: MATCH (e:Event {name: 'SI-2024-003'})-[:CAUSED_BY]->
(c:Component) RETURN c.name AS name
Found: 1 results
- {'name': 'Valve X-200'}
Step 2: Which vendor supplies the Valve X-200?
Query: MATCH (v:Organization)-[:SUPPLIES]->
(c:Component {name: 'Valve X-200'}) RETURN v.name AS name
Found: 1 results
- {'name': 'Acme Industrial'}
Step 3: Which approval document covers Acme Industrial?
Query: MATCH (d:Document)-[:REFERENCES]->
(v:Organization {name: 'Acme Industrial'})
WHERE d.doc_type = 'vendor_approval'
RETURN d.name AS name
Found: 1 results
- {'name': 'Vendor Approval VA-2023-088'}
Step 4: Who approved document VA-2023-088?
Query: MATCH (d:Document {name: 'Vendor Approval VA-2023-088'})
-[:APPROVED_BY]->(p:Person) RETURN p.name AS name,
p.title AS title
Found: 1 results
- {'name': 'Sarah Chen', 'title': 'VP Procurement'}
Answer: Sarah Chen (VP Procurement) approved Acme Industrial
as a vendor (via Vendor Approval VA-2023-088). Acme Industrial
supplies the Valve X-200, which was the component that caused
safety incident SI-2024-003.
The complete evidence chain:
SI-2024-003 → Valve X-200 → Acme Industrial → VA-2023-088
→ Sarah Chen
10. Chapter Checklist
Before moving to the next chapter, make sure you can answer these questions:
- Can you explain what multi-hop reasoning is and why LLMs cannot do it alone?
- Can you decompose a multi-hop question into single-hop sub-questions?
- Can you implement chain-of-traversal reasoning with intermediate results?
- Can you decide when to use a single Cypher query vs. step-by-step decomposition?
- Can you set appropriate depth limits for different use cases?
- Can you build a complete multi-hop agent with a step-by-step trace?
Multi-hop reasoning is where knowledge graphs and LLMs complement each other most powerfully. The graph provides deterministic, traceable traversal through connected data. The LLM provides natural language understanding, question decomposition, and answer synthesis. Together they answer questions that neither could handle alone.