Chapter 12 of 21

Hybrid Architectures — LLMs and LCMs Together

The enterprise AI stack that ships will not be LLMs or LCMs — it will be LLMs for token-level tasks and LCMs for concept-level tasks, with clean handoff points. Three patterns: the concept router, the concept elevator, and the concept pipeline.

Part 3 — Enterprise Application

The LCM adoption question is not binary. The correct question is never "should we replace our LLMs with LCMs?" It is: for which components of which workflows does concept-level architecture produce better results, and how do we connect the two?

This chapter answers that question with three patterns. Each has a specific use case profile, an integration contract, a failure mode analysis, and an entry point into the pattern selection decision tree in Section 12.5.

12.1 The Three Hybrid Patterns

The three patterns differ in where the LLM-LCM boundary sits and what crosses it.

Pattern 1: The Concept Router. An LLM classifier analyzes incoming requests and routes them to either an LLM execution layer or an LCM execution layer, based on the Task Unit Test from Chapter 5. The router makes a routing decision; the execution layer handles the actual task. Use this pattern when you have a mixed-task application that handles both token-level and concept-level requests from a single entry point.

Pattern 2: The Concept Elevator. An LLM handles user-facing interaction and short-form output generation. An LCM handles the reasoning core for concept-level tasks. The LLM "elevates" the user's query into concept space by formatting it for the LCM's encoder, and "lowers" the LCM's concept-level output back into natural language by polishing and contextualizing the decoded output. Use this pattern when you have a conversational application that occasionally needs concept-level reasoning.

Pattern 3: The Concept Pipeline. LCM output feeds directly into an LLM as context. The LCM performs concept-level reasoning — multi-document synthesis, cross-lingual comparison, contradiction detection — and its decoded output is passed to an LLM that formats, contextualizes, or responds to user questions about it. Use this pattern when you need concept-level reasoning as a preprocessing step for a downstream LLM application.

12.2 Pattern 1: The Concept Router

Figure 12.1 — The Concept Router. An LLM classifier applies the Task Unit Test to incoming requests and routes to either the LLM or LCM execution path. Cost is paid for LCM only when the task genuinely warrants it.

When to use it. A knowledge management application serves two types of user requests: short-document Q&A (LLM-appropriate) and cross-document synthesis (LCM-appropriate). A single entry point must route both.

Architecture:

User Request
    ↓
[LLM Router] ← Task Unit Test classification
    ↓               ↓
[LLM Path]     [LCM Path]
    ↓               ↓
[LLM Response]  [LCM Response]
    ↓               ↓
[Response to User]

Integration contract. The router LLM classifies the request using a structured output schema. The classification includes: the identified task type (token-level or concept-level), the confidence score, and the parameters needed by the selected execution layer.

from typing import Literal

from pydantic import BaseModel, Field

class RoutingDecision(BaseModel):
    task_type: Literal["token_level", "concept_level"]
    confidence: float = Field(ge=0.0, le=1.0)  # router's self-reported confidence
    execution_params: dict  # parameters needed by the selected execution layer
    routing_reason: str     # short justification, useful when auditing misroutes

ROUTER_PROMPT = """
Classify this request as requiring token-level or concept-level processing.

Token-level: short document Q&A, code generation, conversational, single-document tasks.
Concept-level: cross-document comparison, cross-lingual synthesis, long-form generation,
hierarchical planning, contradiction detection.

Return a RoutingDecision JSON object.

Request: {user_request}
"""

def route_request(user_request: str, llm) -> RoutingDecision:
    # `llm` is a placeholder client. `invoke(..., output_schema=...)` stands in
    # for whatever structured-output call your LLM SDK actually provides.
    return llm.invoke(
        ROUTER_PROMPT.format(user_request=user_request),
        output_schema=RoutingDecision,
    )

Failure modes.

Misclassification: The router sends a concept-level request to the LLM path. The LLM produces a plausible but globally incoherent response, and the user accepts it without realizing the concept-level task was not handled correctly. Mitigation: set a confidence threshold. Below it, either run both paths and compare the results or, more cheaply, route to the LCM path by default. Over-routing to the LCM costs extra compute and latency; under-routing produces a wrong answer that looks right, so over-routing is the less harmful error.

Latency asymmetry: The LCM path is significantly slower than the LLM path. Users receiving LCM responses wait longer. Mitigation: stream LCM responses where possible; set user expectations with progress indicators for concept-level tasks.
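The misclassification mitigation can be sketched as a small dispatch step. This is a minimal illustration, not a reference implementation: `Decision` is a hypothetical stand-in for the `task_type` and `confidence` fields of `RoutingDecision`, the 0.75 threshold is an arbitrary assumption to be tuned on labeled traffic, and `llm_path` and `lcm_path` are placeholder callables for the two execution layers.

```python
from dataclasses import dataclass
from typing import Callable, Literal

TaskType = Literal["token_level", "concept_level"]

@dataclass
class Decision:
    # Hypothetical stand-in for the routing fields of RoutingDecision.
    task_type: TaskType
    confidence: float

def choose_path(decision: Decision, threshold: float = 0.75) -> TaskType:
    """Return the execution path, defaulting to the LCM path when unsure."""
    if decision.confidence < threshold:
        # Low confidence: prefer over-routing to the LCM over under-routing.
        return "concept_level"
    return decision.task_type

def dispatch(
    request: str,
    decision: Decision,
    llm_path: Callable[[str], str],
    lcm_path: Callable[[str], str],
    threshold: float = 0.75,
) -> str:
    """Send the request to whichever execution layer choose_path selects."""
    path = choose_path(decision, threshold)
    handler = lcm_path if path == "concept_level" else llm_path
    return handler(request)
```

Keeping the threshold logic in `choose_path`, separate from the classifier prompt, lets it be tuned and unit-tested without calling a model.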

Decision tree entry point. Use the concept router when your application handles mixed task types, has a single non-conversational entry point (for a conversational one, see the Concept Elevator), and users should not need to know whether their request is being handled by an LLM or an LCM.

12.3 Pattern 2: The Concept Elevator

Figure 12.2. The Concept Elevator. An LLM manages the conversation at the token layer; an LCM handles heavy reasoning in concept space. The elevator and polisher components bridge the two layers. Users experience a chat interface backed by concept-level reasoning.

When to use it. A customer-facing research assistant has a conversational interface but occasionally needs to synthesize across large document corpora. The base user experience is conversational; specific query types require concept-level reasoning.

Architecture:

User Message (conversational)
    ↓
[LLM — Conversation Manager]
    ↓ (detects concept-level need)
[LLM — Query Elevator: formats query for LCM encoder]
    ↓
[LCM — Concept Reasoning: encodes, reasons, decodes]
    ↓
[LLM — Response Polisher: contextualizes LCM output for conversation]
    ↓
[Response to User (conversational format)]

The elevator and polisher roles. The "elevator" LLM transforms the user's natural language query into a form optimized for the LCM encoder: explicit task instructions, document scope specification, output format requirements. The "polisher" LLM takes the LCM's decoded output — semantically correct but potentially terse or structured — and reformats it for the conversation context.

ELEVATOR_PROMPT = """
Transform this user request into a structured query for the document synthesis system.
The synthesis system needs: a clear synthesis goal, the document scope, and the output format.

User request: {user_request}
Available documents: {document_list}

Return a structured synthesis task specification.
"""

POLISHER_PROMPT = """
You are a research assistant. The user asked: "{user_request}"

Our document synthesis system produced this analysis:
{lcm_output}

Rephrase this analysis as a conversational response that:
1. Directly answers the user's question
2. Cites the most important findings
3. Flags any uncertainties or limitations
4. Invites follow-up questions
"""

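def concept_elevator_pipeline(user_request: str, documents, llm, lcm) -> str:
    """Elevate the query, run LCM synthesis, then polish the result.

    `llm` and `lcm` are placeholder clients: `llm.invoke` is assumed to return
    text, and `lcm.synthesize` is assumed to return decoded natural language.
    """
    # Elevate: turn the conversational request into a synthesis task spec.
    elevated_query = llm.invoke(
        ELEVATOR_PROMPT.format(
            user_request=user_request,
            # One title per line reads better in a prompt than a Python list repr.
            document_list="\n".join(d.title for d in documents),
        )
    )

    # Reason: the LCM encodes, reasons in concept space, and decodes.
    lcm_output = lcm.synthesize(elevated_query, documents)

    # Polish: restate the LCM output in the conversation's register.
    conversational_response = llm.invoke(
        POLISHER_PROMPT.format(
            user_request=user_request,
            lcm_output=lcm_output,
        )
    )

    return conversational_response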

Integration contract. Two artifacts cross the LLM-LCM boundary: the synthesis task specification (elevator output) and the synthesized analysis (LCM output). Both need a predictable shape. The task specification states what the LCM must produce; the analysis arrives as decoded natural language, organized so that the polisher can work from it without guessing at its structure.
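One way to make the elevator side of this contract concrete is a typed task specification that is checked before it reaches the LCM. The sketch below is an assumption about what such a specification might contain; `SynthesisTask`, its field names, and `validate_task` are hypothetical and not part of any LCM API. Carrying `original_request` alongside the elevated goal keeps the user's own wording available downstream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SynthesisTask:
    # Hypothetical shape for the elevator's output.
    original_request: str            # the user's words, kept verbatim
    synthesis_goal: str              # what the LCM must produce
    document_scope: tuple[str, ...]  # titles of the documents in scope
    output_format: str               # e.g. "bulleted findings"

def validate_task(task: SynthesisTask, available_titles: set[str]) -> list[str]:
    """Return a list of contract violations; an empty list means the task is usable."""
    problems = []
    if not task.synthesis_goal.strip():
        problems.append("synthesis_goal is empty")
    if not task.document_scope:
        problems.append("document_scope is empty")
    # The elevator may name documents that do not exist; catch that here.
    unknown = [t for t in task.document_scope if t not in available_titles]
    if unknown:
        problems.append(f"unknown documents: {unknown}")
    return problems
```

A non-empty result is a signal to re-run the elevator or fall back to asking the user, rather than to send a malformed task to the LCM.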

Failure modes.

Elevator distortion: The elevator LLM transforms the user's query in a way that changes its meaning before it reaches the LCM. The LCM answers a different question than the user asked. Mitigation: include the original user request alongside the elevated query in the LCM's context; validate elevator output against the original intent before passing to LCM.

Polisher hallucination: The polisher adds content not in the LCM's output, in service of making the response more conversational. Mitigation: instruct the polisher explicitly not to add information absent from the LCM output; include a citation validation step that confirms every claim traces to the LCM output.
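A validation step for polisher hallucination can start very simply. The sketch below is a crude lexical proxy, not real claim verification: it flags sentences in the polished response whose content words mostly do not appear in the LCM output. The function names, the "longer than three characters" word filter, and the 0.6 overlap threshold are all assumptions chosen for illustration; a production check would more likely use entailment or embedding similarity.

```python
import re

def _content_words(text: str) -> set[str]:
    # Lowercased alphanumeric words longer than three characters.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}

def unsupported_sentences(
    response: str, lcm_output: str, min_overlap: float = 0.6
) -> list[str]:
    """Return response sentences whose content words are mostly absent from lcm_output."""
    source_words = _content_words(lcm_output)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = _content_words(sentence)
        if not words:
            continue  # nothing to check in this sentence
        overlap = len(words & source_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

Purely conversational sentences (greetings, invitations to follow up) will also be flagged by a check like this, so flagged sentences are candidates for review rather than confirmed hallucinations.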

Decision tree entry point. Use the concept elevator when your primary application is conversational, concept-level reasoning is a minority of requests, and user experience continuity across LLM and LCM tasks matters.

12.4 Pattern 3: The Concept Pipeline

Figure 12.3 — The Concept Pipeline. LCM batch-processes the document corpus offline, producing a structured analysis report. The LLM serves fast, interactive Q&A against the pre-reasoned output. LCM cost is amortized across all subsequent queries.

When to use it. A regulatory compliance platform uses the LCM to perform cross-jurisdiction contradiction detection (Chapter 9) and feeds the structured results to an LLM that answers analyst questions about specific contradictions. The LCM runs as a batch process; the LLM provides the interactive Q&A layer.

Architecture:

[Document Corpus] → [LCM — Batch Analysis] → [Structured Analysis Report]
                                                        ↓
                                               [LLM — Interactive Q&A]
                                                        ↓
                                               [Analyst Questions / Answers]

Integration contract. The LCM produces a structured analysis report organized by finding type (equivalences, contradictions, gaps) with source document citations. This report becomes the LLM's knowledge base for the interactive Q&A session. The LLM does not access the raw document corpus — only the LCM's structured analysis.
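As an illustration of this contract, the report can be modeled as a list of typed findings that is rendered into the text block the LLM receives as context. This is a sketch under assumptions: `Finding`, its fields, and `render_report` are hypothetical names, and the rendered layout is one of many reasonable choices.

```python
from dataclasses import dataclass
from typing import Literal

FindingType = Literal["equivalence", "contradiction", "gap"]

@dataclass(frozen=True)
class Finding:
    # Hypothetical record for one LCM finding.
    finding_type: FindingType
    summary: str
    sources: tuple[str, ...]  # identifiers of the cited source documents

def render_report(findings: list[Finding]) -> str:
    """Group findings by type and render them as plain text with citations."""
    sections = []
    for kind in ("equivalence", "contradiction", "gap"):
        group = [f for f in findings if f.finding_type == kind]
        if not group:
            continue  # omit empty sections
        lines = [f"{kind.upper()} FINDINGS"]
        for i, finding in enumerate(group, start=1):
            cited = ", ".join(finding.sources)
            lines.append(f"{i}. {finding.summary} [sources: {cited}]")
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Because every rendered line carries its source identifiers, the Q&A layer can be asked to quote them back, which makes uncited claims easier to spot.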

Failure modes.

LLM confabulation on LCM output: The analyst asks about a contradiction the LCM identified, and the LLM elaborates with information not in the LCM's analysis. Mitigation: instruct the LLM to answer only from the provided analysis; require source citations for every claim.

Stale LCM analysis: Documents change after the LCM batch run; the LLM answers questions based on outdated output. Mitigation: timestamp the LCM analysis; warn users when the analysis exceeds a configurable age; trigger re-analysis when documents change.
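The staleness mitigation combines two checks: the age of the analysis and whether any document has changed since the batch run. A minimal sketch follows, assuming each batch run records a content hash per document; `AnalysisSnapshot`, `staleness_reasons`, and the seven-day default are hypothetical choices, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class AnalysisSnapshot:
    # Hypothetical metadata stored alongside each LCM batch run.
    created_at: datetime
    document_versions: dict[str, str]  # document id -> content hash at analysis time

def staleness_reasons(
    snapshot: AnalysisSnapshot,
    current_versions: dict[str, str],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> list[str]:
    """Return the reasons the analysis is stale; an empty list means it is fresh."""
    reasons = []
    if now - snapshot.created_at > max_age:
        reasons.append("analysis older than max_age")
    # Documents whose hash differs from, or is missing in, the snapshot.
    changed = sorted(
        doc for doc, digest in current_versions.items()
        if snapshot.document_versions.get(doc) != digest
    )
    if changed:
        reasons.append(f"changed or new documents: {changed}")
    removed = sorted(set(snapshot.document_versions) - set(current_versions))
    if removed:
        reasons.append(f"removed documents: {removed}")
    return reasons
```

An age-only reason might justify a warning to the analyst, while a changed-document reason is a stronger trigger for re-analysis.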

Decision tree entry point. Use the concept pipeline when concept-level reasoning is a batch preprocessing step, the LCM's output is a structured artifact that humans and downstream systems consume, and interactive access to the LCM's findings is needed afterward.

12.5 Pattern Selection Decision Tree

Does the application serve mixed (token-level + concept-level) requests
from a single entry point?
    YES → Is the entry point conversational?
              YES → Concept Elevator
              NO  → Concept Router
    NO  → Is concept-level reasoning a batch preprocessing step?
              YES → Concept Pipeline
              NO  → Is concept-level reasoning the primary function?
                        YES → Pure LCM (no LLM wrapper needed)
                        NO  → Reconsider whether LCMs are needed
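The same tree can be written as a function, which is convenient for recording architecture decisions or testing them against a list of candidate applications. The function and parameter names below are illustrative; the branching mirrors the tree above one-to-one.

```python
def select_pattern(
    mixed_single_entry: bool,
    conversational_entry: bool = False,
    batch_preprocessing: bool = False,
    concept_primary: bool = False,
) -> str:
    """Walk the pattern selection decision tree and return the recommended design."""
    if mixed_single_entry:
        # Mixed token-level and concept-level requests from one entry point.
        return "Concept Elevator" if conversational_entry else "Concept Router"
    if batch_preprocessing:
        return "Concept Pipeline"
    if concept_primary:
        return "Pure LCM"
    return "Reconsider whether LCMs are needed"
```

For example, `select_pattern(mixed_single_entry=True, conversational_entry=True)` follows the first YES branch twice and returns the Concept Elevator.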

12.6 Operational Overhead of Hybrid Architectures

Hybrid architectures are more complex to build, monitor, and debug than single-architecture systems. Three costs to account for before committing to a hybrid design:

Development overhead. Each pattern requires building and maintaining two inference pipelines, the boundary contracts between them, and the orchestration logic that moves data across the boundary. Expect roughly 2–3x the development effort of a single-architecture system.

Observability complexity. A token-level trace from an LLM captures the full reasoning chain in readable text. A hybrid system trace must capture both the LLM's token-level reasoning and the LCM's concept-level operations (encoding, similarity scores, generation). The two require different instrumentation and do not fit naturally into a single trace format.

Error attribution. When a hybrid system produces a wrong answer, determining whether the error originated in the LLM or the LCM component requires replaying the pipeline with intermediate outputs captured at each stage. That replay capability has to be designed in; it does not come for free.
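A minimal sketch of designed-in replay is shown here, assuming a strictly linear pipeline in which each stage takes the previous stage's output. `StageRecord`, `RecordedRun`, and `replay_from` are hypothetical names, and the stage functions are stand-ins for the real LLM and LCM calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class StageRecord:
    stage: str
    input: Any
    output: Any

@dataclass
class RecordedRun:
    records: list[StageRecord] = field(default_factory=list)

    def run_stage(self, stage: str, fn: Callable[[Any], Any], value: Any) -> Any:
        """Run one stage and capture its input and output for later replay."""
        output = fn(value)
        self.records.append(StageRecord(stage, value, output))
        return output

def replay_from(
    run: RecordedRun, stage: str, stages: dict[str, Callable[[Any], Any]]
) -> Any:
    """Re-run the pipeline from `stage`, starting from the input captured for it."""
    names = [r.stage for r in run.records]
    start = names.index(stage)
    value = run.records[start].input
    for name in names[start:]:
        value = stages[name](value)
    return value
```

Replaying from the LCM stage with the captured input and a swapped-in component shows whether the wrong answer changes, which localizes the error to that stage or to something upstream of it.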

For early LCM adopters, hybrid architectures are the realistic production architecture. Pure LCM systems fit batch workflows where the full task is concept-level. For user-facing applications, the LLM handles everything the LCM handles poorly (conversational wrapping, local fluency, short-form Q&A), and the LCM handles everything the LLM handles poorly (concept-level reasoning, cross-lingual synthesis, global coherence). The boundary between them is where the system earns its complexity.

Exercises

Type | Exercise | Description
Design | Hybrid architecture for RFP analysis | A procurement team processes 50 vendor RFPs (in English and French) and needs to: (a) answer individual questions about specific vendor proposals (token-level) and (b) compare all vendors against each other on key criteria (concept-level). Design a hybrid architecture that handles both use cases from a single conversational interface. Which pattern(s) apply? Draw the architecture diagram and define the integration contracts.
Coding | Concept router | Implement the concept router pattern. Write a classification prompt that correctly routes at least 15 diverse test requests (a mix of token-level and concept-level tasks) to the correct execution path. Measure classification accuracy. What is your error rate, and which task types are hardest to classify correctly?
Analysis | Failure mode audit | For each of the three hybrid patterns, identify one additional failure mode beyond those described in the chapter. For each additional failure mode, describe: how it manifests in production, what monitoring would detect it, and what the mitigation is.