Chapter 13 of 17

Capstone: The AI Chatbot Launch

A complete worked example: you're PM for a B2B SaaS product and the CEO wants an AI-powered support chatbot shipped in 90 days. Walk through every stage of the framework — with templates and deliverables at each step.

Overview

The Scenario

You are the product manager for Fieldwork, a B2B SaaS platform used by 2,400 mid-market operations teams to manage field service scheduling, dispatch, and reporting. Average contract value is $18,000/year. Current NPS is 42. Your support team handles approximately 1,800 tickets per month.

The CEO returns from a conference and sends you this message on a Monday morning:

"Talked to three peers this weekend who've deployed AI chatbots for support. One company cut support tickets by 40%. We need to ship this in 90 days. Let's make it happen."

Your job: figure out whether this is worth doing, how to do it right, and what to deliver at each stage.

Stage 1: Signal Capture — Is This Worth Building?

Don't open a project in Jira. Don't schedule an engineering scoping session. The first work is validation.

Signal Capture Questions

Is there evidence of real user need?

Pull your last 90 days of support ticket data. Categorize tickets by:

  • Type (how-to / bug report / billing / data question / other)
  • Resolution path (self-serve possible / required human judgment / required engineering)
  • Volume by category
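Here is a minimal sketch of the tally. It assumes you have exported the 90 days of tickets and already labeled each one with a type and a resolution path, either by hand or in a triage pass. The `Ticket` record and its field names are hypothetical, so map them to whatever your help desk export provides. Monthly figures treat a month as 30 days.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical export schema; rename fields to match your help desk export.
    ticket_id: str
    category: str         # e.g. "how-to", "bug report", "billing"
    resolution_path: str  # "self-serve", "human judgment", or "engineering"


def monthly_volume_by_category(tickets, days=90):
    """Average monthly ticket count per (category, resolution path) pair."""
    counts = Counter((t.category, t.resolution_path) for t in tickets)
    months = days / 30  # treat a month as 30 days
    return {key: round(n / months) for key, n in counts.items()}


def self_serve_share(tickets):
    """Fraction of tickets whose resolution path was self-serve."""
    if not tickets:
        return 0.0
    return sum(t.resolution_path == "self-serve" for t in tickets) / len(tickets)
```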

What you might find for Fieldwork:

Category | Monthly volume | Self-serve possible?
--- | --- | ---
How-to questions (scheduling, dispatch) | 720 | Yes, documentation exists
Report configuration questions | 340 | Yes, complex but documentable
Integration/API questions | 210 | Partially; needs a human for complex cases
Bug reports | 310 | No, requires engineering
Billing/account questions | 150 | Partially
Other | 70 | Mixed

Analysis: Approximately 1,060 tickets per month (59%) are candidates for AI deflection. If the AI deflects 40% of those, that's approximately 424 fewer tickets per month — meaningful for a team managing 1,800/month.
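The arithmetic behind this estimate fits in a short script. It counts only the two categories marked "Yes" in the table as deflection candidates. It treats the 40% deflection rate as a scenario assumption, not a measured figure.

```python
# Monthly ticket volumes from the Fieldwork table above.
monthly_volume = {
    "How-to questions": 720,
    "Report configuration questions": 340,
    "Integration/API questions": 210,
    "Bug reports": 310,
    "Billing/account questions": 150,
    "Other": 70,
}

# Only categories marked "Yes" for self-serve count as candidates here.
candidates = ["How-to questions", "Report configuration questions"]

total = sum(monthly_volume.values())                           # 1,800
candidate_volume = sum(monthly_volume[c] for c in candidates)  # 1,060
candidate_share = candidate_volume / total                     # about 0.589

deflection_rate = 0.40  # scenario assumption, not a measurement
tickets_avoided = candidate_volume * deflection_rate           # 424

print(f"{candidate_share:.0%} of tickets are candidates; "
      f"about {tickets_avoided:.0f} fewer tickets per month")
```

Against all 1,800 monthly tickets, 424 is roughly a 24% reduction. Under these assumptions, the 40% headline from the CEO's peer does not carry over directly.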

The "unlimited humans" test: If you had 10 more support reps, what would they do? They would answer the same how-to and configuration questions faster. An AI doing this is not transforming anything — it is automating a cost center. That's legitimate value, but frame it correctly.

Is 90 days realistic?

Run the Integration Complexity Estimation from Chapter 10:

Dimension | Score | Notes
--- | --- | ---
Data access complexity | 3 | Support tickets exist in Zendesk; product documentation in Confluence
Prompt engineering complexity | 3 | FAQ and how-to queries are moderately structured; need good context retrieval
Output integration complexity | 2 | Chat widget in app; output is displayed text
Safety and moderation complexity | 3 | Wrong answers damage customer trust in a B2B context
Existing system fragility | 2 | Chat widget can be added without touching core product

Total: 13 — Medium complexity. 90 days is aggressive but feasible for an MVP. Not feasible for a polished, production-grade chatbot.
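The total is a plain sum of the five dimension scores. This small sketch also lists the dimensions tied for the highest score, which is one way to see where MVP effort and risk concentrate. The mapping from a total to a band such as "Medium" comes from Chapter 10 and is not reproduced here.

```python
# Dimension scores from the Integration Complexity Estimation table above.
scores = {
    "Data access": 3,
    "Prompt engineering": 3,
    "Output integration": 2,
    "Safety and moderation": 3,
    "Existing system fragility": 2,
}

total = sum(scores.values())
top_score = max(scores.values())
hotspots = [name for name, score in scores.items() if score == top_score]

print(total)     # 13
print(hotspots)  # ['Data access', 'Prompt engineering', 'Safety and moderation']
```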

Signal Capture recommendation: Proceed, but reframe the CEO's request from "ship in 90 days" to "run a controlled pilot in 90 days with defined success criteria."

Signal Capture Deliverable

A one-page recommendation memo to the CEO:

"We have validated that approximately 60% of our support tickets are candidates for AI deflection. A 90-day timeline is feasible for a controlled pilot of 10–15% of our user base, with full rollout dependent on pilot results. I recommend we proceed with a pilot rather than a full launch, with the following success criteria: [see kill criteria below]. Here is my proposed plan."

Stage 2: Value Hypothesis and Kill Criteria

Value Hypothesis

We believe that adding an AI-powered support chatbot to Fieldwork

Will help operations managers who encounter setup or how-to questions during their workflow

By giving them instant, accurate answers without waiting for a support ticket response

Resulting in a 30%+ reduction in Tier 1 support ticket volume, with user satisfaction scores on chatbot interactions above 3.5/5.0

We will know this is true when the chatbot resolves 30%+ of tickets without human escalation, and users who used the chatbot have equivalent or better support NPS compared to ticket-based support.
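The sketch below checks the three success conditions at the end of the pilot. The `PilotResults` fields are hypothetical names. The sketch assumes you measure the 30% against the pilot cohort's own Tier 1 baseline and count only tickets the chatbot resolved without a human.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    # Hypothetical field names; all figures cover the pilot cohort only.
    tier1_baseline_tickets: int  # Tier 1 tickets per month before the pilot
    tier1_resolved_by_bot: int   # Tier 1 issues resolved with no human escalation
    mean_chatbot_csat: float     # mean satisfaction on a 1 to 5 scale
    chatbot_support_nps: float
    ticket_support_nps: float


def evaluate_hypothesis(r: PilotResults):
    """Return (supported, per-condition results) for the value hypothesis."""
    resolved_share = r.tier1_resolved_by_bot / r.tier1_baseline_tickets
    checks = {
        "30%+ of Tier 1 tickets resolved": resolved_share >= 0.30,
        "chatbot CSAT above 3.5": r.mean_chatbot_csat > 3.5,
        "support NPS equal or better": r.chatbot_support_nps >= r.ticket_support_nps,
    }
    return all(checks.values()), checks
```

The per-condition dictionary shows which condition failed, not just whether the hypothesis held overall.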

Kill Criteria

Define these before you start, write them down, and share them with the CEO.

Criterion | Kill threshold | Measurement
--- | --- | ---
Chatbot answer accuracy | <70% of answers rated as "helpful" by users | User thumbs-up rate
Human escalation rate | >60% of chatbot interactions escalate to a human | Escalation rate during pilot
User satisfaction | Chatbot support NPS more than 10 points below ticket NPS | Post-interaction survey
Harmful / wrong answers | >2 instances of confidently wrong answers in pilot causing customer harm | Manual review + escalations
Cost per deflected ticket | >$8 per deflected ticket (vs. ~$12 per human-handled ticket) | Cost / deflected ticket count

If any kill criterion is met at the midpoint review on day 45 of the 90-day plan (week 7, roughly halfway through the customer pilot), the pilot is paused and the team convenes a kill/pivot decision. No exceptions. The CEO agreed to these criteria at the start; hold the line.
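Writing the kill criteria down as code makes the midpoint review mechanical: either a threshold is breached or it is not. In this sketch the field names are hypothetical, and the thresholds are the ones in the table above.

```python
from dataclasses import dataclass


@dataclass
class PilotMetrics:
    # Hypothetical field names for the pilot-to-date measurements.
    helpful_rate: float               # share of rated answers given a thumbs-up
    escalation_rate: float            # share of chatbot conversations escalated
    chatbot_nps: float
    ticket_nps: float
    harmful_wrong_answers: int        # confirmed through manual review
    cost_per_deflected_ticket: float  # USD


def triggered_kill_criteria(m: PilotMetrics) -> list:
    """Return every kill criterion the metrics breach (empty list = keep going)."""
    breaches = []
    if m.helpful_rate < 0.70:
        breaches.append("Chatbot answer accuracy")
    if m.escalation_rate > 0.60:
        breaches.append("Human escalation rate")
    if m.ticket_nps - m.chatbot_nps > 10:
        breaches.append("User satisfaction")
    if m.harmful_wrong_answers > 2:
        breaches.append("Harmful / wrong answers")
    if m.cost_per_deflected_ticket > 8.00:
        breaches.append("Cost per deflected ticket")
    return breaches
```

A non-empty result at the midpoint review is the trigger for pausing the pilot. Listing every breach, rather than stopping at the first one, gives the kill/pivot discussion the full picture.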

Stage 3: Grounded Delivery — Phased Plan

Phase 0: Foundation (Weeks 1–3)

Goal: Get the infrastructure right before showing anything to users.

Work:

  • Export and structure all support documentation (KB articles, FAQs, common responses) for RAG ingestion
  • Set up shadow mode: chatbot processes all new support tickets in the background; team reviews outputs but nothing is shown to users
  • Build evaluation dataset: 200 representative support queries with human-written ideal answers
  • Define chatbot persona and escalation rules

Deliverable: Shadow mode live; evaluation baseline established.

Shadow mode targets: Achieve >70% match rate between chatbot responses and human-ideal responses before proceeding to Phase 1.
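The essential property of shadow mode is that the customer's ticket follows its normal human path, while the chatbot's draft goes only to an internal review queue. This minimal sketch assumes a hypothetical `generate_answer` callable that wraps your retrieval and generation pipeline. A real setup would persist the queue rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ShadowDraft:
    ticket_id: str
    question: str
    draft_answer: str
    reviewer_verdict: Optional[str] = None  # filled in later by the support team


@dataclass
class ShadowMode:
    generate_answer: Callable[[str], str]  # hypothetical hook into the RAG pipeline
    review_queue: List[ShadowDraft] = field(default_factory=list)

    def on_new_ticket(self, ticket_id: str, question: str) -> None:
        """Draft an answer for internal review; nothing is sent to the customer."""
        try:
            draft = self.generate_answer(question)
        except Exception as exc:  # a bot failure must never affect the real ticket
            draft = f"<generation failed: {exc}>"
        self.review_queue.append(ShadowDraft(ticket_id, question, draft))
```

The handler returns nothing on purpose. Because no code path returns the draft to the customer-facing flow, shadow mode stays invisible to users by construction.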

Phase 1: Internal Canary (Week 4)

Goal: Validate with zero customer risk.

Work:

  • Enable chatbot for internal Fieldwork employees only
  • Collect thumbs-up/down, corrections, and escalations
  • Manual review of 100% of responses during this phase
  • Iterate on prompt engineering and knowledge base gaps

Deliverable: Internal quality report; decision to proceed to customer pilot.

Gate criteria: Internal user satisfaction >4.0/5.0; error rate <5%.

Phase 2: Limited Customer Pilot (Weeks 5–10)

Goal: Validate with real users in a controlled environment.

Work:

  • Enable chatbot for 10–15% of customer base (opt-in during onboarding or via prominent in-app prompt — NOT default-on at this stage)
  • Measure against all kill criteria weekly
  • Human review of escalated conversations daily
  • 45-day midpoint review with go/kill/pivot decision

Deliverable: Pilot results report with recommendation.

Who to include in pilot: Power users who are comfortable with beta features; accounts with dedicated CS managers who can provide qualitative feedback. Avoid your 10 largest accounts until Phase 3.

Phase 3: Graduated Rollout (Weeks 11–14, if pilot succeeds)

Goal: Expand safely with quality maintained.

Work:

  • 25% → 50% → 100% rollout over three weeks
  • Default-on for new users; opt-in for existing users for first 30 days
  • Continue monitoring all quality metrics
  • Update support team on volume trends; retrain on escalation patterns

Deliverable: Full production deployment; operational handoff to support team.
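One common way to implement the 25% → 50% → 100% steps is deterministic hash bucketing. Each account hashes to a fixed bucket from 0 to 99, and raising the percentage only ever adds accounts, so nobody loses the chatbot mid-rollout. This sketch assumes accounts have stable IDs, and the salt string is a hypothetical name. The bucketing decides eligibility only; the opt-in rule for existing users would sit on top of it.

```python
import hashlib


def rollout_bucket(account_id: str, salt: str = "support-chatbot") -> int:
    """Map an account to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{account_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def chatbot_eligible(account_id: str, rollout_percent: int) -> bool:
    """True if the account falls inside the current rollout percentage."""
    return rollout_bucket(account_id) < rollout_percent
```

Because an account's bucket never changes, every account eligible at 25% is still eligible at 50% and 100%. Changing the salt reshuffles every account, so keep it fixed for the life of the rollout.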

Stage 4: Evaluation Framework

Evaluation Dimensions

Tier 1: Accuracy

Run your 200-query evaluation set through the chatbot every time you update the knowledge base or prompt. Track:

  • Match rate against human-ideal answers (automated semantic similarity)
  • Factual error rate (human review of random sample)
  • Hallucination rate (answers that assert things not in the knowledge base)
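Here is a sketch of the match-rate computation. The text calls for automated semantic similarity, which in practice usually means comparing embedding vectors. To stay dependency-free, this version swaps in bag-of-words cosine similarity instead. That measures only word overlap, so it undercounts correct answers that are phrased differently. The 0.8 threshold is a hypothetical starting point to calibrate against a human-labeled sample.

```python
import math
import re
from collections import Counter


def _word_counts(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (lexical stand-in for embeddings)."""
    va, vb = _word_counts(a), _word_counts(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def match_rate(pairs, threshold: float = 0.8) -> float:
    """Share of (chatbot_answer, ideal_answer) pairs at or above the threshold."""
    if not pairs:
        return 0.0
    hits = sum(cosine_similarity(bot, ideal) >= threshold for bot, ideal in pairs)
    return hits / len(pairs)
```

Run over all 200 evaluation queries, `match_rate` yields the number that the >70% shadow-mode target applies to. Swapping in an embedding-based `cosine_similarity` leaves `match_rate` unchanged.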

Tier 2: User Satisfaction

After each chatbot interaction, show a brief survey:

  • "Did this help you solve your issue?" (Yes / Partially / No)
  • "How satisfied are you with this response?" (1–5)
  • [If "No" or <3]: "What was missing or wrong?" (free text)

Track these by query category, user segment, and over time.
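This sketch rolls the survey up so that weak spots show per category, or per segment or week by changing the grouping key. The `SurveyResponse` fields and the example segment values are hypothetical. Counting only "Yes" as helped, and not "Partially", is a deliberately strict assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyResponse:
    category: str                  # query category, e.g. "how-to"
    segment: str                   # user segment, e.g. "with-csm"
    helped: str                    # "yes", "partially", or "no"
    satisfaction: int              # 1 to 5
    missing: Optional[str] = None  # free text, asked only on follow-up


def needs_follow_up(helped: str, satisfaction: int) -> bool:
    """Whether to show the "What was missing or wrong?" question."""
    return helped == "no" or satisfaction < 3


def summarize(responses, key=lambda r: r.category):
    """Per-group response count, strict helped rate, and mean satisfaction."""
    groups = defaultdict(list)
    for r in responses:
        groups[key(r)].append(r)
    return {
        group: {
            "n": len(rs),
            "helped_rate": sum(r.helped == "yes" for r in rs) / len(rs),
            "mean_satisfaction": sum(r.satisfaction for r in rs) / len(rs),
        }
        for group, rs in groups.items()
    }
```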

Tier 3: Business Impact

  • Weekly ticket volume trend in pilot vs. control group
  • Average time to resolution (chatbot vs. ticket)
  • Support team capacity freed (tickets saved × avg. handle time)
  • Support NPS trend in pilot vs. control group
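Pilot and control groups differ in size, so compare tickets per account rather than raw counts. This sketch covers that normalization and the capacity formula above. Every number in the example is hypothetical. Pull the average handle time in particular from your help desk rather than assuming it.

```python
def tickets_per_account(tickets: int, accounts: int) -> float:
    return tickets / accounts


def ticket_reduction_vs_control(pilot_tickets: int, pilot_accounts: int,
                                control_tickets: int, control_accounts: int) -> float:
    """Relative drop in weekly tickets per account, pilot vs. control (0.25 = 25% fewer)."""
    pilot_rate = tickets_per_account(pilot_tickets, pilot_accounts)
    control_rate = tickets_per_account(control_tickets, control_accounts)
    return 1 - pilot_rate / control_rate


def capacity_freed_hours(tickets_saved: float, avg_handle_minutes: float) -> float:
    """Tickets saved × average handle time, in hours."""
    return tickets_saved * avg_handle_minutes / 60


# Hypothetical week: 300 pilot accounts filed 45 tickets; 2,100 control accounts filed 420.
reduction = ticket_reduction_vs_control(45, 300, 420, 2100)  # about 0.25
# 424 tickets/month saved (the Stage 1 estimate) at a hypothetical 15-minute handle time.
hours = capacity_freed_hours(424, 15)                        # 106.0
```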

Tier 4: Cost

  • Cost per chatbot interaction (all-in: tokens, retrieval, infrastructure)
  • Cost per deflected ticket
  • Monthly AI cost vs. monthly support cost savings
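This sketch covers the two cost calculations. The ~$12 human-ticket cost and the $8 ceiling come from the kill-criteria table. The example's interaction volume and per-interaction cost are hypothetical. Count as deflected only tickets that were genuinely avoided, or the ratio will flatter the chatbot.

```python
HUMAN_TICKET_COST = 12.00  # approximate cost of a human-handled ticket (kill-criteria table)
KILL_THRESHOLD = 8.00      # kill criterion: cost per deflected ticket above $8


def cost_per_deflected_ticket(monthly_ai_cost: float, deflected_tickets: int) -> float:
    """All-in monthly AI cost divided by tickets the chatbot actually avoided."""
    if deflected_tickets == 0:
        return float("inf")
    return monthly_ai_cost / deflected_tickets


def monthly_net_savings(monthly_ai_cost: float, deflected_tickets: int) -> float:
    """Support cost avoided minus AI spend (negative means the bot costs more)."""
    return deflected_tickets * HUMAN_TICKET_COST - monthly_ai_cost


# Hypothetical month: 3,000 interactions at $0.60 all-in, 424 tickets deflected.
ai_cost = 3000 * 0.60
per_ticket = cost_per_deflected_ticket(ai_cost, 424)  # about $4.25
net = monthly_net_savings(ai_cost, 424)                # about $3,288
within_threshold = per_ticket <= KILL_THRESHOLD
```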

Evaluation Cadence

Frequency | What's reviewed | Who reviews
--- | --- | ---
Daily | Escalations and flagged responses | PM + Support lead
Weekly | Full metrics dashboard vs. kill criteria | PM
Bi-weekly | Evaluation suite run; quality trend | PM + AI/ML team
45 days | Formal kill/proceed review | PM + CEO + Support VP
End of pilot | Full recommendation memo | PM

Stage 5: Rollout and Monitoring Plan

Rollout Communication

Internal: Brief the support team before the pilot launches. Frame the chatbot as a tool that handles routine volume so support can focus on complex, relationship-driven issues. Provide training on the escalation workflow. Establish a direct feedback channel for support reps to flag chatbot errors.

Customer-facing: "We're piloting an AI support assistant to help you get faster answers. If it doesn't solve your issue, it will connect you to our team in one click. [Enable for my account / Keep standard support]."

Before putting "AI" in customer-facing language, including the draft above, confirm that your customer base reacts positively to AI branding. Some B2B operations buyers are skeptical. Depending on what your research shows, "intelligent assistant" or "automated help" may be better framing.

Monitoring Setup

Before Phase 2 launches, the following must be live:

Metric | Alert threshold | Owner
--- | --- | ---
Escalation rate | >55% in any 24h period | PM
Thumbs-down rate | >30% in any 24h period | PM
API error rate | >2% | Engineering
p95 latency | >5 seconds | Engineering
Daily AI cost | >150% of forecast | PM + Finance
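This sketch evaluates the alert thresholds over one 24-hour window. The `DailyWindow` fields are hypothetical aggregates from your metrics pipeline. The table gives the API error rate without a window, so applying the 24-hour window to it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DailyWindow:
    # Hypothetical aggregates over a rolling 24h window.
    conversations: int
    escalations: int
    rated: int
    thumbs_down: int
    api_requests: int
    api_errors: int
    p95_latency_s: float
    ai_cost: float
    forecast_cost: float


def _rate(num: int, den: int) -> float:
    return num / den if den else 0.0


def alerts(w: DailyWindow):
    """Return (metric, owner) pairs whose alert threshold is breached."""
    fired = []
    if _rate(w.escalations, w.conversations) > 0.55:
        fired.append(("Escalation rate", "PM"))
    if _rate(w.thumbs_down, w.rated) > 0.30:
        fired.append(("Thumbs-down rate", "PM"))
    if _rate(w.api_errors, w.api_requests) > 0.02:
        fired.append(("API error rate", "Engineering"))
    if w.p95_latency_s > 5.0:
        fired.append(("p95 latency", "Engineering"))
    if w.ai_cost > 1.5 * w.forecast_cost:
        fired.append(("Daily AI cost", "PM + Finance"))
    return fired
```

The escalation alert fires at 55%, below the 60% kill threshold. That gives the team a warning before a kill criterion is actually reached.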

Template Deliverables at Each Stage

Stage | Deliverable | Audience
--- | --- | ---
Signal Capture | 1-page recommendation memo | CEO
Value Hypothesis | Value hypothesis doc + kill criteria | CEO + Support VP + Engineering lead
Phase 0 | Shadow mode quality report | PM + Engineering
Phase 1 | Internal pilot report with go/no-go | PM + Support lead
Phase 2 midpoint | 45-day pilot review with kill/proceed recommendation | CEO + leadership team
Phase 2 final | Full pilot results report with rollout recommendation | CEO + leadership team
Phase 3 | Production launch communication | All customers
Ongoing | Monthly AI feature health report | PM + leadership

Common Failure Modes to Avoid

Shipping without shadow mode: Teams skip shadow mode because 90 days is short. Don't. Shadow mode in weeks 1–3 is what makes the rest of the plan safe. If shadow mode reveals that the chatbot is wrong 40% of the time, you've saved yourself from a customer trust crisis.

Letting the CEO's timeline override the kill criteria: The most important decision in this capstone is maintaining discipline at the 45-day review. If the metrics don't support proceeding, say so. A failed chatbot in production is more damaging to the CEO's goals than a delayed launch.

Ignoring the support team: The support team is a critical stakeholder and a crucial source of quality feedback. Loop them in early, treat them as partners, and make sure their workflow is respected during the transition.

Optimizing for deflection rate instead of resolution quality: A chatbot that deflects 50% of tickets by giving vague non-answers that frustrate users has made your product worse. The success metric is resolved issues, not deflected tickets. Define the distinction clearly and track both.
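Here is one way to make the distinction concrete, as a sketch. A conversation counts as deflected if no human got involved. It counts as resolved only if it was deflected and the user also confirmed the issue was solved. The `Conversation` fields, including the seven-day follow-up window, are hypothetical definitions to adapt.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    escalated: bool               # handed to a human during the chat
    ticket_filed_within_7d: bool  # user filed a ticket on the same issue afterwards
    user_confirmed_solved: bool   # answered "Yes" to "Did this help you solve your issue?"


def is_deflected(c: Conversation) -> bool:
    """No human involvement, regardless of whether the user was helped."""
    return not c.escalated and not c.ticket_filed_within_7d


def is_resolved(c: Conversation) -> bool:
    """Deflected and confirmed solved by the user."""
    return is_deflected(c) and c.user_confirmed_solved


def deflection_and_resolution_rates(convs):
    """Return (deflection rate, resolution rate); a wide gap may signal vague non-answers."""
    if not convs:
        return 0.0, 0.0
    n = len(convs)
    return sum(map(is_deflected, convs)) / n, sum(map(is_resolved, convs)) / n
```

For example, a bot that avoids human contact in half its conversations but solves only a quarter of them has a 50% deflection rate and a 25% resolution rate. The second number is the one that reflects value to users.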

Never closing the feedback loop: If users flag errors and nothing changes, they stop flagging errors. Demonstrate that feedback leads to improvements. Announce when you've fixed something a user reported. This converts skeptics into advocates.