Chapter 03 of 18

Chapter 3: Prompt Engineering Fundamentals

The difference between a mediocre LLM output and a brilliant one almost never lies in the model — it lies in the prompt. Here is the prompt engineering foundation that builds directly on skills analysts already have.

14 min read

Part 1 — Foundations

Chapter 3: Prompt Engineering Fundamentals

The difference between a mediocre LLM output and a brilliant one almost never lies in the model. It lies in the prompt. Prompt engineering builds directly on skills you already have: clear communication, structured thinking, and precise specification of requirements.

Reading time: ~20 min Project: Prompt Library Builder

3.1 The Anatomy of a Good Prompt

A well-constructed prompt has a clear structure, just like a well-written requirement. The components do not always need to appear in the same order, and not every prompt needs every component. But knowing the full anatomy lets you diagnose why a prompt is not working and fix it systematically.

Component	Purpose	Example	When to Use
Role / Persona	Sets the expertise and perspective	"You are a senior QA engineer with 10 years of experience in financial systems"	Almost always — it calibrates the model's response style and depth
Context	Provides background information the model needs	"We are building a patient portal for a hospital system. The system must be HIPAA-compliant."	Whenever the task requires domain or project knowledge
Task / Instruction	Specifies exactly what you want the model to do	"Analyze the following user story and identify missing acceptance criteria"	Always — this is the core of every prompt
Input Data	The material to be processed	The actual user story, document excerpt, or data to analyze	Whenever you are asking the model to process specific content
Output Specification	Defines the format, length, and structure of the response	"Respond in a numbered list with no more than 10 items. Each item should include a severity rating."	Whenever you need structured or consistently formatted output
Constraints / Rules	Boundaries and requirements the output must respect	"Do not suggest changes to the database schema. Focus only on UI/UX improvements."	Whenever you need to exclude certain types of responses

Diagram 1

Anatomy of a Good Prompt — six components that transform a vague request into a precise instruction. The Role sets expertise, Context provides background, the Task defines what to do, Input Data is the material to process, Output Specification controls format, and Constraints set boundaries.

The difference between a weak prompt and a strong one:

Weak prompt: "Write test cases for a login page." Too vague, produces generic results.

Strong prompt (using all six components):

Role: You are a senior QA analyst specializing in web application security testing.

Context: We are testing a login page for an online banking application. The login
supports email/password authentication, Google SSO, and biometric login on mobile.
The application must comply with OWASP Top 10 security standards. The system locks
accounts after 5 failed attempts.

Task: Generate a comprehensive set of test cases for the login page.

Output Format: Present each test case as a table row with these columns:
- Test Case ID (TC-LOGIN-001 format)
- Category (Functional / Security / Usability / Performance / Edge Case)
- Description (one sentence)
- Preconditions
- Test Steps (numbered)
- Expected Result
- Priority (Critical / High / Medium / Low)

Constraints:
- Include at least 3 security-specific test cases aligned with OWASP
- Include at least 2 test cases for each authentication method
- Include edge cases for the account lockout mechanism
- Do not include test cases for registration or password reset (those are separate)

The specificity principle: vague prompts produce generic outputs. Specific prompts produce targeted, actionable outputs. As a BA, you already know this — it is the same principle that makes "the system shall be fast" a bad requirement and "the system shall return search results within 200ms under a load of 1,000 concurrent users" a good one.

3.2 Zero-Shot vs Few-Shot

One of the most powerful techniques in prompt engineering is providing examples of the desired output.

Zero-shot prompting: You give the model instructions but no examples. The model must figure out what you want from the description alone.

Few-shot prompting: You provide one or more examples of inputs paired with desired outputs. The model learns the pattern from your examples and applies it to new inputs.

Diagram 2

Zero-Shot vs Few-Shot Decision Tree — start with the simplest approach and escalate only when output quality demands it. Chain-of-thought is best for complex analytical tasks where you need auditable reasoning.

Use zero-shot when: the task is straightforward and well-defined; you are using a capable model (GPT-4o, Claude Sonnet); standard output formats are acceptable; or you want to minimize prompt length and cost.

Use few-shot when: you need output in a very specific format or style; the task involves domain-specific conventions; you want consistency across multiple runs; or the model misunderstands your zero-shot instructions.

A practical example of few-shot prompting for a BA task — converting informal requirements into structured user stories:

Task: Convert informal requirements into structured user stories with
acceptance criteria.

Example 1:
Input: "Users need to be able to export their data"
Output:
  User Story: As a registered user, I want to export my account data
  in a standard format so that I can maintain a personal backup or
  migrate to another service.

  Acceptance Criteria:
  - Given a logged-in user on the account settings page
    When they click "Export My Data"
    Then the system generates a ZIP file containing their data in JSON format
  - Given an export is requested
    When the data exceeds 100MB
    Then the system emails a download link instead of direct download
  - Given a user has requested an export
    When they request another within 24 hours
    Then the system shows a message: "Export already in progress"

Example 2:
Input: "The dashboard needs to load faster"
Output:
  User Story: As a daily dashboard user, I want the main dashboard to
  load within 2 seconds so that I can begin my morning workflow without delay.

  Acceptance Criteria:
  - Given a user navigates to the dashboard
    When the page loads on a broadband connection (>10 Mbps)
    Then all above-the-fold content renders within 2 seconds
  - Given a user accesses the dashboard on a mobile device
    When the connection speed is 3G or better
    Then the critical metrics display within 4 seconds
  - Given the dashboard has more than 10 widgets configured
    When the page loads
    Then widgets load progressively with skeleton screens shown for pending widgets

Now convert this requirement:
Input: "Managers want to see who's doing what"

The examples encode implicit standards: the level of specificity expected, the tone, the format of acceptance criteria, and the practice of adding quantifiable metrics. The model mirrors these patterns without you needing to explain each convention explicitly.

Quality of examples matters more than quantity. Two excellent examples typically outperform five mediocre ones. Choose examples that demonstrate the nuances of what you want: proper specificity levels, edge case handling, appropriate domain language. Your examples are a specification — they define the contract the model will follow.

3.3 Role-Based Prompting

Role-based prompting instructs the LLM to adopt a specific persona, expertise level, or perspective. This meaningfully affects the quality, depth, and style of outputs because it activates different patterns in the model's learned representations.

Consider how different roles produce different analyses of the same requirement.

Given "The system shall support up to 10,000 concurrent users": a BA role prompt focuses on completeness (Who are these users? What actions? Is 10,000 based on current or projected usage?). A QA Performance role focuses on testability (How do we define "concurrent"? What are pass/fail criteria? What load tool?). A Security role focuses on threats (What about DDoS above 10,000? Is there rate limiting?). Same requirement, three different and valuable analyses — all driven by the role you set.

Effective role specifications include three elements:

Title and seniority: "Senior Business Analyst" vs "Junior BA" vs "VP of Product" — each implies different depth and perspective.
Domain expertise: "specializing in healthcare systems" or "with 10 years in fintech" — narrows the domain lens.
Behavioral instructions: "You are thorough and always identify at least 3 risks" or "You write concisely and avoid jargon" — shapes the communication style.

Proven role templates for common analyst scenarios:

Scenario	Effective Role Prompt
Requirements review	"You are a Senior BA with a reputation for finding gaps that other analysts miss. You are constructively critical and always substantiate concerns with specific questions."
Test case design	"You are a QA Engineer who has found critical production bugs that saved millions. You think in edge cases and boundary conditions. You never assume the happy path will work."
Stakeholder communication	"You are a BA who excels at translating technical concepts for executive audiences. You use analogies, avoid acronyms, and focus on business impact."
Process documentation	"You are a process analyst who creates documentation that new team members can follow without additional training. You use numbered steps, include decision points, and add notes for common mistakes."
Devil's advocate	"You are a seasoned analyst who has seen many projects fail. Your job is to challenge assumptions, identify risks, and ask the uncomfortable questions that need asking."

Avoid unrealistic roles. Asking the model to be "the world's best analyst who never makes mistakes" does not improve output — it can make the model more likely to give overconfident responses. Ground your role prompts in realistic expertise descriptions. The model performs best when the role is specific and authentic.

3.4 Chain-of-Thought for Analysis

Chain-of-thought (CoT) prompting instructs the model to show its reasoning process step by step rather than jumping directly to an answer. For analyst tasks, this technique is valuable because it makes the model's reasoning transparent and auditable.

The basic mechanism: include a phrase like "Think through this step by step" or "Show your reasoning process" in your prompt. But structured chain-of-thought is far more powerful than the generic version.

Structured CoT for requirements analysis:

"Analyze this requirement using the following framework:

Step 1 — COMPREHENSION: Restate the requirement in your own words
to confirm understanding.

Step 2 — COMPLETENESS: Identify any missing elements using the
INVEST criteria (Independent, Negotiable, Valuable, Estimable,
Small, Testable).

Step 3 — AMBIGUITY: Highlight any terms or phrases that could be
interpreted in multiple ways. For each, suggest a clarifying question.

Step 4 — TESTABILITY: Assess whether the requirement as written
can be definitively tested. If not, suggest measurable criteria.

Step 5 — DEPENDENCIES: Identify any implicit dependencies on other
requirements, systems, or decisions.

Step 6 — RECOMMENDATION: Provide a rewritten version that addresses
the issues found in steps 2-5."

The same approach works for QA tasks like defect root cause analysis. Define steps: (1) SYMPTOM — restate what the user observed, (2) REPRODUCTION — outline likely steps to reproduce, (3) HYPOTHESES — generate 3+ possible root causes ranked by likelihood, (4) INVESTIGATION PLAN — recommend specific logs or tests to check, (5) IMPACT ASSESSMENT — what else might be affected? This framework produces far more thorough analysis than simply asking "What caused this bug?"

CoT is most valuable for complex analytical tasks where you need to trust the model's reasoning: requirements analysis, defect investigation, impact assessment, risk evaluation. For simple generative tasks (drafting an email, formatting data), CoT adds token cost without much benefit. Use it when the reasoning matters as much as the conclusion.

3.5 Output Formatting Techniques

Controlling the format of LLM output is critical for analyst work. You need outputs that can be directly incorporated into deliverables, imported into tools, or consistently compared across multiple runs. LLMs follow formatting instructions well when those instructions are explicit.

Technique 1: Structured Formats with Delimiters

Format your response using the following structure:

## Summary
[2-3 sentence overview]

## User Stories
For each story, use this format:
---
**ID:** US-[NNN]
**Story:** As a [role], I want [feature], so that [benefit]
**Priority:** [Critical | High | Medium | Low]
**Acceptance Criteria:**
- Given [context] When [action] Then [result]
---

## Risks
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| ...  | ...       | ...    | ...        |

Technique 2: JSON Output for Tool Integration. When you need machine-readable output — for importing into spreadsheets, dashboards, or other tools — tell the model to return JSON with a specific schema you define. Include the exact field names and data types, and add "Return ONLY valid JSON, no additional text." Most APIs also offer a dedicated JSON mode that guarantees valid structure.

Technique 3: Markdown Tables for Reports. Specify the exact column headers you want. This produces clean, consistent tabular data you can paste into Confluence, Notion, or any Markdown-compatible tool.

Technique 4: Constrained Length. Specify exact limits: "Respond in exactly 5 bullet points, each no longer than 2 sentences. Focus on actionable insights, not observations." This prevents verbose output and forces the model to prioritize.

Always include a negative constraint — tell the model what NOT to include:

"Do not include introductory or concluding remarks"
"Do not explain your reasoning. Output only the final result."
"Do not add fields beyond the specified schema"
"Do not wrap the JSON in markdown code fences"

When building automated workflows (Chapter 4), always use the API's JSON mode rather than relying on prompt instructions alone. OpenAI's API supports response_format={"type": "json_object"}, and Anthropic's Claude supports similar structured output controls.

3.6 Prompt Templates and Libraries

Once you have developed effective prompts for recurring tasks, systematize them into reusable templates. A prompt template is a prompt with placeholders for variable inputs, allowing you to reuse the same carefully crafted structure across different projects and contexts.

A prompt template is simply a saved prompt with placeholders (like {domain}, {requirement}, {project_context}) that you fill in each time you use it. Store templates in a shared document, spreadsheet, or text file — one per task type. The key is a consistent structure so every analyst on your team produces the same quality of output.

Organizing Your Prompt Library

Category	Templates	Use Frequency
Requirements	User Story Generator, BRD Section Drafter, Gap Analysis, Stakeholder Impact	Daily
Quality Assurance	Test Case Generator, Defect Report Enhancer, Test Data Creator, Regression Analyzer	Daily
Analysis	Requirements Review, Ambiguity Detector, INVEST Validator, Dependency Mapper	Weekly
Communication	Status Report Generator, Meeting Notes Summarizer, Stakeholder Email Drafter	Daily
Documentation	Process Flow Describer, Data Dictionary Builder, API Documentation Generator	Weekly

Treat your prompt library like code. Store it in version control, document changes, and test new versions before deploying them. A prompt that works well with GPT-4o may need adjustment for Claude or Gemini. Track which model and version each template was tested against.

3.7 Common Prompt Pitfalls

Pitfall 1: The Vague Instruction

Bad	Good	Why
"Analyze this requirement"	"Evaluate this requirement against INVEST criteria and identify specific gaps in testability and completeness"	Specifies the framework and focus areas

Pitfall 2: Missing Context

Bad	Good	Why
"Write test cases for the payment feature"	"Write test cases for the payment feature of our B2B SaaS invoicing system. Supports credit card (Stripe), ACH, and wire transfer. Users are finance managers at mid-size companies."	Domain context dramatically changes what test cases are relevant

Pitfall 3: Conflicting Instructions

Bad	Good	Why
"Be concise. Provide a comprehensive, detailed analysis covering all aspects."	"Provide a focused analysis of the three highest-risk areas. For each, include 2-3 sentences of explanation and a specific recommendation."	Resolves the concise-vs-comprehensive conflict with specific expectations

Pitfall 4: Asking for Confirmation Instead of Critique

Bad	Good	Why
"Is this a good requirement? 'The system shall be user-friendly.'"	"Identify every problem with this requirement and suggest a specific improvement for each: 'The system shall be user-friendly.'"	The first phrasing invites sycophancy; the second demands critical analysis

Pitfall 5: Prompt Overload. Asking the model to do too many things in a single prompt produces mediocre results across the board. A prompt that says "analyze the requirements, generate test cases, write the test plan, create test data, and estimate the testing effort" will produce mediocre results for all five tasks. Run five focused prompts and get excellent results for each.

Pitfall 6: Example Contamination. In few-shot prompting, if all your examples share a characteristic that is coincidental rather than desired (e.g., all examples are about user authentication), the model may incorrectly assume that characteristic is part of the pattern. Ensure your examples are diverse enough to show the general pattern, not a narrow slice.

Pitfall 7: Ignoring the System Message. When using APIs, the system message is your most powerful formatting tool. It sets persistent context that does not need to be repeated. Many analysts put everything in the user message, leading to verbose, repetitive prompts. Use the system message for role, constraints, and output format. Use the user message for the specific task and input data.

When a prompt produces poor results, resist the urge to add more instructions. Diagnose which component is failing. Wrong format? Fix the output specification. Domain-inappropriate content? Fix the context. Shallow analysis? Add chain-of-thought steps. Targeted fixes outperform prompt bloat every time.

Project: Prompt Library Builder

Build a personal prompt library with at least 5 tested templates you can use in your daily work. The library should be structured, version-controlled, and ready for team sharing.

Step 1: Identify Your Top 5 Tasks. From your LLM Impact Assessment (Chapter 1), select the 5 analyst tasks you perform most frequently that involve text generation or analysis.

Step 2: Build Templates. For each task, create a template with role, context placeholders, specific instructions, output format, and constraints.

Step 3: Test and Iterate. Run each template with real data from your work. Score the output on a 1-5 scale for quality, format compliance, and usefulness. Iterate on templates that score below 4.

How to build your library: Create a shared document (Google Doc, Confluence page, or spreadsheet) with one row per template. For each, record: Template Name, Category (Requirements, QA, Communication, etc.), The Prompt (with for variable inputs), Recommended Model, and Test Score (1-5, updated after each use). Example entry:

Template: Meeting Notes to Action Items
Category: Communication

Prompt:
You are a Senior Business Analyst.
Convert the following meeting notes into structured action items.

Meeting Notes: {meeting_notes}

Output Format:
## Action Items
| # | Action | Owner | Due Date | Priority |

## Decisions Made  |  ## Open Questions  |  ## Next Meeting Agenda

Rules:
- Infer owners from context when mentioned by name
- Flag any action without a clear owner as "TBD"
- Set priority based on business impact discussed

Deliverable: A document containing your prompt library with at least 5 templates, each tested at least once with a score of 4 or higher. Share it with your team and invite them to contribute their own.

Exercises

Conceptual. A fellow analyst shows you their prompt: "Write a good BRD for a mobile banking app." Identify at least 5 specific improvements you would make, referencing the components from Section 3.1. Explain why each improvement matters.

Coding. Create a Python function called prompt_quality_checker that takes a prompt string as input and evaluates it against the six components from Section 3.1. The function should return a score (0-100) and specific recommendations for improvement. Test it against 3 prompts of varying quality.

Design. Design a few-shot prompt template for converting JIRA defect tickets into structured root cause analysis reports. Include 2 realistic examples with different defect categories (functional bug, performance issue). The output should include: root cause hypothesis, affected components, recommended fix, and regression test suggestions.

← Back to AI for Analysts and QA Teams — Revised