Chapter 26 of 75
AI-Powered Test Case Generation
Generating test cases is one of the highest-ROI applications of LLMs for QA teams. LLMs can draft broad test suites faster than manual authoring, and they often surface edge cases that human testers overlook.
Part III — The Practitioner's Toolkit
Test case generation is repetitive, time-consuming, and heavily dependent on the tester's ability to imagine all the ways a system can receive inputs, including the unusual, malformed, and boundary-condition inputs that production users eventually provide. LLMs are well suited to this task: their training data includes large amounts of software specifications, test suites, and defect reports, and they can draft broad sets of test scenarios far faster than manual authoring.
26.1 The Test Generation Workflow
Input: A specification to test — a user story, a functional requirement, a business rule, an API specification, or existing code.
Step 1: Analyze the specification. Prompt the LLM to identify the key behaviors, inputs, outputs, and constraints described in the specification. This step surfaces what the LLM understands about the specification — review it for completeness and accuracy before proceeding.
Step 2: Generate happy path cases. Prompt for the standard success scenarios — the cases where all inputs are valid and the expected outcome is achieved.
Step 3: Generate boundary cases. Prompt for cases at the edges of valid inputs — the minimum and maximum values, the empty string and the maximum-length string, the start and end of date ranges. Boundary conditions are where bugs concentrate.
Step 4: Generate negative cases. Prompt for cases where inputs are invalid, missing, or malformed. For example: "What inputs would a user provide that the system should reject? What error condition should each produce?"
Step 5: Generate business rule cases. For each business rule in the specification, generate at least one case that tests the rule with compliant input and one that tests it with non-compliant input.
Step 6: Review and curate. The LLM's output is a comprehensive but unvalidated test suite. Review each test case: Is the expected behavior correctly stated? Is the test case actually testable? Does it duplicate another test case? Would it catch a real defect? Remove redundant and incorrect cases; add domain-specific cases the LLM missed.
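To make the case categories in Steps 2 through 5 concrete, the sketch below shows what a small curated suite might look like for a hypothetical business rule: a transfer amount must be a whole number between 1 and 10,000 inclusive. The rule, the limits, the `validate_transfer` function, and the `GeneratedCase` structure are all assumptions made for illustration, not part of any real specification.

```python
from dataclasses import dataclass
from typing import Any

MIN_AMOUNT = 1        # hypothetical lower limit
MAX_AMOUNT = 10_000   # hypothetical upper limit


def validate_transfer(amount: Any) -> bool:
    """Hypothetical system under test: accept whole-number amounts within the limits."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        return False
    return MIN_AMOUNT <= amount <= MAX_AMOUNT


@dataclass(frozen=True)
class GeneratedCase:
    case_id: str
    category: str   # "happy", "boundary", "negative", or "business_rule"
    amount: Any
    expected: bool


CASES = [
    GeneratedCase("TC-01", "happy", 250, True),
    GeneratedCase("TC-02", "boundary", MIN_AMOUNT, True),
    GeneratedCase("TC-03", "boundary", MAX_AMOUNT, True),
    GeneratedCase("TC-04", "boundary", MIN_AMOUNT - 1, False),
    GeneratedCase("TC-05", "boundary", MAX_AMOUNT + 1, False),
    GeneratedCase("TC-06", "negative", None, False),    # missing input
    GeneratedCase("TC-07", "negative", "100", False),   # wrong type
    GeneratedCase("TC-08", "business_rule", 9_999, True),    # compliant
    GeneratedCase("TC-09", "business_rule", 50_000, False),  # non-compliant
]


def run_cases(cases):
    """Return the IDs of cases whose actual result differs from the expected one."""
    return [c.case_id for c in cases if validate_transfer(c.amount) != c.expected]
```

Keeping the category on each case makes the review in Step 6 easier: a category with no cases is an immediate signal that a follow-up prompt is needed.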
26.2 Prompting for Test Cases
The quality of generated test cases depends heavily on the prompt. Effective patterns:
Provide the full specification context. The more precisely the LLM understands the behavior being tested, the more accurate the test cases. Include the relevant user stories, acceptance criteria, and business rules in the prompt.
Specify the test case format. "Generate test cases in Gherkin format (Given/When/Then)" or "Generate test cases as a table with columns: Test ID, Preconditions, Test Steps, Expected Result."
Ask explicitly for negative and edge cases. Generic prompts produce mostly happy-path cases. Explicitly request boundary conditions, negative cases, and error scenarios.
Ask about domain-specific risks. "This is a financial transaction system. What security and fraud-related test cases should be included?"
Iterate on gaps. After reviewing the initial output, prompt for the categories of cases you notice are missing: "You did not include cases for concurrent access. Generate five test cases for concurrency scenarios."
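The patterns above can be combined into a reusable prompt template. The following is a minimal sketch that only assembles prompt text; it does not call any LLM API. The function names, parameters, and section labels such as `SPECIFICATION:` are hypothetical choices, and the wording that works best will depend on the model in use.

```python
from typing import Optional, Sequence


def build_test_case_prompt(
    specification: str,
    output_format: str,
    categories: Sequence[str],
    domain_risk: Optional[str] = None,
) -> str:
    """Assemble one prompt with context, format, explicit categories, and domain risks."""
    if not specification.strip():
        raise ValueError("specification must not be empty")
    lines = [
        "You are generating test cases for the specification below.",
        "",
        "SPECIFICATION:",
        specification.strip(),
        "",
        f"OUTPUT FORMAT: {output_format}",
        "",
        "Include test cases for each of these categories:",
    ]
    # Name every category explicitly so the output is not limited to happy paths.
    lines.extend(f"- {category}" for category in categories)
    if domain_risk:
        lines.extend(["", f"DOMAIN RISKS TO COVER: {domain_risk}"])
    return "\n".join(lines)


def build_gap_prompt(missing_category: str, count: int) -> str:
    """Follow-up prompt for a category the first response left out."""
    return (
        f"You did not include cases for {missing_category}. "
        f"Generate {count} test cases for {missing_category} scenarios."
    )


prompt = build_test_case_prompt(
    specification="As a customer, I can transfer between 1 and 10,000 units per transaction.",
    output_format="Gherkin (Given/When/Then)",
    categories=["happy path", "boundary conditions", "negative cases", "error scenarios"],
    domain_risk="security and fraud in a financial transaction system",
)
follow_up = build_gap_prompt("concurrent access", 5)
```

Keeping the template in code rather than retyping it makes prompts repeatable across specifications and lets the team version the wording alongside the test suite.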
26.3 Test Data Generation
Beyond test cases, LLMs can generate test data — the specific input values that test cases require.
Realistic synthetic data: "Generate 20 realistic customer records with diverse names, addresses, and account statuses for use as test data. Include examples with special characters, edge-case formats, and missing optional fields."
Adversarial inputs: "Generate 10 input strings that would commonly be used in SQL injection attacks, for testing input validation."
Domain-specific data: "Generate 15 realistic insurance claim records with diverse claim types, amounts, and statuses. Include some claims that would trigger automatic fraud review."
Generated test data must be reviewed for realism and must not contain real personal data — all test data should be clearly synthetic.
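One way to support the "clearly synthetic" requirement is an automated check on generated records. The sketch below flags email addresses that do not use a domain reserved for documentation and testing (`example.com`, `example.net`, `example.org`, and the `.test`, `.example`, and `.invalid` top-level domains, as reserved by RFC 2606). The record layout with an optional `email` field is an assumption, and the check covers only that one field: it supplements human review rather than replacing it.

```python
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}
RESERVED_TLDS = {"test", "example", "invalid"}


def uses_reserved_domain(email: str) -> bool:
    """True if the address's domain is reserved for documentation or testing."""
    _, sep, domain = email.rpartition("@")
    if not sep or not domain:
        return False  # not a usable address at all
    domain = domain.lower().rstrip(".")
    for reserved in RESERVED_DOMAINS:
        # Accept the reserved domain itself and any subdomain of it.
        if domain == reserved or domain.endswith("." + reserved):
            return True
    return domain.rsplit(".", 1)[-1] in RESERVED_TLDS


def find_non_synthetic(records):
    """Return records whose email is present but not on a reserved domain.

    Records without an email are skipped, because missing optional
    fields are a legitimate part of the generated test data.
    """
    return [
        r for r in records
        if r.get("email") and not uses_reserved_domain(r["email"])
    ]
```

Any record this returns should be regenerated or edited before the data set is committed to the test repository.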
26.4 Validating Generated Test Cases
LLM-generated test cases require validation before use:
Expected behavior accuracy: Does the expected result in each test case match what the specification says should happen? LLMs occasionally generate plausible but incorrect expected behaviors.
Testability: Can each test case actually be executed? Cases that require system states that cannot be set up in the test environment, or that require external dependencies unavailable in testing, are not useful.
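Coverage completeness: Run the generated test suite under a code coverage tool to see which branches and conditions in the implementation it exercises, and trace each business rule in the specification to at least one test case, since a coverage tool measures code, not requirements. Gaps indicate categories of cases the LLM did not generate.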
Domain validity: Are the test inputs within the valid domain for the system? A financial system test case that uses negative amounts may not be meaningful if the system's API rejects negative amounts at the interface layer before the business logic is reached.
Generated test cases that pass validation can be incorporated into the test suite with confidence. Those that fail are learning data — they tell you where the specification was ambiguous or where the LLM's domain understanding was imprecise.
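Parts of this validation can be automated before human review. The sketch below checks two things: that every business rule has both a compliant and a non-compliant case, and that no case repeats the rule, inputs, and expected result of an earlier one. The dictionary keys (`case_id`, `rule_id`, `compliant`, `inputs`, `expected`) are a hypothetical schema, not a standard format. Checks for expected-behavior accuracy and testability still need a person who knows the specification and the test environment.

```python
def rule_coverage_gaps(rule_ids, cases):
    """List (rule, kind) pairs for which no generated case exists."""
    covered = {(c["rule_id"], c["compliant"]) for c in cases}
    gaps = []
    for rule in rule_ids:
        for compliant in (True, False):
            if (rule, compliant) not in covered:
                gaps.append((rule, "compliant" if compliant else "non-compliant"))
    return gaps


def duplicate_cases(cases):
    """Return IDs of cases that repeat the rule, inputs, and expected result of an earlier case."""
    seen = set()
    duplicates = []
    for c in cases:
        # Sort the inputs so that key order does not hide a duplicate.
        key = (c["rule_id"], repr(sorted(c["inputs"].items())), c["expected"])
        if key in seen:
            duplicates.append(c["case_id"])
        else:
            seen.add(key)
    return duplicates


cases = [
    {"case_id": "TC-01", "rule_id": "BR-1", "compliant": True,
     "inputs": {"amount": 100}, "expected": "accepted"},
    {"case_id": "TC-02", "rule_id": "BR-1", "compliant": False,
     "inputs": {"amount": 0}, "expected": "rejected"},
    {"case_id": "TC-03", "rule_id": "BR-1", "compliant": True,
     "inputs": {"amount": 100}, "expected": "accepted"},
    {"case_id": "TC-04", "rule_id": "BR-2", "compliant": True,
     "inputs": {"currency": "EUR"}, "expected": "accepted"},
]

gaps = rule_coverage_gaps(["BR-1", "BR-2", "BR-3"], cases)
dupes = duplicate_cases(cases)
```

Each gap is a candidate for a follow-up prompt, and each duplicate is a candidate for removal during curation.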