Prompt Engineering for QA: Write Better Tests With AI
Here's what I see all the time: QA engineers spend an hour writing what looks like a perfect prompt, run it through Claude, and get back tests that don't do their job. The tests run fine. They're syntactically correct. But they test the wrong thing, or they miss the actual failure case.
The problem isn't AI. It's that most people don't know how to prompt for test code.
I've spent the last 6 months reverse-engineering what works and what doesn't. I've written hundreds of prompts. I've generated thousands of test cases. And I've learned that garbage prompts produce garbage tests—but good prompts produce tests better than most humans write.
This guide shows you exactly how to prompt AI so it writes tests that actually work.
Why Most AI-Generated Tests Fail (And It's Not AI's Fault)
You write a prompt like this:
Generate tests for a login form.
Claude generates tests. They look good. But when you run them, they fail because:
- No selectors provided: Claude guesses at CSS selectors. Wrong guess = broken tests.
- No context about your app: Claude doesn't know if your login is at /login or /auth/login, or if it's a modal.
- No failure scenarios: Claude tests the happy path. It doesn't test rate limiting, session expiry, or SQL injection attempts.
- No assertion logic: Claude asserts on generic text ("Invalid credentials") without knowing your app's exact error messages.
The real issue: You're not giving Claude enough information to generate good tests.
Think of it like asking a contractor to build a house without blueprints. They'll build something. It might have walls. But it probably won't match what you wanted.
Good prompting is the blueprint.
The 5 Laws of Effective QA Prompting
Law 1: Provide Your Actual UI Structure
Bad:
Generate tests for a registration form.
Good:
Generate tests for a registration form with these selectors:
- Email input: input[data-testid="email"]
- Password input: input[data-testid="password"]
- Confirm password: input[data-testid="confirm-password"]
- Sign up button: button[type="submit"]
- Error message: [role="alert"]
- Success message: .success-banner
Why: When you provide selectors, Claude uses them instead of guessing. If you don't provide them, it guesses, and your tests will likely be broken on day one.
Action: Before prompting, use your browser's Inspector to find actual data-testid attributes. Copy them into your prompt.
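If you keep your selectors in one place, you can generate this part of the prompt instead of retyping it. Here's a minimal sketch, assuming a plain label-to-selector map; the names `registrationSelectors` and `renderSelectorSection` are mine, not part of any library.

```typescript
// Hypothetical selector map; the entries mirror the registration form example above.
const registrationSelectors: Record<string, string> = {
  "Email input": 'input[data-testid="email"]',
  "Password input": 'input[data-testid="password"]',
  "Confirm password": 'input[data-testid="confirm-password"]',
  "Sign up button": 'button[type="submit"]',
  "Error message": '[role="alert"]',
  "Success message": ".success-banner",
};

// Render the map as the bullet list that gets pasted into the prompt.
function renderSelectorSection(feature: string, selectors: Record<string, string>): string {
  const header = "Generate tests for " + feature + " with these selectors:";
  const lines = Object.keys(selectors).map((label) => "- " + label + ": " + selectors[label]);
  return [header].concat(lines).join("\n");
}

console.log(renderSelectorSection("a registration form", registrationSelectors));
```

The upside of a single map is that the prompt and your Page Objects can read from the same source, so they can't drift apart.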
Law 2: Show Claude Your API Responses
Bad:
Test the API that fetches user data.
Good:
Test the API at /api/users/:id that returns:
{
"id": 123,
"name": "John Doe",
"email": "john@example.com",
"createdAt": "2026-03-20T10:00:00Z",
"isActive": true
}
Test that:
- Response status is 200
- Response contains all required fields
- Dates are in ISO 8601 format
- Non-existent user returns 404
Why: Claude can't test what it doesn't understand. Show the structure. Be specific about assertions.
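To make "be specific about assertions" concrete, here's a rough sketch of the checks that prompt is asking for, written as plain TypeScript with no test runner. The helper name `findUserProblems` and the timestamp pattern are my assumptions; the pattern only accepts UTC timestamps like the one in the example, which is a narrow subset of ISO 8601.

```typescript
// Shape of the example response from the prompt above.
interface UserResponse {
  id: number;
  name: string;
  email: string;
  createdAt: string;
  isActive: boolean;
}

// Assumption: timestamps are UTC, e.g. 2026-03-20T10:00:00Z.
const ISO_UTC_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Returns a list of problems; an empty list means the body matches the expected shape.
function findUserProblems(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body is not an object"];
  }
  const fields = body as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof fields["id"] !== "number") problems.push("id must be a number");
  if (typeof fields["name"] !== "string") problems.push("name must be a string");
  if (typeof fields["email"] !== "string") problems.push("email must be a string");
  if (typeof fields["isActive"] !== "boolean") problems.push("isActive must be a boolean");
  const createdAt = fields["createdAt"];
  if (
    typeof createdAt !== "string" ||
    !ISO_UTC_TIMESTAMP.test(createdAt) ||
    isNaN(Date.parse(createdAt))
  ) {
    problems.push("createdAt must be an ISO 8601 UTC timestamp");
  }
  return problems;
}

const sampleUser: UserResponse = {
  id: 123,
  name: "John Doe",
  email: "john@example.com",
  createdAt: "2026-03-20T10:00:00Z",
  isActive: true,
};

// The sample from the prompt has no problems; a US-style date is flagged.
console.log(findUserProblems(sampleUser));
console.log(findUserProblems({ ...sampleUser, createdAt: "03/20/2026" }));
```

If the generated tests check less than this, the prompt probably didn't spell out the field types.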
Law 3: Define Success and Failure Explicitly
Bad:
Test form validation.
Good:
Test form validation. The form should:
- Accept emails matching pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/
- Show error "Invalid email format" for bad emails
- Show error "Email already exists" for duplicate emails
- Allow submit only when ALL fields are valid
- Clear error message when user corrects input
Why: Claude writes tests for what you explicitly ask for. Vague requirements = vague tests.
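Those rules translate almost directly into a table of inputs and expected errors. Below is a small sketch using the pattern from the prompt; `emailError` is a hypothetical stand-in for the form's validation logic, and treating duplicates as case-insensitive is my assumption.

```typescript
// Pattern copied from the prompt above.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical stand-in for the form logic: which error, if any, should be shown.
function emailError(email: string, existingEmails: string[]): string | null {
  if (!EMAIL_PATTERN.test(email)) return "Invalid email format";
  // Assumption: duplicate detection ignores letter case.
  if (existingEmails.indexOf(email.toLowerCase()) !== -1) return "Email already exists";
  return null;
}

// Each row is one test case: input and the exact error text the form should show.
const emailCases: Array<[string, string | null]> = [
  ["jane@example.com", null],
  ["jane@example", "Invalid email format"],
  ["jane doe@example.com", "Invalid email format"],
  ["john@example.com", "Email already exists"],
];

emailCases.forEach(([input, expected]) => {
  const actual = emailError(input, ["john@example.com"]);
  console.log(input, actual === expected ? "ok" : "MISMATCH");
});
```

A table like this is also a handy thing to paste into the prompt itself, since it leaves no room for interpretation.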
Law 4: Specify the Technology Stack
Bad:
Generate tests.
Good:
Generate Playwright TypeScript tests using the Page Object Model pattern.
Use async/await, not .then() chains.
Use data-testid selectors only (not CSS class selectors).
Each test should be independent and able to run in any order.
Why: Claude will adapt to your constraints. Without them, it might generate tests using Selenium, Cypress, or mixing patterns.
Law 5: Ask for Edge Cases Explicitly
Bad:
Test payment processing.
Good:
Test payment processing. Include tests for:
- Valid payment (happy path)
- Expired card (should show "Card expired" error)
- Insufficient funds (should show "Declined" error)
- Empty card number (form validation, should not submit)
- Special characters in cardholder name (should be rejected)
- Duplicate transactions within 60 seconds (should reject second)
Why: Claude doesn't automatically think of edge cases. Tell it what to test.
Bad Prompt vs Good Prompt: Real Examples
Example 1: Login Feature
❌ Bad Prompt:
Generate tests for login.
Claude might generate:
- Tests that don't match your selectors
- Only happy path (valid credentials)
- Generic assertions ("login successful")
- Tests that are flaky due to missing waits
✅ Good Prompt:
Generate Playwright TypeScript tests for a login form at /login.
Selectors:
- Email input: input[data-testid="email"]
- Password input: input[data-testid="password"]
- Login button: button[data-testid="submit"]
- Error message: [role="alert"].error
- Success: redirect to /dashboard
Test scenarios:
1. Valid login (user@test.com / Test123!) redirects to /dashboard
2. Invalid password shows "Invalid credentials" error
3. Non-existent email shows "User not found" error
4. Empty email shows "Email required" error
5. Empty password shows "Password required" error
6. After 5 failed attempts, show "Account locked for 15 minutes"
Use data-testid selectors only.
Each test must be independent.
Use async/await with proper waits (web-first assertions such as await expect(locator).toBeVisible(), not setTimeout).
Claude will generate:
- Tests with correct selectors
- All 6 scenarios covered
- Proper assertions for each case
- Tests that wait on assertions instead of fixed delays, so they're far less likely to flake
The difference: 10 lines of good context = tests that actually work.
Example 2: API Response Validation
❌ Bad Prompt:
Test the products API.
✅ Good Prompt:
Generate Playwright tests for the /api/products endpoint.
API response structure:
{
"products": [
{
"id": 1,
"name": "Laptop",
"price": 999.99,
"inStock": true,
"category": "electronics"
}
],
"total": 100,
"page": 1,
"perPage": 10
}
Test cases:
1. GET /api/products returns 200 with correct structure
2. Response includes pagination (total, page, perPage)
3. Price field is always a number >= 0
4. Non-existent category returns empty array
5. Invalid page number returns 400 error
6. Missing auth header returns 401 error
All tests should verify exact response structure and data types.
Claude will generate:
- Tests that validate structure
- Tests for error cases
- Tests for data types
- Tests for edge cases
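"Verify exact response structure and data types" is easier to review when you know what such a check looks like. Here's one possible sketch using TypeScript type guards; the interface and function names are my own, and the field list comes from the example response above.

```typescript
interface Product {
  id: number;
  name: string;
  price: number;
  inStock: boolean;
  category: string;
}

interface ProductPage {
  products: Product[];
  total: number;
  page: number;
  perPage: number;
}

// True only if every field has the expected type and the price is a number >= 0.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const fields = value as Record<string, unknown>;
  const price = fields["price"];
  return (
    typeof fields["id"] === "number" &&
    typeof fields["name"] === "string" &&
    typeof price === "number" &&
    isFinite(price) &&
    price >= 0 &&
    typeof fields["inStock"] === "boolean" &&
    typeof fields["category"] === "string"
  );
}

// Checks the pagination fields and every product in the list.
// An empty products array is valid (test case 4: non-existent category).
function isProductPage(value: unknown): value is ProductPage {
  if (typeof value !== "object" || value === null) return false;
  const body = value as Record<string, unknown>;
  const products = body["products"];
  return (
    Array.isArray(products) &&
    products.every(isProduct) &&
    typeof body["total"] === "number" &&
    typeof body["page"] === "number" &&
    typeof body["perPage"] === "number"
  );
}

const laptop = { id: 1, name: "Laptop", price: 999.99, inStock: true, category: "electronics" };
console.log(isProductPage({ products: [laptop], total: 100, page: 1, perPage: 10 }));
```

In a real suite the body would come from the API response rather than a literal, but the structural check stays the same.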
A Practical Prompt Engineering Workflow for QA
Here's the exact workflow I use on every project:
Step 1: Gather Information (5 min)
- Open your app in Inspector
- Find the actual selectors (data-testid, role attributes)
- Understand the API responses (use Network tab)
- List all error messages users might see
Step 2: Write Context (10 min)
[App overview]
I'm testing a SaaS product called [Name].
Frontend: React
Testing tool: Playwright + TypeScript
URL: [baseurl]
[Feature to test]
Feature: User authentication
Location: /login page
Dependency: POST /api/auth/login
[UI structure]
[Paste actual selectors]
[API responses]
[Paste example response]
[Success criteria]
[Define what success looks like]
[Edge cases]
[List what can go wrong]
Step 3: Create the Prompt
Paste your context into Claude with:
Generate Playwright tests based on the above context.
Requirements:
- Use TypeScript
- Use async/await
- Use data-testid selectors only
- Each test independent
- Cover happy path + all edge cases
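If you run this workflow often, Steps 2 and 3 can be scripted. The sketch below assembles the context sections and the requirements into one prompt string and refuses to build a prompt with an empty section. `PromptContext`, `buildPrompt`, and the sample values are illustrative assumptions, not a fixed format.

```typescript
// One field per section of the context template from Step 2.
interface PromptContext {
  appOverview: string;
  feature: string;
  uiStructure: string;
  apiResponses: string;
  successCriteria: string;
  edgeCases: string;
}

// Requirements copied from Step 3.
const REQUIREMENTS = [
  "Use TypeScript",
  "Use async/await",
  "Use data-testid selectors only",
  "Each test independent",
  "Cover happy path + all edge cases",
];

function buildPrompt(context: PromptContext): string {
  const sections: Array<[string, string]> = [
    ["App overview", context.appOverview],
    ["Feature to test", context.feature],
    ["UI structure", context.uiStructure],
    ["API responses", context.apiResponses],
    ["Success criteria", context.successCriteria],
    ["Edge cases", context.edgeCases],
  ];
  // An empty section is exactly the missing context this guide warns about.
  const missing = sections.filter(([, body]) => body.trim() === "").map(([title]) => title);
  if (missing.length > 0) {
    throw new Error("Missing context sections: " + missing.join(", "));
  }
  const contextText = sections.map(([title, body]) => "[" + title + "]\n" + body.trim()).join("\n");
  const requirementText = REQUIREMENTS.map((item) => "- " + item).join("\n");
  return contextText + "\nGenerate Playwright tests based on the above context.\nRequirements:\n" + requirementText;
}

// Sample values; the API response is a placeholder to fill in from the Network tab.
const loginContext: PromptContext = {
  appOverview: "Frontend: React\nTesting tool: Playwright + TypeScript",
  feature: "Feature: User authentication\nLocation: /login page\nDependency: POST /api/auth/login",
  uiStructure: '- Email input: input[data-testid="email"]\n- Login button: button[data-testid="submit"]',
  apiResponses: "(paste a real response from the Network tab here)",
  successCriteria: "Valid login redirects to /dashboard",
  edgeCases: '- Invalid password shows "Invalid credentials"\n- Empty email shows "Email required"',
};

console.log(buildPrompt(loginContext));
```

Failing loudly on an empty section is a deliberate choice: it's cheaper to be stopped here than to discover the gap in the generated tests.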
Step 4: Review Generated Tests (10 min)
- Do selectors match your app?
- Do tests cover all scenarios?
- Would these tests catch actual bugs?
- Are assertions specific (not generic)?
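Part of this review can be automated with a few text checks before you read the tests properly. What follows is a rough heuristic sketch, not a replacement for reading the code: `reviewGeneratedTest` is a hypothetical helper, and its regular expressions will miss plenty of cases.

```typescript
// Flag patterns in generated test source that the checklist above warns about.
function reviewGeneratedTest(source: string): string[] {
  const findings: string[] = [];
  // Fixed sleeps (setTimeout, page.waitForTimeout) are a common cause of flaky tests.
  if (/\bsetTimeout\s*\(|\.waitForTimeout\s*\(/.test(source)) {
    findings.push("uses a fixed sleep instead of a web-first assertion");
  }
  // Class selectors such as ".submit-btn" break when styling changes.
  if (/\b(?:locator|click|fill)\(\s*["'`]\.[\w-]+/.test(source)) {
    findings.push("uses a CSS class selector instead of data-testid");
  }
  // A test with no expect() call cannot fail on wrong behavior.
  if (!/\bexpect\s*\(/.test(source)) {
    findings.push("contains no assertions");
  }
  return findings;
}

// Two snippets of the kind a model might return, held as plain strings.
const flakySource = [
  'await page.locator(".submit-btn").click();',
  "await page.waitForTimeout(3000);",
].join("\n");

const stableSource = [
  'await page.getByTestId("email").fill("user@test.com");',
  'await page.getByTestId("submit").click();',
  'await expect(page).toHaveURL("/dashboard");',
].join("\n");

console.log(reviewGeneratedTest(flakySource)); // three findings
console.log(reviewGeneratedTest(stableSource)); // no findings
```

The question "Would these tests catch actual bugs?" still needs a human; this only clears out the obvious problems first.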
Step 5: Run and Iterate (5 min)
npx playwright test
If tests fail:
- Paste the error output into Claude and ask: "Why did this test fail? The selector should be..."
- Apply Claude's fix
- Re-run tests
Total time: 30 minutes for a complete test suite that would take 2-3 hours manually.
Advanced: Chaining Prompts for Complex Test Scenarios
For complex workflows (multi-step processes), chain multiple prompts:
Prompt 1: Generate base Page Object
Create a Playwright Page Object for the checkout flow.
Pages: /cart, /shipping, /payment, /confirmation
For each page, create methods like:
- goto()
- fillAddress(address)
- selectShipping(method)
- fillPayment(card)
- submitOrder()
Prompt 2: Generate end-to-end test
Using the CheckoutPage object from above, generate an E2E test for the complete checkout flow.
Test scenario: User adds items to cart, ships to address, pays with card, and sees confirmation.
Use real test data:
- Address: 123 Main St, Springfield, IL 62701
- Card: 4111111111111111 (test card)
- Item: Product ID 1
Prompt 3: Generate edge cases
Generate tests for checkout error scenarios:
- Invalid address (empty zip code)
- Shipping method unavailable for region
- Card declined (use test card 4000000000000002)
- Order confirmation takes >30 seconds (timeout)
Result: Full, realistic test suite without writing any code yourself.
Common Mistakes QA Engineers Make When Using AI
Mistake 1: Not Reviewing Generated Tests
You generate tests, paste them into your project, run them once, and trust they work.
Fix: Always review. Ask yourself: "Would this test catch a bug in the real code?"
Mistake 2: Assuming Selectors Are Universal
You generate tests for a website, but selectors change with app updates.
Fix: Explicitly tell Claude to use data-testid (stable) instead of CSS classes (fragile).
Mistake 3: Skipping Edge Cases
You prompt for happy path, Claude delivers happy path.
Fix: Always specify edge cases, errors, and boundary conditions in your prompt.
Mistake 4: Not Iterating on Bad Tests
Claude generates a test that doesn't work. You give up on AI-assisted testing.
Fix: Ask Claude why it failed. Provide more context. Re-prompt. Iteration is normal.
Mistake 5: Treating All Tests as Equal
Some tests matter (payment processing, security). Some don't (UI copy).
Fix: Spend prompt engineering effort on critical flows. Let AI do the simple stuff.
Key Takeaways
- Specific prompts beat vague ones: spell out selectors, responses, and scenarios.
- Show Claude your actual app: paste selectors, API responses, and error messages.
- Define success explicitly: don't assume Claude knows what you want.
- Iterate quickly: if a test doesn't work, ask Claude why and fix it.
- Save time where it matters: use AI for test generation; keep humans for strategy.
The bottom line: Prompt engineering for tests is learnable. Spend 2 hours mastering these 5 laws and you'll save 100+ hours writing tests.
Frequently Asked Questions
Q: How long does it take to write a good prompt?
A: 10-15 minutes if you have your selectors, which is roughly 10% of the time it takes to write the same tests manually. Once you include review and fixes, the overall saving is closer to 70-80%.
Q: Can AI write tests for my custom framework?
A: Yes, if you explain the framework. Give Claude examples of your pattern, and it will follow it.
Q: What if Claude's tests are still wrong?
A: Provide more context. Show Claude the actual error message. Tell Claude exactly what selector to use. Iteration is normal.
Q: Should I use Claude, ChatGPT, or Copilot for test generation?
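A: Claude and ChatGPT are strongest for generating whole test files, because their large context windows let you paste selectors, API responses, and existing tests into one prompt. Copilot is good for quick inline suggestions. Start with Claude.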
Q: What percentage of generated tests need manual fixing?
A: In my experience, 10-20% need selector fixes, 5-10% need assertion fixes. That's still 70-80% time savings.
Q: Can I use AI-generated tests in production?
A: Yes, after review. Run them against intentionally broken code to verify they catch bugs.
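Here's a small illustration of that last check, assuming a toy email validator in place of your real code. A useful set of checks passes against the correct implementation and fails against a deliberately broken one; all names here are hypothetical.

```typescript
type EmailValidator = (email: string) => boolean;

// Reference implementation using the pattern from Law 3.
const validateEmail: EmailValidator = (email) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);

// Intentionally broken variant: accepts anything that contains "@".
const brokenValidateEmail: EmailValidator = (email) => email.indexOf("@") !== -1;

// Stand-in for a generated test suite: returns a description of each failed check.
function runEmailChecks(validate: EmailValidator): string[] {
  const cases: Array<[string, boolean]> = [
    ["john@example.com", true],
    ["john@example", false],
    ["john doe@example.com", false],
    ["", false],
  ];
  return cases
    .filter(([input, expected]) => validate(input) !== expected)
    .map(([input]) => 'wrong result for "' + input + '"');
}

// The checks should pass on the real code and fail on the broken code.
console.log("real code failures:", runEmailChecks(validateEmail).length);
console.log("broken code failures:", runEmailChecks(brokenValidateEmail).length);
```

If the suite also passes against the broken variant, it isn't testing what you think it is.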
Next Steps
This week:
- Pick one feature you want to test
- Gather selectors and API responses
- Write a detailed prompt using the 5 laws
- Run your tests
- Measure time saved (should be 70-80% faster)
Next week:
- Expand to your entire test suite
- Build a prompt library for your team
- Set up CI/CD to run AI-generated tests
You'll spend a couple of hours learning prompt engineering and save 100+ hours writing tests. That's the ROI of mastering this skill in 2026.
Tayyab Akmal
AI & QA Automation Engineer
6 years of catching critical bugs in fintech, e-commerce, and SaaS — then building the Playwright and Selenium automation that prevents them from shipping again.