Prompt Engineering for QA: Write Better Tests With AI

Here's what I see all the time: QA engineers spend an hour writing what they think is a perfect prompt, run it through Claude, and get back tests that don't help. The tests run fine. They're syntactically correct. But they test the wrong thing, or they miss the actual failure case.

The problem isn't AI. It's that most people don't know how to prompt for test code.

I've spent the last 6 months reverse-engineering what works and what doesn't. I've written hundreds of prompts. I've generated thousands of test cases. And I've learned that garbage prompts produce garbage tests—but good prompts produce tests better than most humans write.

This guide shows you exactly how to prompt AI so it writes tests that actually work.

Why Most AI-Generated Tests Fail (And It's Not AI's Fault)

You write a prompt like this:

Generate tests for a login form.

Claude generates tests. They look good. But when you run them, they fail because:

No selectors provided — Claude guesses at CSS selectors. Wrong guess = broken tests.
No context about your app — Claude doesn't know if your login is at /login or /auth/login or if it's a modal.
No failure scenarios — Claude tests the happy path. It doesn't test rate limiting, session expiry, or SQL injection attempts.
No assertion logic — Claude asserts on generic text ("Invalid credentials") without knowing your app's exact error messages.

The real issue: You're not giving Claude enough information to generate good tests.

Think of it like asking a contractor to build a house without blueprints. They'll build something. It might have walls. But it probably won't match what you wanted.

Good prompting is the blueprint.

The 5 Laws of Effective QA Prompting

Law 1: Provide Your Actual UI Structure

Bad:

Generate tests for a registration form.

Good:

Generate tests for a registration form with these selectors:
- Email input: input[data-testid="email"]
- Password input: input[data-testid="password"]
- Confirm password: input[data-testid="confirm-password"]
- Sign up button: button[type="submit"]
- Error message: [role="alert"]
- Success message: .success-banner

Why: When you provide selectors, Claude uses them instead of guessing. If you don't provide them, it has to guess, and your tests will likely be broken on day one.

Action: Before prompting, use your browser's Inspector to find the actual data-testid attributes in your app. Copy them into your prompt.
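One way to keep the prompt and the tests in sync is to store the selectors in a single object and render the prompt section from it. This is a minimal sketch: `selectors` and `renderSelectorSection` are hypothetical names, and the values are simply the ones from the example above.

```typescript
// Hypothetical single source of truth for the registration form selectors.
// The values mirror the example prompt; replace them with your app's own.
const selectors = {
  "Email input": 'input[data-testid="email"]',
  "Password input": 'input[data-testid="password"]',
  "Confirm password": 'input[data-testid="confirm-password"]',
  "Sign up button": 'button[type="submit"]',
  "Error message": '[role="alert"]',
  "Success message": ".success-banner",
};

// Render the selector map as the bullet list that goes into the prompt.
function renderSelectorSection(map: Record<string, string>): string {
  const lines = Object.keys(map).map((label) => `- ${label}: ${map[label]}`);
  const header = "Generate tests for a registration form with these selectors:";
  return [header, ...lines].join("\n");
}

const selectorSection = renderSelectorSection(selectors);
```

When a selector changes in the app, you update it in one place and regenerate the prompt text instead of hunting through old prompts.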

Law 2: Show Claude Your API Responses

Bad:

Test the API that fetches user data.

Good:

Test the API at /api/users/:id that returns:
{
  "id": 123,
  "name": "John Doe",
  "email": "john@example.com",
  "createdAt": "2026-03-20T10:00:00Z",
  "isActive": true
}

Test that:
- Response status is 200
- Response contains all required fields
- Dates are in ISO 8601 format
- Non-existent user returns 404

Why: Claude can't test what it doesn't understand. Show the structure. Be specific about assertions.
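The assertions listed in that prompt can be sketched as a plain shape check, which is roughly what you should expect the generated test to verify. The function name `checkUserResponse` and the narrow timestamp pattern are assumptions for illustration, not a full ISO 8601 parser.

```typescript
// Shape of the example response from /api/users/:id shown in the prompt above.
interface UserResponse {
  id: number;
  name: string;
  email: string;
  createdAt: string;
  isActive: boolean;
}

// Matches the UTC form used in the example, e.g. 2026-03-20T10:00:00Z.
// Deliberately narrow (an assumption): other valid ISO 8601 forms are rejected.
const ISO_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Returns a list of problems; an empty list means the body has the expected shape.
function checkUserResponse(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return ["body is not an object"];
  const b = body as Record<string, unknown>;
  const createdAt = b["createdAt"];
  const problems: string[] = [];
  if (typeof b["id"] !== "number") problems.push("id must be a number");
  if (typeof b["name"] !== "string") problems.push("name must be a string");
  if (typeof b["email"] !== "string") problems.push("email must be a string");
  if (typeof b["isActive"] !== "boolean") problems.push("isActive must be a boolean");
  if (typeof createdAt !== "string" || !ISO_UTC.test(createdAt)) {
    problems.push("createdAt must be an ISO 8601 UTC timestamp");
  }
  return problems;
}

// The example payload from the prompt passes with no problems reported.
const exampleUser: UserResponse = {
  id: 123,
  name: "John Doe",
  email: "john@example.com",
  createdAt: "2026-03-20T10:00:00Z",
  isActive: true,
};
const exampleProblems = checkUserResponse(exampleUser);
```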

Law 3: Define Success and Failure Explicitly

Bad:

Test form validation.

Good:

Test form validation. The form should:
- Accept emails matching pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/
- Show error "Invalid email format" for bad emails
- Show error "Email already exists" for duplicate emails
- Allow submit only when ALL fields are valid
- Clear error message when user corrects input

Why: Claude writes tests for what you explicitly ask for. Vague requirements = vague tests.
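It helps to try the pattern against a few inputs before pasting it into a prompt, so you know which cases it actually rejects. A small sketch, where `emailFormatError` and the sample inputs are illustrative:

```typescript
// The pattern from the prompt above: one "@", no whitespace, and a dot after the "@".
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Map an input to the error message the form is expected to show, or null when valid.
// The duplicate-email check is left out because it depends on server state.
function emailFormatError(input: string): string | null {
  return EMAIL_PATTERN.test(input) ? null : "Invalid email format";
}

// Sample inputs worth listing in the prompt as concrete cases.
const emailSamples = ["user@test.com", "user@test", "user @test.com", "@test.com", "a@b@c.com"];
const rejected = emailSamples.filter((s) => emailFormatError(s) !== null);
// rejected is ["user@test", "user @test.com", "@test.com", "a@b@c.com"]
```

Note that this pattern is permissive: it accepts an address such as user@test..com. If your form rejects those, say so in the prompt.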

Law 4: Specify the Technology Stack

Bad:

Generate tests.

Good:

Generate Playwright TypeScript tests using the Page Object Model pattern.
Use async/await, not .then() chains.
Use data-testid selectors only (not CSS class selectors).
Each test should be independent and able to run in any order.

Why: Claude will adapt to your constraints. Without them, it might generate tests using Selenium, Cypress, or a mix of patterns.

Law 5: Ask for Edge Cases Explicitly

Bad:

Test payment processing.

Good:

Test payment processing. Include tests for:
- Valid payment (happy path)
- Expired card (should show "Card expired" error)
- Insufficient funds (should show "Declined" error)
- Empty card number (form validation, should not submit)
- Special characters in cardholder name (should be rejected)
- Duplicate transactions within 60 seconds (should reject second)

Why: Claude won't reliably cover the edge cases that matter for your app unless you name them. Tell it what to test.

Bad Prompt vs Good Prompt: Real Examples

Example 1: Login Feature

❌ Bad Prompt:

Generate tests for login.

Claude might generate:

Tests that don't match your selectors
Only happy path (valid credentials)
Generic assertions ("login successful")
Tests that are flaky due to missing waits

✅ Good Prompt:

Generate Playwright TypeScript tests for a login form at /login.

Selectors:
- Email input: input[data-testid="email"]
- Password input: input[data-testid="password"]
- Login button: button[data-testid="submit"]
- Error message: [role="alert"].error
- Success: redirect to /dashboard

Test scenarios:
1. Valid login (user@test.com / Test123!) redirects to /dashboard
2. Invalid password shows "Invalid credentials" error
3. Non-existent email shows "User not found" error
4. Empty email shows "Email required" error
5. Empty password shows "Password required" error
6. After 5 failed attempts, show "Account locked for 15 minutes"

Use data-testid selectors only.
Each test must be independent.
Use async/await with proper waits (await expect() not setTimeout).

Claude will generate:

Tests with correct selectors
All 6 scenarios covered
Proper assertions for each case
Tests that wait on assertions instead of fixed timeouts, so they are less likely to flake

The difference: about 20 lines of good context = tests that actually work.

Example 2: API Response Validation

❌ Bad Prompt:

Test the products API.

✅ Good Prompt:

Generate Playwright tests for the /api/products endpoint.

API response structure:
{
  "products": [
    {
      "id": 1,
      "name": "Laptop",
      "price": 999.99,
      "inStock": true,
      "category": "electronics"
    }
  ],
  "total": 100,
  "page": 1,
  "perPage": 10
}

Test cases:
1. GET /api/products returns 200 with correct structure
2. Response includes pagination (total, page, perPage)
3. Price field is always a number >= 0
4. Non-existent category returns empty array
5. Invalid page number returns 400 error
6. Missing auth header returns 401 error

All tests should verify exact response structure and data types.

Claude will generate:

Tests that validate structure
Tests for error cases
Tests for data types
Tests for edge cases

A Practical Prompt Engineering Workflow for QA

Here's the exact workflow I use on every project:

Step 1: Gather Information (5 min)

Open your app with the browser's Inspector
Find the actual selectors (data-testid, role attributes)
Understand the API responses (use Network tab)
List all error messages users might see

Step 2: Write Context (10 min)

[App overview]
I'm testing a SaaS product called [Name].
Frontend: React
Testing tool: Playwright + TypeScript
URL: [baseurl]

[Feature to test]
Feature: User authentication
Location: /login page
Dependency: POST /api/auth/login

[UI structure]
[Paste actual selectors]

[API responses]
[Paste example response]

[Success criteria]
[Define what success looks like]

[Edge cases]
[List what can go wrong]
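If your team reuses this template, it can be assembled from a typed object so that a blank section is caught before the prompt is sent. This is a sketch under assumptions: `PromptContext`, `SECTIONS`, and `buildContext` are hypothetical names that mirror the template's section labels.

```typescript
// Hypothetical shape for the context template; the field names are illustrative.
interface PromptContext {
  appOverview: string;
  feature: string;
  uiStructure: string;
  apiResponses: string;
  successCriteria: string;
  edgeCases: string;
}

// Section labels in the same order as the template above.
const SECTIONS: Array<[keyof PromptContext, string]> = [
  ["appOverview", "App overview"],
  ["feature", "Feature to test"],
  ["uiStructure", "UI structure"],
  ["apiResponses", "API responses"],
  ["successCriteria", "Success criteria"],
  ["edgeCases", "Edge cases"],
];

// Build the context block and refuse to continue if any section is blank,
// since a blank section is exactly where the model would have to guess.
function buildContext(ctx: PromptContext): string {
  const missing = SECTIONS.filter(([key]) => ctx[key].trim() === "").map(([, label]) => label);
  if (missing.length > 0) {
    throw new Error("Missing context sections: " + missing.join(", "));
  }
  return SECTIONS.map(([key, label]) => "[" + label + "]\n" + ctx[key].trim()).join("\n\n");
}
```

The compiler forces every field to be present, and the runtime check catches fields that were left as empty strings.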

Step 3: Create the Prompt

Paste your context into Claude with:

Generate Playwright tests based on the above context.
Requirements:
- Use TypeScript
- Use async/await
- Use data-testid selectors only
- Each test independent
- Cover happy path + all edge cases

Step 4: Review Generated Tests (10 min)

Do selectors match your app?
Do tests cover all scenarios?
Would these tests catch actual bugs?
Are assertions specific (not generic)?
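The first review question can be partly automated. The sketch below scans generated test source for data-testid values and reports any that were not in the list you gave Claude. Both function names are hypothetical, and the regex only understands the attribute-selector form (data-testid="..."), so calls such as getByTestId are not covered.

```typescript
// Collect every distinct data-testid value that appears in generated test code.
// Assumption: selectors are written in attribute form, e.g. [data-testid="email"].
function extractTestIds(source: string): string[] {
  const pattern = /data-testid=["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const id = match[1];
    if (id !== undefined && found.indexOf(id) === -1) found.push(id);
  }
  return found;
}

// Report test ids used by the generated code that you never provided.
// Anything returned here is a selector the model made up.
function unknownTestIds(source: string, known: string[]): string[] {
  return extractTestIds(source).filter((id) => known.indexOf(id) === -1);
}
```

An empty result does not prove the tests are right, but a non-empty one points straight at the lines to fix.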

Step 5: Run and Iterate (5 min)

npx playwright test

If tests fail:

Ask Claude: "Why did this test fail? The selector should be..." and paste the actual error output
Claude will suggest a fix
Re-run tests

Total time: 30 minutes for a complete test suite that would take 2-3 hours manually.

Advanced: Chaining Prompts for Complex Test Scenarios

For complex workflows (multi-step processes), chain multiple prompts:

Prompt 1: Generate base Page Object

Create a Playwright Page Object for the checkout flow.
Pages: /cart, /shipping, /payment, /confirmation

For each page, create methods like:
- goto()
- fillAddress(address)
- selectShipping(method)
- fillPayment(card)
- submitOrder()
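For reference, the kind of Page Object this prompt asks for might look like the sketch below, shown for the shipping step only. Everything here is an assumption for illustration: the `ShippingPage` class, the `Address` type, and every data-testid value are placeholders. `PageLike` is a minimal slice of Playwright's Page (goto, fill, click), so a real Page can be passed in, and a fake can be used for a dry run.

```typescript
// Minimal slice of Playwright's Page API that this sketch relies on.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Hypothetical address shape, matching the test data used later in the chain.
interface Address {
  street: string;
  city: string;
  state: string;
  zip: string;
}

// Hypothetical Page Object for /shipping; all selectors are placeholders.
class ShippingPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // A relative URL assumes baseURL is set in your Playwright config.
  async goto(): Promise<void> {
    await this.page.goto("/shipping");
  }

  async fillAddress(address: Address): Promise<void> {
    await this.page.fill('[data-testid="street"]', address.street);
    await this.page.fill('[data-testid="city"]', address.city);
    await this.page.fill('[data-testid="state"]', address.state);
    await this.page.fill('[data-testid="zip"]', address.zip);
  }

  async selectShipping(method: string): Promise<void> {
    await this.page.click(`[data-testid="shipping-${method}"]`);
  }
}
```

If the Page Object Claude returns looks very different from this shape, for example with assertions mixed into the methods, that is worth correcting before moving on to Prompt 2.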

Prompt 2: Generate end-to-end test

Using the CheckoutPage object from above, generate an E2E test for the complete checkout flow.
Test scenario: User adds items to cart, ships to address, pays with card, and sees confirmation.

Use real test data:
- Address: 123 Main St, Springfield, IL 62701
- Card: 4111111111111111 (test card)
- Item: Product ID 1

Prompt 3: Generate edge cases

Generate tests for checkout error scenarios:
- Invalid address (empty zip code)
- Shipping method unavailable for region
- Card declined (use test card 4000000000000002)
- Order confirmation takes >30 seconds (timeout)

Result: A full, realistic test suite without writing the code by hand. You still review it before it goes into your project.

Common Mistakes QA Engineers Make When Using AI

Mistake 1: Not Reviewing Generated Tests

You generate tests, paste them into your project, run them once, and trust they work.

Fix: Always review. Ask yourself: "Would this test catch a bug in the real code?"

Mistake 2: Assuming Selectors Are Universal

You generate tests for a website, but selectors change with app updates.

Fix: Explicitly tell Claude to use data-testid (stable) instead of CSS classes (fragile).

Mistake 3: Skipping Edge Cases

You prompt for happy path, Claude delivers happy path.

Fix: Always specify edge cases, errors, and boundary conditions in your prompt.

Mistake 4: Not Iterating on Bad Tests

Claude generates a test that doesn't work. You give up on AI-assisted testing.

Fix: Ask Claude why it failed. Provide more context. Re-prompt. Iteration is normal.

Mistake 5: Treating All Tests as Equal

Some tests matter more (payment processing, security). Some matter less (UI copy).

Fix: Spend prompt engineering effort on critical flows. Let AI do the simple stuff.

Key Takeaways

Specific prompts beat vague ones: Be specific about selectors, responses, and scenarios.
Show Claude your actual code: Paste selectors, API responses, error messages.
Define success explicitly: Don't assume Claude knows what you want.
Iterate quickly: If a test doesn't work, ask Claude why, give it more context, and re-prompt.
Save time where it matters: Use AI for test generation; keep humans for strategy.

The bottom line: Prompt engineering for tests is learnable. Spend 2 hours mastering these 5 laws and you'll save 100+ hours writing tests.

Frequently Asked Questions

Q: How long does it take to write a good prompt?

A: 10-15 minutes if you have your selectors. That's roughly 10% of the time it takes to write the tests manually. Once you add review and iteration, the overall saving is closer to 70-80%.

Q: Can AI write tests for my custom framework?

A: Yes, if you explain the framework. Give Claude examples of your pattern, and it will follow it.

Q: What if Claude's tests are still wrong?

A: Provide more context. Show Claude the actual error message. Tell Claude exactly what selector to use. Iteration is normal.

Q: Should I use Claude, ChatGPT, or Copilot for test generation?

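A: In my experience, Claude and ChatGPT are the strongest for test code, largely because their context windows are big enough to hold your selectors, API responses, and existing tests. Copilot is good for quick inline suggestions. Start with Claude.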

Q: What percentage of generated tests need manual fixing?

A: In my experience, 10-20% need selector fixes, 5-10% need assertion fixes. That's still 70-80% time savings.

Q: Can I use AI-generated tests in production?

A: Yes, after review. Run them against intentionally broken code to verify they catch bugs.
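The "intentionally broken code" idea can be shown in miniature: a good check passes against the correct behaviour and fails against a deliberately broken variant. The sketch below uses the lockout rule from the login example (locked after 5 failed attempts) as a stand-in. All names are hypothetical, and in a real project the broken variant would be a temporary change to your app, not a second function.

```typescript
// A check is just a function that throws when the behaviour is wrong.
type Check<T> = (subject: T) => void;

// Returns true when the check passes, false when it throws.
function passes<T>(check: Check<T>, subject: T): boolean {
  try {
    check(subject);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical rule from the login example: lock after 5 failed attempts.
const isLockedCorrect = (failedAttempts: number): boolean => failedAttempts >= 5;
// Deliberately broken variant (off by one), used to see whether the check notices.
const isLockedBroken = (failedAttempts: number): boolean => failedAttempts > 5;

// A weak check: only looks at an obvious case, so both variants pass it.
const weakCheck: Check<(n: number) => boolean> = (isLocked) => {
  if (!isLocked(10)) throw new Error("not locked after 10 attempts");
};

// A stronger check: pins the boundary on both sides.
const boundaryCheck: Check<(n: number) => boolean> = (isLocked) => {
  if (isLocked(4)) throw new Error("locked too early");
  if (!isLocked(5)) throw new Error("not locked at 5 attempts");
};

// A check "catches the bug" if it passes on correct code and fails on broken code.
const weakCatchesBug = passes(weakCheck, isLockedCorrect) && !passes(weakCheck, isLockedBroken);
const boundaryCatchesBug = passes(boundaryCheck, isLockedCorrect) && !passes(boundaryCheck, isLockedBroken);
// weakCatchesBug is false, boundaryCatchesBug is true
```

A generated test that stays green when you break the feature is the weak check above: it runs, but it protects nothing.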

Next Steps

This week:

Pick one feature you want to test
Gather selectors and API responses
Write a detailed prompt using the 5 laws
Run your tests
Measure time saved (should be 70-80% faster)

Next week:

Expand to your entire test suite
Build a prompt library for your team
Set up CI/CD to run AI-generated tests

You'll spend a couple of hours learning prompt engineering and save 100+ hours writing tests. That's the ROI of mastering this skill in 2026.


// author

Tayyab Akmal

AI & QA Automation Engineer

6 years of catching critical bugs in fintech, e-commerce, and SaaS — then building the Playwright and Selenium automation that prevents them from shipping again.
