Dev-skills qa-test
Browser-based QA verification after any implementation. Use when someone says "QA this", "test this in browser", "verify the feature", "qa test", "browser test", or after completing an /implement-change to verify acceptance criteria in a real browser. Opens Chrome via MCP, exercises each acceptance criterion, verifies via DOM snapshots, and reports pass/fail. The "closer" for every implementation — proof it works, not just that tests pass.
git clone https://github.com/teambrilliant/dev-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/teambrilliant/dev-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/qa-test" ~/.claude/skills/teambrilliant-dev-skills-qa-test && rm -rf "$T"
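To confirm the copy landed where Claude discovers skills, a quick check of the target path used by the command above:

```bash
# List the installed skill directory; SKILL.md should be present.
ls ~/.claude/skills/teambrilliant-dev-skills-qa-test
```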
skills/qa-test/SKILL.md

QA Test
Verify implemented features in a real browser. Exercise each acceptance criterion, verify via snapshots, report results.
Context-efficient design: browser testing runs in a sub-agent so snapshot/interaction data stays out of the main thread. Main thread only sees compact pass/fail summaries.
Sub-agent rule: Always use sub-agents for browser interaction and pre-flight exploration — these are broad, multi-step tasks that generate significant context. Fan out pre-flight and browser testing as separate sub-agents when they're independent. Handle directly only when verifying a single, specific element or reading one file for criteria.
Process
- Pre-flight (sub-agent) — gather criteria, resolve URL, check environment
- Interactive setup — human steers browser for hard-to-automate steps (login, drag, etc.)
- Browser testing (sub-agent) — exercises all criteria in isolated context
- Report results — main thread receives compact summary only
- Handle failures — retry failed criteria after manual intervention if needed
1. Pre-flight Sub-agent
Launch an Explore sub-agent before any browser interaction to gather all context in parallel.
Sub-agent prompt:
Gather QA pre-flight context for testing. Return a structured JSON block with:

1. **acceptance_criteria**: List of testable criteria. Check these sources in order, stop at the first that has criteria:
   - **The shape doc referenced by the implementation plan** (canonical source): look for a `## Acceptance Criteria` section in `thoughts/plans/*.md` — it should link to a `thoughts/research/*.md` shape doc. Read that shape doc's `### Acceptance Criteria` section and use those criteria verbatim.
   - The user's prompt (if criteria were given explicitly)
   - Current plan file (if it lists criteria directly — unshaped work path)
   - Current PR description: run `gh pr view --json body` via Bash
   - Current branch diff: run `git diff main...HEAD --stat` then read changed files to infer what user-visible behavior changed (LAST RESORT — unshaped work only)
   - Linked issue: check PR body for issue references, fetch with `gh issue view`

   Return `criteria_source` alongside the criteria so the main thread knows which tier was used (shape doc / prompt / plan / PR / diff / issue).
2. **test_url**: Where to test. Check in order:
   - `.tap/tap-audit.md` Environments section
   - `package.json` scripts for `dev`, `start`, or similar
   - Common defaults: localhost:3000, :5173, :4321, :6886
3. **app_running**: Try to fetch the test_url via `curl -s -o /dev/null -w '%{http_code}'`. Return the status code. If not running, return the dev command that would start it.
4. **test_pages**: List of specific page URLs/routes to visit based on the changed files (e.g., if `modules/campaigns/` changed, the test page is likely `/campaigns/...`).
5. **db_available**: Check if postgres MCP tools are available (search for `mcp__postgres__execute_sql` or similar). Return true/false.
6. **has_async_flows**: Based on the changed files, flag whether the feature involves background jobs (Temporal workflows, queues, webhooks) that need async verification.
7. **needs_login**: Whether the app requires authentication. Check for login pages, auth middleware, or session requirements in the codebase.

Return results as a structured summary, not raw tool output.
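For concreteness, a rough shell sketch of the checks behind those items and the shape of the summary the sub-agent returns — the port, repo paths, and field values below are assumptions, not part of the skill:

```bash
# Pre-flight checks, sketched in bash (paths and port are illustrative).
grep -rl '## Acceptance Criteria' thoughts/plans/ 2>/dev/null   # shape-doc tier
gh pr view --json body --jq .body 2>/dev/null | head -40        # PR-description tier
git diff main...HEAD --stat                                     # last-resort diff tier

url="http://localhost:3000"
status=$(curl -s -o /dev/null -w '%{http_code}' "$url")         # app_running check

# Shape of the structured summary (values are placeholders):
cat <<EOF
{
  "criteria_source": "shape doc",
  "acceptance_criteria": ["<criterion 1>", "<criterion 2>"],
  "test_url": "$url",
  "app_running": "$status",
  "test_pages": ["/campaigns"],
  "db_available": false,
  "has_async_flows": false,
  "needs_login": true
}
EOF
```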
Using the pre-flight results:
- If `app_running` is not 200, start the dev server (background) and wait for it (see the sketch after this list)
- Use `acceptance_criteria` as the test plan
- Use `test_pages` to know where to navigate first
- If `db_available`, include database verification steps
- If `has_async_flows`, use the async testing pattern
- If `needs_login`, prompt user for interactive setup before launching browser sub-agent
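A minimal sketch of that first bullet, assuming an npm project with a `dev` script serving on port 3000 (both assumptions; substitute the project's real command and URL):

```bash
# If the app isn't up, start the dev server in the background and poll until it responds.
url="http://localhost:3000"
code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
if [ "$code" != "200" ]; then
  npm run dev >/tmp/qa-dev-server.log 2>&1 &
  for _ in $(seq 1 30); do                # wait up to ~30 seconds
    sleep 1
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    [ "$code" = "200" ] && break
  done
fi
echo "app_running: $code"
```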
Fallback (no sub-agent): If sub-agents are unavailable, gather criteria and resolve URL sequentially.
<details>
<summary>Manual fallback: Gather criteria and resolve URL</summary>

Gather acceptance criteria from (in priority order):
- Shape doc referenced by the implementation plan (`thoughts/research/*.md` via `thoughts/plans/*.md`) — canonical source when it exists
- Explicit criteria provided in the prompt
- Current plan file if it lists criteria directly (unshaped path)
- Current ticket/issue (if referenced)
- PR description
- Diff inference — last resort, unshaped work only
If no criteria found, ask in human mode. In agent mode, infer from the diff.
Resolve test URL (in priority order):
- URL provided in the prompt
- `.tap/tap-audit.md` → Environments section
- `package.json` scripts → `dev`, `start`, or similar
- Common defaults: `http://localhost:3000`, `http://localhost:5173`, `http://localhost:4321`
Verify the app is running before proceeding.
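A rough shell sketch of this fallback resolution (assumes `jq` is available; file locations come from the list above):

```bash
# 1. Documented environments, if the audit file exists
grep -i -A5 'environments' .tap/tap-audit.md 2>/dev/null
# 2. Candidate scripts in package.json
jq -r '.scripts | with_entries(select(.key | test("dev|start")))' package.json 2>/dev/null
# 3. Probe the common local defaults
for url in http://localhost:3000 http://localhost:5173 http://localhost:4321; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  [ "$code" = "200" ] && echo "reachable: $url"
done
```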
</details>

2. Interactive Setup (Main Thread)
Before launching the browser testing sub-agent, handle anything that's hard to automate in the main thread. The browser state persists since the sub-agent connects to the same Chrome instance.
When to prompt for interactive setup:
- `needs_login` is true → ask user: "App requires login. Want me to navigate to the login page so you can sign in, or should I attempt automated login?"
- Complex drag-and-drop or gesture-based preconditions
- Multi-factor auth, CAPTCHAs, OAuth popups
What to do:
- Navigate to the relevant page via Chrome MCP
- Tell the user what action is needed
- Wait for user confirmation that setup is complete
- Then launch the browser testing sub-agent
If no interactive setup is needed, skip directly to step 3.
3. Browser Testing (Sub-agent)
Launch a `general-purpose` sub-agent for all browser interaction. This keeps snapshot/interaction data out of the main thread context.
Sub-agent prompt template:
You are running browser-based QA tests. The browser is already open and may already be logged in / set up.

Test URL: {test_url}

Acceptance criteria to verify:
{numbered list of criteria}

Additional context:
- DB tools available: {db_available}
- Has async flows: {has_async_flows}
- Test pages: {test_pages}

## Evaluator mindset

You are an independent evaluator. Your job is to be skeptical, not helpful. The generator already believes its work is correct — your value is catching what it missed.

Rules:
- **Binary outcomes**: a criterion fully passes or it fails. If you catch yourself writing "PASS with caveats" or "close enough", mark it **FAIL** and state the gap.
- **Evidence required**: for each PASS, cite what you observed — element text, URL after action, console/network status, DB row. PASS with no cited evidence → FAIL.
- **Do not rationalize**: if something looked off but "probably works", mark it FAIL or PARTIAL and describe what looked off. Let the human/agent decide if it's acceptable — that's not your call.
- **Specific bugs, not vague assessments**: when a criterion fails, file a concrete actionable finding ("Delete button calls `/api/items/:id` which returns 500; expected 204") — not "delete doesn't work."

## How to test

Use Chrome MCP tools (`mcp__chrome-devtools__*`).

**Snapshot-first workflow** — use `take_snapshot` for BOTH finding elements AND verifying results. Do NOT use `take_screenshot` unless a criterion fails and you need visual debugging evidence.

**For each criterion:**
1. Navigate to the relevant page
2. `take_snapshot` → get element UIDs and current state
3. Interact via UIDs (`click`, `fill`, `hover`)
4. `take_snapshot` → verify state changed as expected
5. Check `list_console_messages` for errors
6. Check `list_network_requests` for failed requests (4xx, 5xx)

**Important**: UIDs are ephemeral — always take a fresh snapshot before interacting.

**On failure only**: `take_screenshot` and save to `./qa-evidence/` for debugging.

**React/SPA hover interactions:** Chrome DevTools `hover` only triggers CSS `:hover`, NOT JS `mouseenter`/`mouseover`. If a UI element only appears via React's `onMouseEnter`:
1. Try `click` directly in the area
2. If that fails, `evaluate_script` to dispatch mouseenter event
3. `take_snapshot` to confirm

**Testing patterns:**
- Form submission: fill → submit → snapshot to verify success + check no errors
- Navigation: click → snapshot to verify new state + check URL
- State changes: trigger action → snapshot to verify → reload → snapshot to verify persistence
- Async: trigger → snapshot for intermediate state → poll snapshots → verify final state
- Error states: trigger invalid input → snapshot to verify error messaging

**Always check:**
- Console errors (JS exceptions)
- Failed network requests (4xx, 5xx)

## Report format

Return ONLY a compact summary in this exact format:

RESULT: [PASS / FAIL / PARTIAL]
CRITERIA:
1. [criterion] — PASS/FAIL — [one-line observation]
2. [criterion] — PASS/FAIL — [one-line observation]
...
ERRORS: [any console errors or failed network requests, or "none"]
FAILURES: [for any failed criterion: what happened, what was expected, screenshot path if captured]
NEEDS_MANUAL: [any criteria that couldn't be tested due to automation limitations — e.g., drag-and-drop, complex gestures]
4. Report Results
The sub-agent returns a compact summary. Present it to the user.
Human mode: Show the summary. If any failures, ask: "Want me to fix this and re-test, or is this expected?"
Agent mode: If all pass, proceed (e.g., open PR). If any fail, attempt fix-and-retest.
For UI-visible changes: after functional QA passes, offer design-fidelity QA as the second half of the closer — "Want to run a `/dev-skills:design-language` review against the changed components? Functional pass doesn't mean it matches the product's design language." This only applies when the diff touches user-visible UI (components, layouts, copy); skip for pure backend or infra changes.
5. Failure Handling
Automation failures (NEEDS_MANUAL):
- The user performs the manual action in the browser (main thread)
- Launch a new sub-agent to verify only the remaining criteria
- The new sub-agent picks up the browser state left by the user
Code failures (FAIL):
Agent mode:
- Fix the code
- Launch new sub-agent to re-test only failed criteria
- Max 2 fix-and-retest cycles
After 2 failed cycles, escalate — do NOT keep iterating. Branch by failure shape:
- Same criterion failing the same way both cycles → the plan is likely wrong, not the code. Re-enter `/dev-skills:implementation-planning` with the failing criterion and the observed behavior; do not attempt a third code fix.
- Different criteria failing each cycle / shifting failures → hand to human with a consolidated diff: for each failing criterion, `expected: …` vs `observed: …` plus the specific bug finding from the evaluator. Stop.
Human mode:
- Present failures
- Ask: "Want me to fix this and re-test, or is this expected behavior?"
Optional: Database Verification
Include in the sub-agent prompt when `db_available` is true. For features that create or modify data, verify the following (illustrative queries follow the list):
- Record creation: Verify expected rows exist with correct values
- Relational data: Confirm junction table rows were created
- Status transitions: Confirm async workflows completed
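As an illustration, the kinds of queries the sub-agent might run for each check — table and column names here are hypothetical, and the queries can go through `mcp__postgres__execute_sql` or plain `psql`, whichever is available:

```bash
# Record creation: the new row exists with the expected values
psql "$DATABASE_URL" -c "SELECT id, name, status FROM campaigns ORDER BY created_at DESC LIMIT 5;"
# Relational data: junction rows were created for the new record
psql "$DATABASE_URL" -c "SELECT count(*) FROM campaign_contacts WHERE campaign_id = '<new-campaign-id>';"
# Status transitions: the async workflow reached its terminal state
psql "$DATABASE_URL" -c "SELECT status FROM workflow_runs WHERE id = '<run-id>';"
```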
Boundaries
- Does NOT write unit tests (that's implement-acceptance-tests)
- Does NOT review code quality (that's CLAUDE.md / code review)
- Does NOT assess blast radius (that's /blast-radius)
- Tests user-visible behavior in the browser, with optional database verification
- Does NOT modify acceptance criteria — tests what was specified