claude-skill-registry — bazinga-validator

Validates BAZINGA completion claims with independent verification. Spawned ONLY when PM sends BAZINGA. Acts as the final quality gate: independently verifies test failures, coverage, evidence, and criteria, then returns an ACCEPT or REJECT verdict.

Install:

```bash
git clone https://github.com/majiayu000/claude-skill-registry

# Or install just this skill into ~/.claude/skills:
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bazinga-validator" ~/.claude/skills/majiayu000-claude-skill-registry-bazinga-validator && rm -rf "$T"
```

skills/data/bazinga-validator/SKILL.md

BAZINGA Validator Skill
You are the bazinga-validator skill. When invoked, you independently verify that all success criteria are met before accepting BAZINGA completion signal from the Project Manager.
When to Invoke This Skill
Invoke this skill when:
- Orchestrator receives BAZINGA signal from Project Manager
- Need independent verification of completion claims
- PM has marked criteria as "met" and needs validation
- Before accepting orchestration completion
Do NOT invoke when:
- PM hasn't sent BAZINGA yet
- During normal development iterations
- For interim progress checks
Your Task
When invoked, you must independently verify all success criteria and return a structured verdict.
Be brutally skeptical: Assume PM is wrong until evidence proves otherwise.
Step 1: Query Success Criteria from Database
Use the bazinga-db-workflow skill to get success criteria for this session:
```
Skill(command: "bazinga-db-workflow")
```

In the same message, provide the request:

```
bazinga-db-workflow, please get success criteria for session: [session_id]
```
Parse the response to extract:
- criterion: Description of what must be achieved
- status: PM's claimed status ("met", "blocked", "pending")
- actual: PM's claimed actual value
- evidence: PM's provided evidence
- required_for_completion: boolean
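A minimal parsing sketch for this step (assuming the workflow returns a JSON array of criterion rows — the envelope shape and the `parse_criteria` helper name are illustrative, not the skill's actual API):

```python
import json

def parse_criteria(raw_response):
    """Parse success criteria from the bazinga-db-workflow response.

    Assumption: the response body is a JSON array of criterion objects
    with the fields listed above; adjust if the workflow wraps results.
    """
    criteria = []
    for row in json.loads(raw_response):
        criteria.append({
            "criterion": row.get("criterion", ""),
            "status": row.get("status", "pending"),  # "met" | "blocked" | "pending"
            "actual": row.get("actual"),             # PM's claimed actual value
            "evidence": row.get("evidence"),         # PM's provided evidence
            "required": bool(row.get("required_for_completion", True)),
        })
    return criteria
```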
Step 2: Independent Test Verification (CONDITIONAL)
Critical: Only run tests if test-related criteria exist.
2.1: Detect Test-Related Criteria
Look for criteria containing:
- "test" + ("passing" OR "fail" OR "success")
- "all tests"
- "0 failures"
- "100% tests"
If NO test-related criteria found:
→ Skip entire Step 2 (test verification)
→ Continue to Step 3 (verify other evidence)
→ Tests are not part of requirements
→ Log: "No test criteria detected, skipping test verification"
If test-related criteria found:
→ Proceed with test verification below
→ Run tests independently
→ Count failures
→ Zero tolerance for any failures
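One way to implement the Step 2.1 detection is a keyword scan over the criteria. This is a sketch; the patterns simply mirror the keyword list above:

```python
import re

# Patterns mirror the Step 2.1 keyword list
TEST_PATTERNS = [
    r"test.*(passing|fail|success)",
    r"all tests",
    r"0 failures",
    r"100% tests",
]

def has_test_criteria(criteria):
    """Return True if any criterion looks test-related."""
    return any(
        re.search(p, c["criterion"].lower())
        for c in criteria
        for p in TEST_PATTERNS
    )
```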
2.2: Find Test Command
Only execute if test criteria exist (from Step 2.1).
Check for test configuration:
- `package.json` → `scripts.test` (Node.js)
- `pytest.ini` or `pyproject.toml` (Python)
- `go.mod` → use `go test ./...` (Go)
- `Makefile` → look for a test target
Use Read tool to check these files.
2.3: Run Tests with Timeout
Timeout Configuration:
- Default: 60 seconds
- Configurable via `.claude/skills/bazinga-validator/resources/validator_config.json`, field `test_timeout_seconds`
- Large test suites may need 180-300 seconds
```bash
# Read timeout from config (or use default 60)
TIMEOUT=$(python3 -c "import json; print(json.load(open('.claude/skills/bazinga-validator/resources/validator_config.json', 'r')).get('test_timeout_seconds', 60))" 2>/dev/null || echo 60)

# Example for Node.js
timeout $TIMEOUT npm test 2>&1 | tee bazinga/test_output.txt

# Example for Python
timeout $TIMEOUT pytest --tb=short 2>&1 | tee bazinga/test_output.txt

# Example for Go
timeout $TIMEOUT go test ./... 2>&1 | tee bazinga/test_output.txt
```
If timeout occurs:
- Check if PM provided recent test output in evidence
- If evidence timestamp < 10 min and shows test results: Parse that
- Otherwise: Return REJECT with reason "Cannot verify test status (timeout)"
2.4: Parse Test Results
Common patterns:
- Jest/npm: `Tests:.*(\d+) failed.*(\d+) passed.*(\d+) total`
- Pytest: `(\d+) failed.*(\d+) passed`
- Go: count lines with `FAIL:`, or the `ok`/`FAIL` package summary lines
Extract:
- Total tests
- Passing tests
- Failing tests (this is critical)
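A failure-count extractor built from the patterns above might look like this (a sketch; exact summary strings vary across test-runner versions):

```python
import re

def count_failures(output):
    """Return the failing-test count parsed from raw test output,
    or None if no known summary format is found."""
    # Jest/npm: "Tests: 3 failed, 1226 passed, 1229 total"
    m = re.search(r"Tests:.*?(\d+) failed.*?(\d+) passed.*?(\d+) total", output)
    if m:
        return int(m.group(1))
    # Pytest: "375 failed, 854 passed"
    m = re.search(r"(\d+) failed.*?(\d+) passed", output)
    if m:
        return int(m.group(1))
    # Go: each failing package prints a line starting with "FAIL"
    if re.search(r"^(ok|FAIL)\b", output, flags=re.MULTILINE):
        return len(re.findall(r"^FAIL", output, flags=re.MULTILINE))
    return None  # unknown format -> treat as unverifiable
```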
2.5: Validate Against Criteria
IF any test failures exist (count > 0):
→ PM violated criteria
→ Return REJECT immediately
→ Reason: "Independent verification: {failure_count} test failures found"
→ PM must fix ALL failures before BAZINGA
Step 3: Verify Evidence for Each Criterion
For each criterion marked "met" by PM:
Coverage Criteria
Criterion: "Coverage >70%" Status: "met" Actual: "88.8%" Evidence: "coverage/coverage-summary.json" Verification: 1. Parse target from criterion: >70 → target=70 2. Parse actual value: 88.8 3. Check: actual > target? → 88.8 > 70 → ✅ PASS 4. If FAIL → Return REJECT
Numeric Criteria
Criterion: "Response time <200ms" Actual: "150ms" Verification: 1. Parse operator and target: <200 2. Parse actual: 150 3. Check: 150 < 200 → ✅ PASS
Boolean Criteria
Criterion: "Build succeeds" Evidence: "Build completed successfully" Verification: 1. Look for success keywords: "success", "completed", "passed" 2. Look for failure keywords: "fail", "error" 3. If ambiguous → Return REJECT (ask for clearer evidence)
Step 4: Check for Vague Criteria
Reject unmeasurable criteria:
```
for criterion in criteria:
    is_vague = (
        "improve" in criterion with no numbers
        or "better" without a baseline
        or "make progress" without metrics
        or criterion in ["done", "complete", "working"]
        or len(criterion.split()) < 3  # Too short
    )
    if is_vague:
        → Return REJECT
        → Reason: "Criterion '{criterion}' is not measurable"
```
Step 5: Path B External Blocker Validation
If PM used Path B (some criteria marked "blocked"):
```
For each blocked criterion:
1. Check evidence contains "external" keyword
2. Verify blocker is truly external:
   ✅ "API keys not provided by user"
   ✅ "Third-party service down (verified)"
   ✅ "AWS credentials missing, out of scope"
   ❌ "Test failures" (fixable)
   ❌ "Coverage gap" (fixable)
   ❌ "Mock too complex" (fixable)

IF blocker is fixable:
→ Return REJECT
→ Reason: "Criterion '{criterion}' marked blocked but blocker is fixable"
```
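A heuristic classifier for this check might look like the sketch below; both keyword lists are assumptions distilled from the examples above and should be tuned per project:

```python
# Illustrative keyword lists, distilled from the examples above
FIXABLE_HINTS = ["test failure", "coverage", "mock", "flaky"]
EXTERNAL_HINTS = ["external", "not provided by user", "third-party",
                  "credentials missing", "out of scope"]

def blocker_is_external(evidence):
    """Treat a blocker as external only if the evidence cites an
    external cause and no fixable cause."""
    text = evidence.lower()
    if any(h in text for h in FIXABLE_HINTS):
        return False  # fixable -> REJECT
    return any(h in text for h in EXTERNAL_HINTS)
```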
Step 5.5: Scope Validation (MANDATORY)
Problem: PM may reduce scope without authorization (e.g., completing 18/69 tasks)
Step 1: Query PM's BAZINGA message from database
```bash
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "pm_bazinga" 1
```
This returns the PM's BAZINGA message logged by orchestrator.
⚠️ The orchestrator logs this BEFORE invoking you. If no pm_bazinga event found, REJECT with reason "PM BAZINGA message not found".
Step 2: Extract PM's Completion Summary from BAZINGA message

Parse the event_payload JSON for:
- Completed_Items: [N]
- Total_Items: [M]
- Completion_Percentage: [X]%
- Deferred_Items: [list]
Step 3: Check for user-approved scope change
```bash
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "scope_change" 1
```
IF scope_change event exists:
- User explicitly approved scope reduction
- Parse event_payload for `approved_scope`
- Compare PM's completion against `approved_scope` (NOT the original)
- Log: "Using user-approved scope: [approved_scope summary]"
IF no scope_change event:
- Compare against original scope from session metadata
Step 4: Compare against applicable scope
- If Completed_Items < Total_Items AND Deferred_Items not empty → REJECT (unless covered by approved_scope)
- If scope_type = "file" and original file had N items but only M completed → REJECT
- If Completion_Percentage < 100% without BLOCKED status → REJECT (unless user-approved scope change exists)
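The comparison rules above, as a sketch (field names follow the BAZINGA message summary; the dict shapes are assumptions):

```python
def check_scope(summary, approved_scope=None):
    """Apply the scope rules to the PM's completion summary.
    Returns (passed, reason)."""
    completed = summary["Completed_Items"]
    total = summary["Total_Items"]
    deferred = list(summary.get("Deferred_Items", []))

    if approved_scope is not None:
        # User approved a reduced scope: compare against it, not the original
        total = approved_scope.get("Total_Items", total)
        approved_deferrals = approved_scope.get("Deferred_Items", [])
        deferred = [d for d in deferred if d not in approved_deferrals]

    if completed < total and deferred:
        return False, f"PM deferred {len(deferred)} items without user approval"
    if total and completed < total:
        return False, f"Completion {completed}/{total} < 100% without BLOCKED status"
    return True, "scope check: pass"
```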
Step 5: Flag scope reduction
```
REJECT: Scope mismatch

Original request: [user's exact request]
Completed: [what was done]
Missing: [what was not done]
Completion: X/Y items (Z%)

PM deferred without user approval:
- [list of deferred items]

Action: Return to PM for full scope completion.
```
Step 6: Log verdict to database
```bash
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet save-event \
  "[session_id]" "validator_verdict" \
  '{"verdict": "ACCEPT|REJECT", "reason": "...", "scope_check": "pass|fail"}'
```
Step 5.7: Blocking Issue Verification (MANDATORY)
Problem: PM may send BAZINGA while unresolved CRITICAL/HIGH issues exist from Tech Lead reviews.
Step 1: Query TL issues and Developer responses from events
```bash
# Get ALL TL issues (no limit - filter by group after)
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "tl_issues"

# Get ALL Developer responses (no limit - filter by group after)
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "tl_issue_responses"

# Get ALL TL verdicts (single source of truth for rejection acceptance)
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "tl_verdicts"

# NOTE: Filter events by group_id after retrieval, then get latest iteration per group:
# jq '[.[] | select(.group_id == "GROUP_ID")] | sort_by(.timestamp) | last'
```
Step 2: Compute unresolved blocking issues
For each task group, diff `tl_issues` against Dev responses AND TL verdicts:
```python
unresolved_blocking = []

# Get TL's acceptance verdicts from tl_verdicts events (single source of truth)
tl_accepted_ids = set()
for verdict_event in tl_verdicts_events:
    for verdict in verdict_event.get("verdicts", []):
        if verdict.get("verdict") == "ACCEPTED":
            tl_accepted_ids.add(verdict.get("issue_id"))

for issue in tl_issues["issues"]:
    if not issue.get("blocking"):
        continue
    response = find(tl_issue_responses["issue_responses"], issue["id"])
    if response is None:
        unresolved_blocking.append(issue)  # Not addressed
    elif response["action"] == "REJECTED":
        # Check if TL accepted the rejection (from tl_verdicts events)
        if issue["id"] not in tl_accepted_ids:
            unresolved_blocking.append(issue)  # Rejection not yet accepted by TL
    elif response["action"] == "FIXED":
        # Assume fixed (TL will re-flag if not actually fixed)
        pass
```
Alternative: If events not found, check handoff files directly:
```bash
# Fallback: Read handoff files (check both simple and parallel mode paths)

# Simple mode:
cat bazinga/artifacts/{session_id}/{group_id}/handoff_tech_lead.json | jq '.issues[] | select(.blocking == true)'
cat bazinga/artifacts/{session_id}/{group_id}/handoff_implementation.json | jq '.issue_responses'

# Parallel mode (agent-specific files):
cat bazinga/artifacts/{session_id}/{group_id}/handoff_tech_lead_{agent_id}.json | jq '.issues[] | select(.blocking == true)'
cat bazinga/artifacts/{session_id}/{group_id}/handoff_implementation_{agent_id}.json | jq '.issue_responses'
```
⚠️ Field-level fallbacks for old handoff formats:
```python
# When reading handoff files, handle missing fields gracefully:
issues = handoff.get("issues", [])
blocking_summary = handoff.get("blocking_summary", {"total_blocking": 0, "fixed": 0})
issue_responses = handoff.get("issue_responses", [])
```
🔴 CRITICAL: If review occurred but evidence is missing:
```
# Check if TL review actually occurred by looking for tl_issues events
# Note: review_iteration defaults to 1, so checking > 0 is unreliable
tl_issues_events = get_events(session_id, "tl_issues", group_id)

IF tl_issues_events exist (TL review happened):
    IF no tl_issue_responses events AND no handoff_implementation.json exists:
        → Return: REJECT
        → Reason: "TL raised issues but no Developer responses found for group {group_id}"
        → Note: This indicates Developer did not address TL feedback
```
This hard failure prevents BAZINGA acceptance when review evidence is missing.
Step 3: Check for any unresolved blocking issues
IF unresolved blocking issues exist:
→ Return: REJECT
→ Reason: "Unresolved blocking issues from code review"
→ List all unresolved issues with their IDs, severity, and title
Example rejection:
```
❌ Blocking Issue Verification: FAIL
- Unresolved blocking issues: 2
  - TL-AUTH-1-001 (CRITICAL): SQL injection in login query
  - TL-AUTH-2-003 (HIGH): Missing rate limiting on auth endpoint

These issues must be FIXED or have accepted rejections before BAZINGA.
```
IF no unresolved blocking issues:
→ Proceed to Step 6
→ Log: "Blocking issue check: PASS (0 unresolved)"
Step 4: Validate rejected issues (if any)

For issues with Developer `action = "REJECTED"`:
- Check tl_verdicts events for TL's verdict on this issue_id
- Only an `ACCEPTED` verdict means TL agreed the fix is unnecessary
- An `OVERRULED` verdict or no verdict means the issue still counts as blocking
Resolution states (based on tl_verdicts events):
| Developer Action | TL Verdict | Final State | Blocks BAZINGA? |
|---|---|---|---|
| FIXED | N/A | Resolved | ❌ No |
| REJECTED | ACCEPTED | TL agreed | ❌ No |
| REJECTED | OVERRULED | TL disagreed | ✅ YES |
| REJECTED | (none yet) | Pending TL review | ✅ YES |
| DEFERRED | N/A | Deferred (non-blocking only) | ❌ No |
| (none) | N/A | Unaddressed | ✅ YES |
Note: The `rejection_accepted` field in event_tl_issue_responses.schema.json is deprecated. Use tl_verdicts events as the single source of truth for TL decisions.
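The table collapses to a small decision function; a sketch using the same action/verdict names (the DEFERRED row is inferred from the "non-blocking only" entry above):

```python
def blocks_bazinga(dev_action, tl_verdict):
    """Map (Developer action, TL verdict) to 'blocks BAZINGA?' per the
    table above. dev_action: "FIXED" | "REJECTED" | "DEFERRED" | None;
    tl_verdict: "ACCEPTED" | "OVERRULED" | None (pending)."""
    if dev_action == "FIXED":
        return False                      # Resolved
    if dev_action == "REJECTED":
        return tl_verdict != "ACCEPTED"   # OVERRULED or pending -> blocks
    if dev_action == "DEFERRED":
        return False                      # Allowed for non-blocking issues only
    return True                           # Unaddressed -> blocks
```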
Step 5.8: SpecKit Task Completion Verification (CONDITIONAL)
Purpose: When session is in SpecKit mode, verify all pre-planned tasks are completed.
Step 1: Check if SpecKit mode is enabled
```bash
# Query orchestrator state (returns JSON with all fields)
ORCH_STATE=$(python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-state "[session_id]" "orchestrator")

# Parse speckit_mode from the JSON result.
# The result is a JSON object: {"speckit_mode": true, "feature_dir": "...", ...}
echo "$ORCH_STATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('speckit_mode', False))"
```
IF speckit_mode is NOT true (or state not found):
→ Skip entire Step 5.8
→ Continue to Step 6
→ Log: "SpecKit mode not enabled, skipping task verification"
IF speckit_mode is true:
→ Proceed with SpecKit task verification below
Step 2: Query task groups for SpecKit task IDs
```bash
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-task-groups "[session_id]"
```
Parse each task group for:
- `speckit_task_ids`: JSON array of task IDs (e.g., `["T001", "T002", "T003"]`)
- `status`: Current group status
Step 3: Collect all SpecKit task IDs
```python
all_task_ids = []
completed_groups = []
incomplete_groups = []

for group in task_groups:
    task_ids = json.loads(group.get("speckit_task_ids") or "[]")
    all_task_ids.extend(task_ids)
    if group["status"] == "completed":
        completed_groups.append(group["id"])
    else:
        incomplete_groups.append({
            "group_id": group["id"],
            "status": group["status"],
            "task_ids": task_ids,
        })
```
Step 4: Verify tasks.md checkmarks (if feature_dir available)
```bash
# Get feature_dir from orchestrator state (already queried in Step 1)
FEATURE_DIR=$(echo "$ORCH_STATE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('feature_dir', ''))" 2>/dev/null)

if [ -n "$FEATURE_DIR" ] && [ -f "$FEATURE_DIR/tasks.md" ]; then
    # Count unchecked vs checked tasks
    unchecked=$(grep -c "^- \[ \]" "$FEATURE_DIR/tasks.md" || echo 0)
    checked=$(grep -c "^- \[x\]" "$FEATURE_DIR/tasks.md" || echo 0)
    echo "Tasks: $checked checked, $unchecked unchecked"
fi
```
Step 5: Validate completion
IF incomplete_groups exist:
→ Return: REJECT
→ Reason: "SpecKit task groups not completed"
→ Details: List incomplete groups with their task IDs and current status
Example rejection:
```
❌ SpecKit Task Verification: FAIL
- speckit_mode: true
- Total task groups: 3
- Completed groups: 2/3
- Incomplete:
  - Group US2 (status: qa_review): Tasks T004, T005, T006

All SpecKit task groups must be completed before BAZINGA.
```
IF tasks.md has unchecked items AND feature_dir is available:
→ Return: REJECT (with warning)
→ Reason: "SpecKit tasks.md has unchecked items"
→ Note: This is a secondary check - DB status takes precedence
→ Details: Show unchecked count vs total
IF all task groups completed:
→ Proceed to Step 6
→ Log: "SpecKit task verification: PASS ({n} tasks across {m} groups completed)"
⚠️ Graceful degradation:
- If `speckit_task_ids` is NULL/empty for all groups: Log a warning but don't fail
- If `feature_dir` is not in state: Skip the tasks.md check and rely on DB status only
- This handles sessions that upgraded to speckit_mode mid-workflow
Step 6: Calculate Completion & Return Verdict
```
met_count     = count(criteria where status="met" AND verified=true)
blocked_count = count(criteria where status="blocked" AND external=true)
total_count   = count(criteria where required_for_completion=true)

completion_percentage = (met_count / total_count) * 100
```
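An executable form of the same arithmetic (a sketch; `verified` and `external` are flags the validator sets during Steps 2-5):

```python
def completion_stats(criteria):
    """Compute met/blocked/total counts and completion percentage."""
    required = [c for c in criteria if c.get("required_for_completion")]
    met = sum(1 for c in required if c["status"] == "met" and c.get("verified"))
    blocked = sum(1 for c in required if c["status"] == "blocked" and c.get("external"))
    total = len(required)
    pct = (met / total) * 100 if total else 0.0
    return met, blocked, total, pct
```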
Verdict Decision Tree
```
IF missing_review_data_for_reviewed_groups:
    → Return: REJECT
    → Reason: "Cannot verify blocking issues - missing review data"
    → Detection: If tl_issues events exist for a group (TL flagged issues) but no
      corresponding tl_issue_responses events or implementation handoff exists
      → review data is incomplete

ELSE IF unresolved_blocking_issues > 0:
    → Return: REJECT
    → Reason: "Unresolved blocking issues from code review"
    → Note: CRITICAL/HIGH issues must be FIXED or have accepted rejection

ELSE IF speckit_mode AND incomplete_task_groups > 0:
    → Return: REJECT
    → Reason: "SpecKit task groups not completed"
    → Note: All task groups must reach 'completed' status before BAZINGA

ELSE IF all verifications passed AND met_count == total_count:
    → Return: ACCEPT
    → Path: A (Full achievement)

ELSE IF all verifications passed AND met_count + blocked_count == total_count:
    → Return: ACCEPT (with caveat)
    → Path: B (Partial with external blockers)

ELSE IF test_failures_found:
    → Return: REJECT
    → Reason: "Independent verification: {failure_count} test failures found"
    → Note: This only applies if test criteria exist (Step 2.1)

ELSE IF evidence_mismatch:
    → Return: REJECT
    → Reason: "Evidence doesn't match claimed value"

ELSE IF vague_criteria:
    → Return: REJECT
    → Reason: "Criterion '{criterion}' is not measurable"

ELSE:
    → Return: REJECT
    → Reason: "Incomplete: {list incomplete criteria}"
```
Important: If no test-related criteria exist, the validator skips Step 2 entirely. The decision tree proceeds based on other evidence (Step 3) only.
Response Format
Structure your response for orchestrator parsing:
```
## BAZINGA Validation Result

**Verdict:** ACCEPT | REJECT | CLARIFY
**Path:** A | B | C
**Completion:** X/Y criteria met (Z%)

### Verification Details

✅ Test Verification: PASS | FAIL
- Command: {test_command}
- Total tests: {total}
- Passing: {passing}
- Failing: {failing}

✅ Evidence Verification: {passed}/{total}
- Criterion 1: ✅ PASS ({actual} vs {target})
- Criterion 2: ❌ FAIL (evidence mismatch)

✅ Blocking Issue Verification: PASS | FAIL
- Unresolved blocking issues: {count}
  - {issue_id} ({severity}): {title}

✅ SpecKit Task Verification: PASS | SKIP | FAIL
- speckit_mode: {true|false}
- Task groups: {completed}/{total}
- Task IDs tracked: {count}

### Reason
{Detailed explanation of verdict}

### Recommended Action
{What PM or orchestrator should do next}
```
Example: ACCEPT Verdict
```
## BAZINGA Validation Result

**Verdict:** ACCEPT
**Path:** A (Full achievement)
**Completion:** 3/3 criteria met (100%)

### Verification Details

✅ Test Verification: PASS
- Command: npm test
- Total tests: 1229
- Passing: 1229
- Failing: 0

✅ Evidence Verification: 3/3
- ALL tests passing: ✅ PASS (0 failures verified)
- Coverage >70%: ✅ PASS (88.8% > 70%)
- Build succeeds: ✅ PASS (verified successful)

### Reason
Independent verification confirms all criteria met with concrete evidence. Test suite executed successfully with 0 failures.

### Recommended Action
Accept BAZINGA and proceed to shutdown protocol.
```
Example: REJECT Verdict
```
## BAZINGA Validation Result

**Verdict:** REJECT
**Path:** C (Work incomplete - fixable gaps)
**Completion:** 1/2 criteria met (50%)

### Verification Details

❌ Test Verification: FAIL
- Command: npm test
- Total tests: 1229
- Passing: 854
- Failing: 375

✅ Evidence Verification: 1/2
- Coverage >70%: ✅ PASS (88.8% > 70%)
- ALL tests passing: ❌ FAIL (PM claimed 0, found 375)

### Reason
PM claimed "ALL tests passing" but independent verification found 375 test failures (69.5% pass rate). This contradicts PM's claim.

Failures breakdown:
- Backend: 77 failures
- Mobile: 298 failures

These are fixable via Path C (spawn developers).

### Recommended Action
REJECT BAZINGA. Spawn PM with instruction: "375 tests still failing. Continue fixing until failure count = 0."
```
Example: ACCEPT Verdict (No Test Criteria)
```
## BAZINGA Validation Result

**Verdict:** ACCEPT
**Path:** A (Full achievement)
**Completion:** 2/2 criteria met (100%)

### Verification Details

⏭️ Test Verification: SKIPPED
- No test-related criteria detected
- Tests not part of requirements

✅ Evidence Verification: 2/2
- Dark mode toggle working: ✅ PASS (verified in UI)
- Settings page updated: ✅ PASS (component added)

### Reason
No test requirements specified. Independent verification confirms all specified criteria met with concrete evidence.

### Recommended Action
Accept BAZINGA and proceed to shutdown protocol.
```
Error Handling
Database query fails:
→ Return: CLARIFY
→ Reason: "Cannot retrieve success criteria from database"
Test command fails (timeout):
→ Return: REJECT
→ Reason: "Cannot verify test status (timeout after {TIMEOUT}s)"
→ Action: "Provide recent test output file OR increase test_timeout_seconds in .claude/skills/bazinga-validator/resources/validator_config.json"
Evidence file missing:
→ Return: REJECT
→ Reason: "Evidence file '{path}' not found"
→ Action: "Provide valid evidence path or re-run tests/coverage"
Critical Reminders
- Be skeptical - Assume PM wrong until proven right
- Run tests yourself - Don't trust PM's status updates
- Zero tolerance for test failures - Even 1 failure = REJECT
- Zero tolerance for blocking issues - CRITICAL/HIGH issues must be resolved
- Verify evidence - Don't accept claims without proof
- Structured response - Orchestrator parses your verdict
- Timeout protection - Use configurable timeout (default 60s, see .claude/skills/bazinga-validator/resources/validator_config.json)
- Clear reasoning - Explain WHY you accepted or rejected
- SpecKit completion - If speckit_mode=true, ALL task groups must be completed
Golden Rule: "The user expects 100% accuracy when BAZINGA is accepted. Be thorough."