Agentops validation
Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".
git clone https://github.com/boshu2/agentops
T=$(mktemp -d) && git clone --depth=1 https://github.com/boshu2/agentops "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills-codex/validation" ~/.claude/skills/boshu2-agentops-validation && rm -rf "$T"
skills-codex/validation/SKILL.md$validation — Full Validation Phase Orchestrator
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
Strict Delegation Contract (default)
Validation delegates to
$vibe, $post-mortem, $retro, and $forge (plus lifecycle skills $test, $deps, $review, $perf) as separate skill invocations. Strict delegation is the default.
Anti-pattern to reject: spawning judges directly in place of
$vibe, inlining post-mortem analysis, skipping $forge. See ../shared/references/strict-delegation-contract.md for the full contract and supported compression escapes (--quick, --no-retro, --no-forge, --no-lifecycle, --no-behavioral, --allow-critical-deps).
See
for the live compression signature..agents/learnings/2026-04-19-orchestrator-compression-anti-pattern.md
DAG — Execute This Sequentially
mkdir -p .agents/rpi detect complexity from execution-packet or --complexity flag (default: standard) detect ao CLI availability
Step 0: Load Prior Validation Context
Before running the validation pipeline, pull relevant learnings from prior reviews:
if command -v ao &>/dev/null; then ao lookup --query "<epic or goal context> validation review patterns" --limit 5 2>/dev/null || true fi
Apply retrieved knowledge (mandatory when results returned):
If learnings are returned, do NOT just load them as passive context. For each returned item:
- Check: does this learning apply to the current validation scope? (answer yes/no)
- If yes: include it as a
— what pattern does it warn about? does the code exhibit it?known_risk - Cite applicable learnings by filename when they influence a validation finding
After applying, record each citation:
ao metrics cite "<learning-path>" --type applied 2>/dev/null || true
Skip silently if ao is unavailable or returns no results.
Run every step in order. Do not stop between steps.
STEP 1 ── Skill(skill="vibe", args="recent [--quick]") Use --quick for fast/standard. Full council for full. PASS/WARN? → continue FAIL? → write summary, output <promise>FAIL</promise>, stop (validation cannot fix code — caller decides retry) STEP 1.5 ── Four-Surface Closure (mandatory) Read `skills/validation/references/four-surface-closure.md` for the mandatory four-surface closure check. Check all four surfaces: Code, Documentation, Examples, Proof. All 4 pass? → continue if --strict-surfaces: Any surface fails? → FAIL, write summary, output <promise>FAIL</promise>, stop else (default): Code passes, others fail? → WARN, continue Code fails? → BLOCK, write summary, output <promise>FAIL</promise>, stop STEP 1.6 ── Test pyramid coverage audit (advisory, append to summary) Check L0-L3 + BF1/BF4 per modified file. WARN only, not FAIL. STEP 1.7 ── Lifecycle Checks (advisory except critical dependency findings) Skip entire step if: --no-lifecycle flag. Each sub-step uses --quick mode to limit context consumption. On budget expiry: skip remaining sub-steps, write [TIME-BOXED]. a) if lifecycle tier >= minimal AND test_framework_detected: Skill(skill="test", args="coverage --quick") Append coverage delta to phase summary. b) if lifecycle tier >= standard AND dependency_manifest_exists: Skill(skill="deps", args="vuln --quick") CRITICAL vulns (CVSS >= 9.0): **FAIL** (block shipping). Opt-out: `--allow-critical-deps` for acknowledged risk acceptance. Non-critical: advisory note only. c) if lifecycle tier >= standard: Skill(skill="review", args="--diff --quick") Append review findings to summary as advisory. d) if lifecycle tier == full AND modified_files_touch_hot_path: Skill(skill="perf", args="profile --quick") Append perf findings to summary as advisory. Hot path detection: modified files match benchmark files or patterns (handler, middleware, router, parser, engine, worker, pool, codec). STEP 1.8 ── Stage 4: Behavioral Validation (holdout scenarios + agent-built specs) Skip if: no .agents/holdout/ directory AND no .agents/specs/ directory Skip if: --no-behavioral flag set Sub-steps: a) List active scenarios and agent-built specs: ao scenario list --status active 2>/dev/null find .agents/specs -name "*.json" -type f 2>/dev/null a.5) For each agent-built spec in .agents/specs/, treat as a scenario with source="agent". Validate against scenario schema (auto-* id pattern). Add to evaluation set alongside holdout scenarios. b) If 0 scenarios AND 0 specs → skip with note "No behavioral validation artifacts found" c) Spawn evaluator council with AGENTOPS_HOLDOUT_EVALUATOR=1 Pass scenarios + implementation diff as judge context d) Each judge evaluates: "Does the implementation satisfy the scenario's expected_outcome? Score each acceptance_vector dimension 0.0-1.0." e) Compute satisfaction_score per scenario (mean of dimension scores) f) Aggregate: mean satisfaction across all scenarios g) Gate: mean >= scenario.satisfaction_threshold → PASS mean >= 0.5 → WARN ("Partial satisfaction — review scenarios") mean < 0.5 → FAIL ("Implementation does not satisfy holdout scenarios") h) Write results to .agents/rpi/scenario-results.json i) Include satisfaction_score in validation_state PASS/WARN? → continue to STEP 2 FAIL? → write summary, output <promise>FAIL</promise>, stop STEP 2 ── if epic_id: Skill(skill="post-mortem", args="<epic-id> [--quick]") else: Skill(skill="post-mortem", args="recent [--quick]") Use --quick for fast/standard. Full council for full. PASS/WARN? → continue FAIL? → write summary, output <promise>FAIL</promise>, stop STEP 3 ── if not --no-retro: Skill(skill="retro") STEP 4 ── if not --no-forge AND ao available: if [ -n "${CODEX_THREAD_ID:-}" ] || [ "${CODEX_INTERNAL_ORIGINATOR_OVERRIDE:-}" = "Codex Desktop" ]; then ao codex ensure-stop --auto-extract 2>/dev/null || true else ao forge transcript --last-session --queue --quiet 2>/dev/null || true fi STEP 5 ── write phase summary to .agents/rpi/phase-3-summary-YYYY-MM-DD-<slug>.md ao ratchet record vibe 2>/dev/null || true output <promise>DONE</promise>
That's it. Steps 1→2→3→4→5. No stopping between steps.
Setup Detail
Track state inline:
epic_id, complexity, no_retro, no_forge, strict_surfaces, vibe_verdict, post_mortem_verdict. Load execution packet (if available): read complexity, contract_surfaces, and done_criteria from .agents/rpi/execution-packet.json. When a current run_id is known, prefer the matching .agents/rpi/runs/<run-id>/execution-packet.json archive over the latest alias.
Gate Detail
Validation has multiple blocking conditions. Validation cannot fix code — it can only report and fail closeout when the lifecycle contract is not met.
- Blocking FAIL conditions:
FAIL, code-surface failure in STEP 1.5,$vibe
failure on any closure surface, CVSS >= 9.0 dependency findings in STEP 1.7b unless--strict-surfaces
, and post-mortem FAIL in STEP 2.--allow-critical-deps - PASS/WARN: Log verdicts, continue through the remaining steps.
- FAIL: Extract findings from the latest evaluator output, write phase summary with FAIL status, output
with findings attached. Suggest:<promise>FAIL</promise>
."Validation FAIL. Fix findings, then re-run $validation [epic-id]"
Why no internal retry: Retries require re-implementation (
$crank). The caller ($rpi or human) decides whether to loop back.
Phase Summary Format
Write to
.agents/rpi/phase-3-summary-YYYY-MM-DD-<slug>.md:
# Phase 3 Summary: Validation - **Epic:** <epic-id or "standalone"> - **Vibe verdict:** <PASS|WARN|FAIL> - **Post-mortem verdict:** <verdict or "skipped"> - **Retro:** <captured|skipped> - **Forge:** <mined|skipped> - **Complexity:** <fast|standard|full> - **Status:** <DONE|FAIL> - **Timestamp:** <ISO-8601>
Phase Budgets
| Sub-step | | | |
|---|---|---|---|
| Vibe | 2 min | 3 min | 5 min |
| Post-mortem | 2 min | 3 min | 5 min |
| Retro | 1 min | 1 min | 2 min |
| Forge | skip | 2 min | 3 min |
On budget expiry: allow in-flight calls to complete, write
[TIME-BOXED] marker, proceed.
Flags
| Flag | Default | Description |
|---|---|---|
| auto | Force complexity level (fast/standard/full) |
| off | Skip ALL lifecycle checks in STEP 1.7 (test, deps, review, perf) |
| matches complexity | Controls which lifecycle skills fire: (test only), (+deps, +review), (+perf) |
| off | Skip retro step only |
| off | Skip forge step only |
| off | Disable phase time budgets |
| off | Make all 4 surface failures blocking (FAIL instead of WARN). Passed automatically by . |
| off | Allow shipping with CVSS >= 9.0 vulnerabilities (acknowledged risk acceptance) |
Quick Start
$validation ag-5k2 # validate epic with full close-out $validation # validate recent work (no epic) $validation --complexity=full ag-5k2 # force full council ceremony $validation --no-retro ag-5k2 # skip retro only $validation --no-forge ag-5k2 # skip forge only
Completion Markers
<promise>DONE</promise> # Validation passed, learnings captured <promise>FAIL</promise> # Vibe failed, re-implementation needed (findings attached)
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Vibe FAIL on first run | Implementation has quality issues | Fix findings via , then re-run |
| Post-mortem reviewed recent work instead of an epic | No epic-id provided | Pass epic-id for epic-scoped closeout: |
| Codex closeout missing | Codex has no session-end hook surface | Let run , or run manually before leaving the session |
| Forge produces no output | No ao CLI or no transcript content | Install ao CLI or run manually |
| Stale execution-packet | Packet from a previous RPI cycle | Delete and pass explicitly |
Reference Documents
- references/four-surface-closure.md — four-surface closure validation (code + docs + examples + proof)
- references/forge-scope.md — forge session scoping and deduplication
- references/idempotency-and-resume.md — re-run behavior and standalone mode
See Also
Core phases: vibe, post-mortem, retro, forge, crank, discovery, rpi. Lifecycle Step 1.7: test, deps, review, perf.