Claude-corps milestone-review
Iterative review-fix loop for accumulated milestone/branch changes. Runs parallel reviewers, fixes findings autonomously, repeats until clean. Use after multiple tasks merge to a milestone branch, or before merging to main. Invoke with /milestone-review --base-branch main. Supports --dry-run and --max-iterations.
```bash
# Clone the whole repo:
git clone https://github.com/josephneumann/claude-corps

# Or install just this skill into ~/.claude/skills:
T=$(mktemp -d) && git clone --depth=1 https://github.com/josephneumann/claude-corps "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/milestone-review" ~/.claude/skills/josephneumann-claude-corps-milestone-review && rm -rf "$T"
```
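To confirm the copy landed, list the destination directory used by the one-liner above:

```bash
ls ~/.claude/skills/josephneumann-claude-corps-milestone-review
```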
skills/milestone-review/SKILL.md

Milestone Review: Iterative Review-Fix Loop
You are running an autonomous review-fix loop on accumulated branch changes. Unlike
/multi-review (which is interactive), you fix ALL verified findings yourself — iterating until the branch is clean.
Section 1: Parse Arguments
Arguments:
$ARGUMENTS
Parse the following flags:
- --max-iterations N — Maximum review-fix cycles (default: 5)
- --base-branch <branch> — Compare against this branch (default: main)
- --severity <level> — Minimum severity to fix: critical, important, or suggestions (default: important — fixes Critical + Important)
- --dry-run — Report findings without fixing anything
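For example, a dry run that reports only Critical findings against main, capped at three cycles (flags as documented above):

```
/milestone-review --base-branch main --max-iterations 3 --severity critical --dry-run
```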
Section 2: Validate Environment
- Confirm there are changes vs base: git diff <base-branch>...HEAD --stat
- If no changes, exit early: "Nothing to review — branch is identical to <base-branch>."
- Log scope:

```
MILESTONE REVIEW STARTING
Branch: <current branch>
Base: <base-branch>
Files changed: <count>
Lines changed: +<added> -<removed>
Max iterations: <N>
Severity threshold: <level>
```
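A minimal shell sketch for gathering those numbers (BASE here stands in for the --base-branch value; binary files report no line counts and sum as zero):

```bash
BASE=${BASE:-main}
FILES=$(git diff "$BASE"...HEAD --name-only | wc -l)
# --numstat prints "added removed path"; sum the first two columns
read -r ADDED REMOVED < <(git diff "$BASE"...HEAD --numstat \
  | awk '{a+=$1; r+=$2} END {print a+0, r+0}')
echo "Files changed: $FILES"
echo "Lines changed: +$ADDED -$REMOVED"
```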
Section 3: Review-Fix Loop
Iterate from 1 to
max_iterations. Track findings across iterations to detect plateaus.
Step 3.1: Compute Diff
git diff <base-branch>...HEAD --name-only
Read the full diff for reviewer context:
git diff <base-branch>...HEAD
Step 3.2: Select Reviewers
Replicate multi-review's reviewer selection logic (Steps 1.5–3 of
/multi-review):
- Load claude/review.json if present.
- Auto-detect frameworks from file patterns:
| File Patterns | Agent |
|---|---|
| `*.tsx`, `*.jsx`, `next.config.*`, `middleware.ts` | nextjs-reviewer |
| `*.css`, `tailwind.*`, `components/ui/**` | tailwind-reviewer |
| `*.py`, `alembic/**` | python-backend-reviewer |
| `routes/**`, `api/**`, `endpoints/**`, `controllers/**` | api-security-reviewer |
| `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `components/**`, `pages/**` | ux-reviewer |
| `*.tsx`, `*.jsx`, `*.css`, `next.config.*`, `package.json` | frontend-performance-reviewer |

- Always include (milestone review is about cross-cutting concerns): code-simplicity-reviewer, pattern-recognition-specialist, architecture-strategist
- Conditionally include based on change types and risk tiers:
  - security-sentinel — auth, input handling, secrets, user data
  - performance-oracle — database queries, loops, caching
  - agent-native-reviewer — agent definitions, skill files, system prompts
  - data-integrity-guardian — database migrations, schema changes
  - data-migration-expert — data backfills, ID mappings
- Apply reviewers.include / reviewers.exclude overrides from review config if present
- Codex integration (config-only): If review.json contains "codex": { "enabled": true }:
  - Set $CODEX_ENABLED = true
  - Set $CODEX_ADVERSARIAL = true if "codex": { "adversarial": true }
  - Verify codex CLI: which codex && codex --version
  - If missing, log: "Codex CLI not found — skipping codex review. Install: npm install -g @openai/codex" and set $CODEX_ENABLED = false
Select 3–7 reviewers total (codex is additional, does not count toward this limit).
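A hypothetical claude/review.json showing the keys this step reads (the exact schema is defined by multi-review, so treat this shape as an assumption):

```json
{
  "reviewers": {
    "include": ["data-integrity-guardian"],
    "exclude": ["ux-reviewer"]
  },
  "codex": {
    "enabled": true,
    "adversarial": false
  }
}
```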
Step 3.3: Read Agent Definitions
For each selected reviewer:
cat ~/.claude/agents/review/<agent-name>.md
Step 3.4: Launch Parallel Reviews
Use the Task tool to spawn parallel review agents. Each reviewer gets:
- The full diff context
- Contents of key changed files
- Instruction to return findings in standard format:
```
## [Agent Name] Findings

### Critical Issues
- [file:line] Issue description - Confidence: X%

### Important Issues
- [file:line] Issue description - Confidence: X%

### Suggestions
- [file:line] Issue description - Confidence: X%
```
Model selection: Follow multi-review's tier-based logic. Default to Sonnet. Use Opus for
security-sentinel and architecture-strategist when risk tier is critical.
Codex reviewer (if $CODEX_ENABLED):
Launch an additional Agent in parallel with the Claude reviewers above:
```
N+1. Agent: codex-reviewer
  - Subagent type: general-purpose
  - Model: sonnet
  - Prompt:
    You are a review output normalizer. Your job:
    1. Run codex to perform a code review via Bash (timeout: 300000ms):
       codex review --base <base-branch> "<review prompt>"
    2. Capture the stdout output.
    3. Parse each finding and normalize into this format:
       ## Codex Reviewer Findings
       ### Critical Issues
       - [file:line] [CODEX] Issue description - Confidence: X%
       ### Important Issues
       - [file:line] [CODEX] Issue description - Confidence: X%
       ### Suggestions
       - [file:line] [CODEX] Issue description - Confidence: X%
    4. Map codex severity: Critical/Blocker/Bug/Security → Critical,
       Warning/Should-fix/Improvement → Important, Suggestion/Nit/Style → Suggestions
    5. If no confidence given: Critical 90%, Important 80%, Suggestions 70%
    6. If no file:line references, use file path only.
    7. Return ONLY the normalized findings.
```
If $CODEX_ADVERSARIAL, use review prompt: "Perform an adversarial review. Challenge design decisions, surface hidden assumptions, question tradeoffs, and pressure-test the approach."

Otherwise: "Review for security vulnerabilities, correctness bugs, performance issues, and code quality. Format findings by severity."
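Concretely, with a base branch of main (illustrative) and adversarial mode on, the Bash call from the prompt above would look like:

```bash
codex review --base main \
  "Perform an adversarial review. Challenge design decisions, surface hidden assumptions, question tradeoffs, and pressure-test the approach."
```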
Step 3.5b: Browser Workflow Testing (Parallel with Code Reviews)
When changed files include frontend patterns (
.tsx, .jsx, .vue, .svelte, .html, .css, .scss):
- If --dry-run, skip browser testing.
- Read docs/browser-testing-protocol.md and follow Phases 1-6 (first iteration only):
  - Pre-flight checks — verify Playwright MCP available, dev server running (Phase 1)
  - Infer workflows from diff — classify changed files, propose to user via AskUserQuestion for confirmation (Phase 2)
  - Navigate → clear cache/storage → reload — ensures fresh state, not stale cache (Phase 3)
- Handle auth if page redirects to login (Phase 4)
- Execute workflow-type checklists — interact, verify outcomes, verify persistence via reload (Phase 5)
- Responsive check at desktop (1280x800) + mobile (375x812) and report findings (Phase 6)
- Include browser findings in the aggregated results alongside code review findings. Classify as Critical/Important/Minor per the protocol's Phase 6 severity table.
Milestone-review specific: Cache the user's URL response and workflow list — don't re-ask on subsequent iterations. On iterations 2+, re-run only the workflows that touched files modified by fixes.
Step 3.5: Aggregate and Filter Findings
Process reviewer results using escalating thresholds — be aggressive early, tighten each iteration:
| Iteration | Severities | Min Confidence | Scope |
|---|---|---|---|
| 1 | Critical + Important | >= 80% | All findings |
| 2 | Critical + Important | >= 85% | Net-new findings only (not flagged in iteration 1) |
| 3+ | Critical only | >= 90% | Net-new findings only |
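As a sketch, the escalating filter could be selected like this (variable names are illustrative, not part of the skill):

```bash
case "$ITERATION" in
  1) SEVERITIES="critical important"; MIN_CONFIDENCE=80; SCOPE="all" ;;
  2) SEVERITIES="critical important"; MIN_CONFIDENCE=85; SCOPE="net-new" ;;
  *) SEVERITIES="critical";           MIN_CONFIDENCE=90; SCOPE="net-new" ;;
esac
```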
Constant rules (all iterations):
- Security findings (security-sentinel, api-security-reviewer) always included regardless of confidence or iteration
- Include Suggestions only if --severity suggestions was passed (and only in iteration 1)
- Deduplicate findings that multiple reviewers flagged (keep the highest-confidence version)
Net-new detection: A finding is "net-new" if no finding from the previous iteration references the same file:line range (within 5 lines) for the same category of issue. If a reviewer flags the same area for the same reason, it's a re-flag — skip it.
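A minimal sketch of that rule, assuming previous-iteration findings are stored one per line as "file line category" triples (the function name and storage format are hypothetical):

```bash
# Returns 0 (net-new) unless a prior finding hits the same file and
# category within 5 lines, in which case it's a re-flag (returns 1).
is_net_new() {
  local file=$1 line=$2 category=$3 prev_findings=$4
  local pfile pline pcategory
  while read -r pfile pline pcategory; do
    if [[ $pfile == "$file" && $pcategory == "$category" ]] &&
       (( pline >= line - 5 && pline <= line + 5 )); then
      return 1
    fi
  done < "$prev_findings"
  return 0
}
```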
Step 3.6: Autonomous Fix Evaluation
This is the key differentiator from /multi-review. Multi-review splits findings into "auto-fixable" and "manual review recommended" and asks the user. Milestone-review rejects this distinction. You treat every verified finding as your responsibility to fix, regardless of complexity.
If --dry-run: report findings and exit. Do not fix anything.
For EACH Critical/Important finding:
- Verify: Read actual code at file:line — confirm the finding is real. Reviewers hallucinate.
- Evaluate: Is this a real issue? Drop ONLY if:
- The finding is a false positive (code doesn't actually have the problem)
- The finding is purely speculative/theoretical with no practical impact
- Fixing it would require a human architectural decision (e.g., "should we use library X or Y?")
- Fix: Implement the complete fix. Not a superficial patch — the real fix that fully resolves the issue. If the finding says "these 3 files have inconsistent error handling patterns," fix all 3 files.
- Track each finding into one of:
  - fixed[] — issue was real, fix implemented
  - dropped[] — issue was false positive or speculative (log reason)
  - deferred[] — issue is real but requires human decision (log what decision is needed)
Important: The bar for deferring is HIGH. "This is complex" is NOT a valid reason to defer. "This requires choosing between two valid architectural approaches" IS.
Step 3.7: Run Tests After Fixes
Auto-detect and run the project's test command:
- pyproject.toml → uv run pytest
- package.json → pnpm test (or npm test if no pnpm-lock)
- Makefile with run-checks → make run-checks
- Makefile with test → make test
If tests fail:
- Diagnose the failure
- If your fix caused it, adjust the fix (don't just revert and give up)
- Only move to deferred if the fix genuinely can't be made to work without human guidance
Tests must pass before committing.
Step 3.8: Commit Fixes
```bash
git add <specific files changed>
git commit -m "fix: address milestone review findings (iteration N)

- <list of issues fixed>

Co-Authored-By: Claude <noreply@anthropic.com>"
```
Step 3.9: Check Exit Conditions
Evaluate in this order (first match wins):
- Clean: Zero findings passed the threshold filter this iteration → EXIT
- Clean enough: All findings were dropped as false positives → EXIT
- Diminishing returns: Fewer than 3 net-new findings this iteration AND zero Critical → EXIT (you've hit the long tail — remaining nits aren't worth another full review cycle)
- Plateau: Net-new findings count >= previous iteration's count (not converging) → EXIT (fixes are creating as many issues as they resolve — stop the churn)
- Limit: Max iterations reached → EXIT (report deferred items)
- Otherwise: Next iteration
On exit, log the reason so the report explains why the loop stopped (e.g., "Exited after iteration 2: diminishing returns (1 net-new Important, 0 Critical)").
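A sketch of that ordering as a single function (the function and argument names are illustrative; inputs are the per-iteration counts):

```bash
# Echoes the exit reason, or "continue" when no condition matches.
exit_reason() {
  local findings=$1 net_new=$2 critical=$3 prev_net_new=$4 iter=$5 max=$6 all_dropped=$7
  (( findings == 0 ))                && { echo "clean"; return; }
  [[ $all_dropped == true ]]         && { echo "clean enough"; return; }
  (( net_new < 3 && critical == 0 )) && { echo "diminishing returns"; return; }
  (( net_new >= prev_net_new ))      && { echo "plateau"; return; }
  (( iter >= max ))                  && { echo "limit"; return; }
  echo "continue"
}
```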
Step 3.10: Large Diff Handling
If the diff exceeds ~40 changed files, batch the review:
- Group related files by directory or feature area
- Run reviewers on each batch sequentially to stay within context limits
- Fixes still accumulate into a single commit per iteration
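One way to check whether batching is needed and how batches might split, grouping changed files by top-level directory (BASE is illustrative; the ~40-file threshold comes from above):

```bash
BASE=${BASE:-main}
git diff "$BASE"...HEAD --name-only | wc -l           # batch if > ~40
git diff "$BASE"...HEAD --name-only \
  | awk -F/ '{print $1}' | sort | uniq -c | sort -rn  # candidate groups
```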
Section 4: Push and Report
git push
Generate the final report:
```
═══════════════════════════════════════════
MILESTONE REVIEW COMPLETE
═══════════════════════════════════════════
Branch: <branch>
Base: <base-branch>
Iterations: N
Exit reason: <clean|clean enough|diminishing returns|plateau|limit>

Findings evaluated: X
  Fixed: Y
  Dropped (false positive): Z
  Deferred (needs human): W

PER-ITERATION BREAKDOWN:
  Iteration 1: <N> findings (threshold: Important+ >= 80%) → <M> fixed
  Iteration 2: <N> findings (threshold: Important+ >= 85%, net-new only) → <M> fixed
  ...

FIXED:
- [file:line] issue (iteration N)
- ...

DEFERRED (if any):
- [file:line] issue — what human decision is needed
- ...

DROPPED:
- [file:line] issue — why it's a false positive
- ...
═══════════════════════════════════════════
```
Section 5: Session Summary
Write a standard session summary to
docs/session_summaries/milestone-review_<timestamp>.txt for orchestrator reconciliation. Include:
- Branch name and base branch
- Iteration count
- Full list of fixed, dropped, and deferred findings
- Test results
- Any deferred items that need human attention
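A hypothetical shape for that write (the path comes from above; the timestamp format is an assumption):

```bash
mkdir -p docs/session_summaries
# Assumed timestamp format; the skill only specifies <timestamp>
SUMMARY="docs/session_summaries/milestone-review_$(date +%Y%m%d-%H%M%S).txt"
```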