Learn-skills.dev codex-review-cycle
Run a bounded 3-cycle interactive review-and-fix workflow against a user-chosen git review target (working-tree diff, current branch vs. auto-detected base, or an explicit commit/tag/branch ref) using the codex plugin. Each cycle invokes codex `review` or `adversarial-review --json`; Claude verifies each finding against a six-item validity checklist, calls `review-scope-guard` for Definition-of-Done triage, and the user picks which findings to fix before the next cycle. Covers both code diffs and markdown planning documents. Hard cap at 3 cycles. Use ONLY when the user explicitly asks to run the codex review cycle on working-tree changes, a committed branch diff, or an explicit base ref. Do NOT trigger for single-shot review requests, auto-hardening, background reviews, plan drafting, or when the chosen target would produce an empty diff.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adhi-jp/agent-skills/codex-review-cycle" ~/.claude/skills/neversight-learn-skills-dev-codex-review-cycle && rm -rf "$T"
data/skills-md/adhi-jp/agent-skills/codex-review-cycle/SKILL.mdCodex Review Cycle
Overview
A simple, user-driven review-and-fix workflow. Every cycle runs one codex review, Claude verifies each finding's validity, presents a verbatim summary, and the user picks which findings to address. Claude then applies only the chosen fixes and loops. Three cycles is a hard cap — the loop never runs a fourth cycle without the user starting a new invocation.
The skill is deliberately simple. It does not auto-fix, does not run parallel reviewers, does not compute stall fingerprints, does not manage autonomy bands, and does not delegate to rescue subagents. Claude is the only fix applier; the user is the only arbiter of which findings matter.
Language
All user-facing output is rendered in the user's language (the language the user has been using in the conversation, or as configured in the Claude Code system-level language setting). This section is the authoritative translation contract — any per-language sample reference (e.g.
references/summary-samples.ja.md) is illustrative only and MUST NOT contradict these rules.
Translate into the user's language:
- Section headings and column labels (
,Claude's note
,Recommended action
,Scope
, etc. column header text)Severity - Free-text fields Claude authors:
body,Claude's note
values, fleet-rate warnings, stop-signal footer prose, termination messages, post-cycle review assessmentRecommended action AskUserQuestion
,question
, and optionheader
/label
fieldsdescription
Keep verbatim (do NOT translate), regardless of user language:
- Codex
field (surfaced in thetitle
column)Title (codex verbatim) - Codex
field (quoted below the table per §Summary Output Template)recommendation - Severity values (
/high
/medium
) — codex outputlow - Validity outcome keywords (
/valid
/partially-valid
)invalid - Scope category names (
/must-fix
/minimal-hygiene
/reject-out-of-scope
)reject-noise - Stop-signal
keywords (Status
/ACTIVE
/ADVISORY
/WARNING
)silent - Technical identifiers: file paths, git refs (SHAs, branch names),
,fingerprint
, field names likecluster_id
,applied_fixesnot_evaluated_signal_names - Cycle indices (
,cycle N
)N/3
For a Japanese rendering example that applies these rules, see
references/summary-samples.ja.md. For German, Korean, or other languages, apply the same rules directly — the Japanese sample is an illustration, not a template to translate.
When to Use
Use this skill ONLY when:
- The user explicitly asks to run the codex review cycle on one of: the current working-tree diff, the current branch vs. its base branch, or an explicit commit/tag/branch ref, and
- The current working directory is a git repository, and
- The chosen review target produces a non-empty diff (working tree has uncommitted changes, or HEAD is ahead of the chosen base ref).
Do NOT use this skill when:
- The user wants a single codex review pass — use the codex plugin directly.
- The resolved review target produces an empty diff — stop and tell the user.
- The user is drafting a plan from scratch — use
instead.vibe-planning-guard - A review is running as a background or automatic check.
Review Target Modes
The skill supports three review targets, chosen once at Phase 0 and fixed for all three cycles:
— uncommitted changes (tracked-modified + staged + untracked). Codex is invoked withworking-tree
. Claude-side diff commands use--scope working-tree
plusgit diff HEAD --name-only
for untracked files.git ls-files --others --exclude-standard
— HEAD vs. the auto-detected default branch (tries localbranch
/main
/master
first, thentrunk
). Codex is invoked withorigin/*
(frozen SHA resolved from the detected ref at Phase 0). Claude-side diff commands use--base <base_sha>
(triple-dot: merge-base semantics).git diff --name-only <base_sha>...HEAD
— HEAD vs. an explicit ref the user supplies (any commit SHA, tag, or branch name). Codex is invoked withbase-ref
(frozen SHA resolved from the user-supplied ref at Phase 0). Claude-side diff commands use--base <base_sha>
.git diff --name-only <base_sha>...HEAD
The three modes share the same workflow — only the Phase 0 target-resolution step, the codex CLI flags, and the diff command Claude uses for validity checks differ.
Target Kinds
The skill auto-detects
code vs plan from the diff file extensions. See references/focus-text.md for the detection rules. Mixed targets (any code file present) are treated as code; item 6 of references/validity-checklist.md filters out detailed-design findings on markdown files inside a code cycle.
Review Variant Selection
At Phase 0, ask the user once which codex review variant to use for all three cycles:
— codex's native review command. Output is free-form text. Claude manually structures each finding section intoreview
/title
/recommendation
before running the validity check.body
— codex's adversarial review withadversarial-review
. Output is structured--json
. Each element hasfindings[]
,severity
,file
,line_start
,title
,recommendation
. Adversarial cycles also carry abody
block (see §Review Context Format) so codex keeps the same angle across cycles.<review_context>
The choice is fixed for the whole loop. If the user wants to switch variants, they must restart the skill.
Recommendation: use
adversarial-review unless there is a specific reason not to. The review variant is retained for environments where structured JSON output is unavailable or for users who specifically want free-form codex output, but it operates in a minimum-functionality mode: no <review_context> carry across cycles, no proposal-mode DoD (interview only), no V=0 cycle-N+1 override, no rejected-findings forwarding. Expect to get a single-shot review that Claude structures manually, with no adaptation between cycles. For multi-cycle review-and-fix workflows, adversarial-review is strictly the better path.
Workflow
Phase 0 — Preflight (runs once)
-
Verify git repository.
:Bash
— stop if not inside a git repo.git rev-parse --is-inside-work-tree
-
Resolve the review target. Ask the user once, via
, which review scope this cycle should cover. Offer three options:AskUserQuestion
— review the uncommitted diff (tracked-modified + staged + untracked).working-tree
— review HEAD vs. the auto-detected default branch.branch
— review HEAD vs. an explicit ref the user provides.base-ref
After the scope choice:
: no additional input needed.working-tree
: auto-detect the default branch by tryingbranch
,main
,master
as local refs viatrunk
, then falling back togit show-ref --verify --quiet refs/heads/<name>
. If none resolve, stop withorigin/<name>Could not auto-detect a base branch. Re-run with scope = base-ref and supply a ref explicitly.
: ask a follow-up free-formbase-ref
for the ref string. Validate it withAskUserQuestion
— if the command fails, stop withgit rev-parse --verify <ref>Base ref '<ref>' not found in this repository.
Store the result as
. Fully construct the object in Phase 0 so proposal DoD mode in step 7 has the data it needs:review_target
— one ofscope
,working-tree
,branch
.base-ref
—base_ref
fornull
; the resolved ref string forworking-tree
andbranch
(kept as display metadata only).base-ref
—base_sha
fornull
; forworking-tree
/branch
, the immutable commit SHA resolved frombase-ref
at Phase 0 viabase_ref
. All subsequent commands across all 3 cycles (codexgit rev-parse <base_ref>
, diff commands, commit-range enumeration, validity-check scope diff, Part B ownership audit, soft-reset anchor) use--base
, neverbase_sha
, so a mutable ref (e.g.base_ref
,main
) advancing mid-run cannot drift the review target. Iforigin/main
at any later check (user manually updated the ref), print a one-line warningbase_ref != base_sha
and proceed.Base ref '<base_ref>' moved from <base_sha> during the run; continuing against the frozen SHA.
— the exactdiff_command
command Claude will reuse for target-kind detection and validity checks:git diff --name-only …
→working-tree
(paired withgit diff HEAD --name-only
for untracked files)git ls-files --others --exclude-standard
/branch
→base-ref
(triple-dot: merge-base semantics; uses the frozen SHA, not the mutablegit diff --name-only <base_sha>...HEAD
, so target-kind detection and validity checks cannot drift if the named ref advances mid-run)base_ref
— the executed output ofdiff_files
. Fordiff_command
scope, this MUST be the union ofworking-tree
andgit diff HEAD --name-only
(tracked-modified + staged + untracked); omitting untracked files would undercount the actual review surface. Forgit ls-files --others --exclude-standard
/branch
it is just thebase-ref
output.diff_command
— fordiff_numstat
/branch
:base-ref
. Forgit diff --numstat <base_sha>...HEAD
:working-tree
PLUS a synthesized per-untracked-file line count (e.g.git diff --numstat HEAD
on each untracked file, emitted in the samewc -l
shape as numstat so the total LOC calculation is uniform). Omitting untracked line counts — as an earlier draft did — would let an untracked-only working-tree diff (100% new files) report 0 numstat LOC and silently qualify for proposal-mode DoD with no commit messages or patch to ground intent. Used to size the diff for the proposal-mode threshold (≤ ~100 changed LOC).<added>\t<deleted>\t<path>
—commit_range
fornull
;working-tree
(double-dot, using the frozen SHA for commit-delta enumeration) for branch / base-ref. NOTE: diff uses triple-dot (merge-base), commit enumeration uses double-dot (exact commits on HEAD that are not on base). Using<base_sha>..HEAD
(notbase_sha
) keeps enumeration stable against mid-run ref movement.base_ref
—commit_messages[]
for[]
;working-tree
splits for branch / base-ref, trimmed per commit. Derives from the frozen-SHAgit log --format='%s%n%b' <commit_range>
above. Proposal-mode DoD drafting reads these to ground item 1 (intent) and item 4 (out-of-scope) in what the commits actually claim.commit_range
— bounded content-bearing evidence: a handful of representative files shown mostly in full (small untracked files, key tracked hunks), trimmed withdiff_patch_excerpts
when needed. Keep the total roughly on the order of a few KB so the proposal-mode prompt stays manageable. The goal is "enough for Claude to infer intent and out-of-scope boundaries", not byte-exact compliance.[truncated — <M> more lines]
: always synthesize.working-tree
/branch
: omit only when the proposal-mode evidence gate is already satisfied by commit messages (≥20-char subject + non-empty body in at least one commit in scope). If the evidence gate fails on messages alone — squashed / templated / vague commits — synthesize excerpts sourced exclusively from the target commit range (base-ref
output), never from local working-tree state or untracked files, using the same bounded-budget shape asgit diff <base_sha>...HEAD
. If the range cannot yield a usable excerpt (binary-only, no textual diff), fall back toworking-tree
mode. This preserves the existing invariant that DoD drafting for branch/base-ref never anchors on a short squash-commit title AND never leaks out-of-range evidence into the proposal.interview
Proposal-mode evidence gate: even when
totals ≤ 100 LOC, proposal mode requires content-bearing evidence.diff_numstat- For
scope: ifworking-tree
is empty ANDcommit_messages[]
has no non-blank content (e.g. all untracked files are empty or binary, or all tracked-modified hunks collapsed to no patch), fall back todiff_patch_excerpts
mode — filenames and line counts alone cannot draft six DoD items with enough fidelity.interview - For
/branch
scope: commit messages alone are NOT sufficient evidence. Squashed, templated, or vague messages likebase-ref
,"fix review comments"
,"wip"
can pass the LOC threshold while giving proposal mode no usable intent or out-of-scope signal. Require that"update tests"
contain at least one commit with a subject of ≥20 characters AND a non-empty body, OR fall back to populatingcommit_messages[]
for branch/base-ref (same budget-based heuristic as working-tree) and passing it forward. If neither evidence path is available — all commit messages are short/empty and no patch excerpts are synthesized — fall back todiff_patch_excerpts
mode. The risk this gate blocks is a DoD drafted from the title of a squash commit, which then anchorsinterview
decisions for the whole run.reject-out-of-scope
Every cycle reuses the same
so the diff scope stays stable even after fixes are applied.review_target -
Verify the target has a non-empty diff.
:working-tree
must be non-empty. If empty, stop withgit status --porcelainNo working-tree diff to review. The codex-review-cycle skill requires uncommitted changes when scope is working-tree.
/branch
:base-ref
(use the frozen SHA from step 2, not the mutablegit diff --name-only <base_sha>...HEAD
) must be non-empty. If empty, stop withbase_refNo committed changes between <base_ref> (<base_sha>) and HEAD. The codex-review-cycle skill requires a non-empty diff for branch/base-ref scopes.
-
Ensure codex is ready. Invoke
once to confirm the codex CLI is configured. Stop if setup reports a blocking failure.Skill(codex:setup) -
Detect target kind.
- Run
. Forreview_target.diff_command
, also runworking-tree
and union the untracked list with the diff output.git ls-files --others --exclude-standard - Apply the extension rules in
.references/focus-text.md - Record
as eithertarget_kind
orcode
.plan
- Run
-
Ask for review variant (once, via
). Two options:AskUserQuestion
andreview
. Store the choice asadversarial-review
.variant -
Pre-collect DoD (adversarial only) and initialize cycle state. Set
,rejected_ledger = []
,cycle_history = []
.dod = null- If
, collect the six-item Definition of Done now by invoking the four-mode collection flow invariant == adversarial-review
§Collection Modes, passing the fully-constructedskills/review-scope-guard/references/dod-template.md
from Phase 0 step 2 (includingreview_target
,diff_files
,diff_numstat
) as the proposal-mode input contract. Default tocommit_messages[]
; useinterview
whenproposal
totals ≤ ~100 LOC AND commit-messages or patch excerpts provide content-bearing evidence; usereview_target.diff_numstat
when the diff is ≤ ~30 LOC AND the user explicitly said "quick DoD" / "minimal DoD" / similar; usequick
when the user has already pasted a DoD block in the conversation. Iffree-text
is somehow incomplete (defensive check — Phase 0 step 2 should have populated every field), forcereview_target
mode per the scope-guard input contract. Cache the result oninterview
sodod<review_context cycle="1">
can be populated from DoD item 1 before step 8 runs. Pass the cached<intent>
(notdod
) tonull
at step 10a so the scope-triage skill does not re-ask. This solves the cycle-1 dependency wherereview-scope-guard
would otherwise need intent that had not yet been collected.<review_context> - If
, leavevariant == review
here. Native review does not carrydod = null
, so there is no early-intent dependency. Step 10a's first<review_context>
invocation will collect DoD interactively at that point.review-scope-guard
Also record
— this is the anchor for the step 20 soft-reset at termination. Forpre_cycle_1_head = git rev-parse HEAD
scope this value is unused.working-treeSubsequent cycles reuse the cached DoD and pass the running
/rejected_ledger
forward.cycle_history - If
Phase 1 — Review Cycle (repeats up to 3 times; counter N = 1..3
)
N = 1..3-
Run the review. Compute
fromcodex_scope_args
:review_target.scope
→working-tree--scope working-tree
→branch
(frozen SHA from Phase 0; NOT--base <review_target.base_sha>
, which is mutable)base_ref
→base-ref--base <review_target.base_sha>
Cycle-N>1 preflight (
/branch
only): before invoking codex on cycle 2 or 3, verify the state between cycles is as expected. Letbase-ref
(true for normal fix cycles, false for V=0 no-fix retries). Run the following single-pass check:expected_commit = !cycle_history[N-1].no_fix_cycle- HEAD movement: compare
againstgit rev-parse HEAD
.cycle_history[N-1].pre_pause_head- If
is true: HEAD MUST have advanced. If equal, the user never committed — re-issue the step 14 manual-commit instruction.expected_commit - If
is false (V=0 retry): HEAD MUST equal the stored head. If HEAD moved, the user pulled or committed unrelated work during the override pause; halt withexpected_commit⚠️ HEAD changed during the V=0 override pause. Retry cycle would review an expanded target. Restart the skill or revert the changes.
- If
- Working-tree cleanliness:
- When
is true:expected_commit
MUST be empty (path-restricted to the fix set; staged/unstaged remnants of applied fixes block the cycle). Untracked files unrelated to the review_target are exempt.git status --porcelain -- <cycle N-1's touched_files> - When
is false (V=0 retry, no touched_files exists):expected_commit
with no path restriction MUST be empty, excepting untracked files unrelated to the review_target. This is strictly wider than the expected_commit=true check because no commit was made — any change to tracked files during the override pause would expand the review target and invalidate the retry. On failure, halt withgit status --porcelain⚠️ Working tree changed during the V=0 override pause. Retry cycle would review an expanded target. Restart the skill or revert the changes.
- When
- Commit-delta coverage (only when
is true):expected_commit
must be non-empty AND must cover every file in cycle N-1'sgit diff --name-only <pre_pause_head>..HEAD -- <touched files>
list. Any touched file missing from this delta means the user's commit did not include that file. A legitimate fix that reverts a file back to base is still a valid commit delta even though the file disappears fromapplied_fixes[*].touched_files[]
— this variant catches that case because it queries the commit-delta range, not the branch-total range. Skipped entirely for V=0 retries (no commits to audit).<base_sha>...HEAD - Cycle-commit ownership (warn-and-confirm) (only when
is true): compare the full commit-delta path list against cycle N-1'sexpected_commit
. Runtouched_files
(no path restriction) and letgit diff --name-only <pre_pause_head>..HEAD
be that output. Paths incommitted_paths
that are NOT in cycle N-1'scommitted_paths
are unrelated — typically lint autofixes, typo repairs, or adjacent cleanups the user bundled into the cycle commit. Rather than abort (previous behavior, which was hostile totouched_files[]
usage), surface them via a singlegit commit -am
:AskUserQuestion
: "Cycle N-1 commit includes <K> path(s) that Claude did not touch: <full path list>. These will be preserved by the terminal soft-reset and ship in the final squash. Keep them as part of this review's squash, or abort for amend-drop?"question- options:
— record the extras inKeep (continue to cycle N)
for the step-20 Part B audit to surface again at terminal reset. Proceed to cycle N.cycle_history[N-1].unrelated_commit_paths[]Abort to amend
and pause the skill like the manual-commit gate in step 14. Rationale: the hard-abort form of this check rejected normal developer workflows. Warn-and-confirm preserves the signal (user sees unrelated paths per-cycle) without blocking lint-fix-plus-cycle-fix commits. Skipped entirely for V=0 retries.Amend your cycle N-1 commit to drop the unrelated paths, then reply continue.
On any mismatch of the bullets above, do NOT proceed. Print a compact explanation naming the specific check that failed and re-issue the step 14 manual-commit instruction (or the V=0 restart message). Wait for the user to correct the state and reply
. Do not silently review stale state.continueThen:
:variant == review
Capture stdout as free-form text.node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" review --wait <codex_scope_args>
:variant == adversarial-reviewnode "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" adversarial-review --wait --json <codex_scope_args> "<focus_text_with_context>"
is the target-kind focus text from<focus_text_with_context>
followed by thereferences/focus-text.md
block (see §Review Context Format). Parse stdout as JSON.<review_context>- Parse-retry policy (adversarial only): if JSON parsing fails or any required field is missing (
,findings[]
,severity
,file
,line_start
,title
), retry the exact same call once. A second failure aborts the cycle, surfaces codex's raw stdout verbatim to the user, and ends the skill.recommendation
-
Extract findings and assign IDs
.F1..Fn
: useadversarial-review
as-is.findings[]
: Claude manually slices the free-form output into finding blocks. Each block must have areview
(first line of the block, verbatim), atitle
(the action codex suggests, verbatim), a best-effortrecommendation
, andfile
(resolve from context, leave null if codex did not cite a location). Findings without at least aline_start
and atitle
are dropped with a note in the summary.recommendation
-
Run the validity check silently. For every finding, run the six items in
without echoing the per-item trace to the user. Every item still requires Claude to Read the cited file internally — do not trust codex's body alone — but file reads and item-by-item reasoning are internal only. Assign each finding a three-value outcome:references/validity-checklist.md
,valid
, orpartially-valid
. Record a shortinvalid
(≤20 words) for every finding regardless of outcome — forClaude's note
findings, note the primary reason the finding is grounded (e.g. "confirmed by reading cited lines", "DoD required feature violation"); forvalid
/partially-valid
, note the rejection reason. When multiple findings cite the same file, issue a singleinvalid
call covering the union of cited ranges and reuse the result for every item-2/item-3/item-4 check — do not re-read the same region per finding.ReadExternal-source rule (warning-only): external reads (dependency crate sources, standard library docs, upstream README) are allowed as background evidence for Claude's internal reasoning, but they MUST NOT flip the validity verdict. The verdict is always determined from the review diff itself plus what the finding claims. If an external read contradicts or confirms the finding, record it as
without changing the outcome. The silent-trace rule still holds for validity determined solely from the diff — theClaude's note: background — <source>: <what it showed>
note is only emitted when Claude actually consulted an external source. This rule replaces an earlier "External-source exception" that allowed verdict-flipping with version-pinned sources; in practice Claude cannot reliably pin dependency versions, and the safe constraint is to forbid verdict-flipping entirely.backgroundNo severity-based tiering: item 3 (premise matches artifact) is mandatory for every finding that could become selectable. Read tiering was considered (skip item 3 on medium/low) but rejected: self-consistency between title and recommendation does not prove the artifact actually has the claimed behavior. Skipping item 3 would let invalid medium/low findings reach the user-selection UI, which is exactly the silent-hallucination failure mode the validity check exists to catch. The Read cost (1 Read per unique cited file, shared across findings in that file via the union rule above) is acceptable; tiering's savings do not justify the safety weakening. 10a. Run scope triage via
. Invokereview-scope-guard
passingSkill(review-scope-guard)
(with the validity outcomes already attached), the cachedfindings[]
(pre-collected in step 7 when variant is adversarial; null on cycle 1 for review variant — the skill will collect it interactively then), the runningdod
,rejected_ledger
(for stop-signal evaluation), andcycle_history
(already fully constructed in Phase 0 step 2 — pass it verbatim without re-deriving any field). Phase 0 step 2 guaranteesreview_target
carries the fullreview_target
tuple; step 10a simply forwards it. Do not drop{scope, base_ref, base_sha, diff_command, diff_files, diff_numstat, commit_range, commit_messages[], diff_patch_excerpts}
— scope-guard's proposal-mode evidence gate consumes it for working-tree targets wherediff_patch_excerpts
is empty. The caller MUST passcommit_messages[]
so scope-guard'sreview_target
DoD mode has an authoritative source; without it, scope-guard falls back toproposal
mode (see scope-guard §Inputs). The skill returns a triage verdict per finding (interview
/must-fix
/minimal-hygiene
/reject-out-of-scope
), an updatedreject-noise
, a set of active stop signals, and the collectedrejected_ledger
(on cycle 1). Cache the DoD for later cycles. Store the triage verdicts alongside each finding for step 11. When DoD is missing, the skill still returns classifications inside the 4-category invariant (fall-through lands indod
); render the degraded-mode warning as documented in Failure Modes.minimal-hygiene -
Render the summary. Use the exact table format in §Summary Output Template. Every finding appears in the table, including
andinvalid
ones. Every finding'sreject-*
field is quoted verbatim below the table (per §Summary Output Template). The active stop signals footer is rendered when (a) any signal has statusrecommendation
/ADVISORY
/ACTIVE
, OR (b) any signal isWARNING
. Omit the footer only when every signal is trulynot evaluated: metrics missing
. When the footer renders solely due tosilent
rows, print a compact one-line notice —not evaluated
— instead of the full signal table.Not evaluated (metrics missing): <comma-separated signal names>Structurally-unevaluable compaction: subtract
from thestructurally_unevaluable_signal_names
set before rendering. The structurally-unevaluable names are shown once in cycle 1's footer asnot_evaluated_signal_names
and omitted from cycle 2+ footers entirely. This replaces the previous behavior where_Stop signals unavailable in codex-review-cycle integration: <names> (standalone invocation required for full 5-signal surface)._
/file-bloat
appeared in every cycle'sreactive-testing
list.Not evaluatedAdditionally, starting from cycle 2, compare the current-cycle
(taken fromnot_evaluated_signal_names
's return value received in step 10a of the current cycle — NOT fromreview-scope-guard
, which is only appended later in step 15) againstcycle_history[current]
(the immediately previous cycle, not cycle 1) using the element-wise-equal semantics incycle_history[N-1].not_evaluated_signal_names
§Per-cycle suppression. Comparing against N-1 (not cycle 1) prevents flapping from being masked: a set that differs from cycle 1 → matches cycle 2 → differs from cycle 3 would otherwise be silently suppressed if only the cycle-1 baseline were checked. This ordering is required because step 11 runs before step 15 persists the current cycle's entry; readingreview-scope-guard/references/stop-signals.md
at step 11 would read stale or empty state. If the two lists are equal, printcycle_history[current]
instead of re-listing the names. If they differ, re-render the full list AND add_Not evaluated: unchanged from cycle N-1 — see cycle N-1 summary for signal list._
so the change is visible. The canonical order guarantees ordering-only differences cannot occur; guard for them anyway._Not evaluated delta vs cycle N-1: added=<names>, removed=<names>._Validity fleet-rate check (plan targets only, ≥5 findings): if the current cycle has ≥5 findings and 100% are classified
, print a single-line calibration warning at the bottom of the summary:valid
This is a soft prompt, not a hard gate — the cycle proceeds normally. Raised threshold (was ≥3) and plan-only scope prevent false alarms on small focused diffs, where 3 valid findings is a normal outcome.⚠️ 100% valid rate with ≥5 findings is unusual for adversarial-review on plan targets. Re-scan for: (1) vague recommendations that should be 'partially-valid: vague', (2) already-handled premise that should be 'invalid: misread', (3) design-intent reversals that should route through scope triage as 'reject-out-of-scope' instead of being accepted as must-fix. -
Zero-valid check. Let
be the count of findings whose validity outcome isV
orvalid
and whose scope category ispartially-valid
ormust-fix
.minimal-hygiene
andreject-out-of-scope
findings are never counted as selectable, even if their validity outcome wasreject-noise
. Ifvalid
:V == 0- If
(final cycle), jump to Phase 2 Case A unconditionally — the cap has fired.N == 3 - If
(native), jump to Phase 2 Case A unconditionally. The V=0 override is not available for native review because the nativevariant == review
command accepts neither a focus-text argument nor areview
block — there is no channel to deliver an<review_context>
instruction. Re-running the same command against the same diff would be a hidden no-op that still consumes one of the 3 cycles. The override is therefore scoped to<angle_request>
.variant == adversarial-review - If
andN < 3
, issue a singlevariant == adversarial-review
before terminating:AskUserQuestion
: "No selectable findings this cycle. Terminate the review, or run one more cycle with a different angle request?"question- options (translate to user language per §Language):
— proceed to Phase 2 Case A.Terminate now (Case A)
— proceed to the no-fix persist step below, then re-enter step 8 withRun cycle N+1
. The next cycle'sN = N + 1
carries a one-line<review_context>
element:<angle_request>
inserted between<angle_request>Prior cycle produced 0 selectable findings. Try a materially different angle — e.g. a deeper root-cause pass, a different subsystem emphasis, or a scope that cuts across files not yet reviewed.</angle_request>
and<previous_fixes>
.<rejected_findings>
No-fix cycle-history persist (before re-entering step 8): because V=0 means no selection and no fix phase, step 15's normal persistence never runs. Without explicit persistence here, the next cycle's
and<review_context>
reference would be stale or empty. Before returning to step 8, append an entry tocycle_history[1]
for the just-completed cycle with the following shape:cycle_history
(empty — no fix phase)applied_fixes: []
(empty — no selection UI was opened)user_declined: []
(empty)skipped_for_scope: []
— populated from the current cycle's validity check (findings whose validity wasclaude_invalid: []
). This carries forward into the next cycle'sinvalid
via the normal union with the ledger.<rejected_findings>
: the current-cycle return value fromnot_evaluated_signal_names
step 11 (same value used for the step 11 footer comparison)review-scope-guard
:pre_pause_head
fornull
;working-tree
otherwise (no branch/base-ref pause occurs on V=0 since there were no fixes to commit)git rev-parse HEAD
— explicit marker that this cycle had no fix phase. The cycle-N>1 preflight in step 8 consumes this marker to setno_fix_cycle: true
: HEAD is required to be UNCHANGED (expected_commit = false
), full working tree required clean (no path restriction), commit-delta and ownership checks skipped. See step 8 §Cycle-N>1 preflight for the full unified rule.HEAD == pre_pause_head
Also persist the current
(whichrejected_ledger
already updated) for the next cycle's forwarding. This persistence is cheap (all buckets are empty exceptreview-scope-guard
andclaude_invalid
) but required fornot_evaluated_signal_names
correctness.<review_context>The override path is bounded by the existing 3-cycle cap: requesting cycle N+1 from a V=0 state still consumes one of the 3 cycles. The user cannot escape the cap this way.
- If
-
Ask the user which findings to fix. Use
per §User Selection UI. Only findings with scopeAskUserQuestion(multiSelect: true)
ormust-fix
appear as options (further filtered by validity to excludeminimal-hygiene
).invalid
andreject-out-of-scope
findings are never offered for selection — they live in the summary table for audit trail only. Always append a finalreject-noise
option. 13.5. Fix-weight precheck (self-discipline gate). Before applying any selected finding, verify that the planned edit matches the finding's scope classification. This check runs silently — it adds no user-visible output unless a mismatch is detected.None — skip all, end cycle
allows multi-line edits, new sections, flow changes, and cross-file edits within the review diff.must-fix
allows only 1-line edits, a single short paragraph addition, or a 1-sentence rule insertion. Edits that exceed this envelope indicate the finding should have been classifiedminimal-hygiene
, not hygiene, and the rest of the workflow would miscount it.must-fix- On mismatch (a
finding whose planned fix exceeds the hygiene envelope): either (a) simplify the edit to hygiene-scope and apply, or (b) raise anminimal-hygiene
asking the user whether to re-classify the finding asAskUserQuestion
before proceeding. Do not silently apply a must-fix-weight edit to a minimal-hygiene finding.must-fix
findings must not trigger any edit — skip entirely.reject-*- Rationale: without this gate,
findings can receive multi-line structural edits, recreating the over-engineering pattern the skill is designed to prevent. This gate forces the classification and the applied weight to match.minimal-hygiene
-
Apply fixes. For each selected finding, Claude reads the cited lines, applies the fix via
orEdit
, and reports the resultingWrite
for the touched files. No sync-sweep, no rescue delegation.git diffWrite-scope boundary: Claude edits only files present in
output (plus untracked files forreview_target.diff_command
scope). If a finding's fix genuinely needs an out-of-diff file, skip the finding with a noteworking-tree
. Out-of-diff writes are a scope expansion that must go through a separate skill invocation, not through the user-selection UI.Skipped: requires out-of-diff writeHow the fixes become visible to the next cycle depends on
:review_target.scope
: fixes are left in the working tree. Cycle N+1'sworking-tree
review sees the staged + unstaged + untracked state directly. No commit is needed.--scope working-tree
/branch
: codex's branch diff is computed asbase-ref
, so in-place edits are invisible until they land in a commit on HEAD. Claude does not commit on the user's behalf. Before printing the manual-commit instruction, record<merge-base>..HEAD
intopre_pause_head = git rev-parse HEAD
— the next cycle's preflight uses this anchor (plus the per-fixcycle_history[current].pre_pause_head
list from step 15) to verify the user's actual commit delta, not just worktree cleanliness. Then, after all selected fixes are applied this cycle, print a manual-commit instruction and pause the skill:touched_files[]
The user owns pre-commit hook outcomes, clean-index concerns, and rollback. If the user repliesCycle N fixes applied to working tree. Branch/base-ref scopes require you to commit these changes before cycle N+1 can see them. Recommended commands: git add <touched files> git commit -m "review cycle N fixes" After committing, reply `continue` to proceed to cycle N+1. Reply `stop` to end the skill here.
, end the skill in Case B-like state (applied fixes remain uncommitted in the working tree; the user can deal with them however they like). If the user repliesstop
, proceed to step 8's cycle-N>1 preflight which verifiescontinue
has moved.git rev-parse HEAD
Sibling-doc cascade check: when a fix changes a user-facing contract of the skill (adds a new side effect the skill did not previously have, changes a stated invariant, introduces a step that sibling docs describe as absent), Claude must in the same edit pass grep sibling docs (
, other SKILL.md sections, CHANGELOG entries for the current release) for claims describing the OLD behavior, and update every match. Specifically runREADME.md
for at least one phrase, and either edit every hit or leave an explicit NOTE comment explaining why a mismatch is acceptable. Rationale: catching contract-breaking fixes in the same edit pass prevents silent contract breaks that would only surface in a later cycle.rg -n '<characteristic phrase from old behavior>' . -
Update cycle history and ledger. Append to
an entry for this cycle recording:cycle_history
— each entry recordsapplied_fixes[]
.{fingerprint, title, file, line_start, scope_category, touched_files[]}
is the stablefingerprint
tuple used by step 17's residual matcher.{normalized_title, file, line_start, scope_category}
is the exact list of files Claude edited while applying the finding — the preflight in step 8 consumes this list to verify those files are visible in cycle N+1's branch diff.touched_files[]
— each entry recordsuser_declined[]
for{fingerprint, title, file, line_start, scope_category}
/must-fix
findings the user did not select (including theminimal-hygiene
case).None — skip all
— each entry recordsskipped_for_scope[]
for findings the user selected but Claude skipped because their fix required an out-of-diff write (see step 14 Write-scope boundary). These count as unresolved at termination time — Case A lists them alongside user-declined carry-overs and must not claim clean resolution while the bucket is non-empty.{fingerprint, title, file, line_start, scope_category, reason}
— each entry recordsclaude_invalid[]
for{fingerprint, title, file, line_start, rejection_reason}
findings from the validity check.invalid
— the ordered string array returned bynot_evaluated_signal_names[]
step 11. Stored verbatim, no mutation. Used by step 11's footer rendering in cycle N+1 to decide whether to suppress thereview-scope-guard
footnote.not evaluated
— optional, populated only when the user choseunrelated_commit_paths[]
at the cycle-N>1 ownership gate. Lists paths from the cycle commit that were NOT inKeep
. The step-20 Part B terminal audit consumes this list to display the unrelated paths one more time before the final squash, so the user can decide anew whether to include them in the final commit.applied_fixes[*].touched_files[]
All four buckets carry fingerprints so step 17's residual accounting matches on the stable
tuple, not on title alone.{normalized_title, file, line_start, scope_category}The
returned by step 10a is already updated withrejected_ledger
andreject-out-of-scope
entries; persist it as-is for the next cycle. The next cycle'sreject-noise<review_context>
block is populated from the union of ledger entries and<rejected_findings>
only — not fromclaude_invalid
oruser_declined[]
. Declines and out-of-diff skips are deferrals, not rejections: leaving them out ofskipped_for_scope[]
lets codex freely re-raise the same findings next cycle so the user can reconsider them. Termination-time accounting still tracks them as unresolved residuals (see step 17 Case A).<rejected_findings> -
Loop check.
: setN < 3
, return to step 8.N = N + 1
: always jump to Phase 2 Case A. The Case A routing internally chooses between the clean-termination variant and the residual-carried-forward variant based on whetherN == 3
+cycle_history[*].user_declined[]
leave any unresolved residuals (see step 17). Final-cycle user declines are handled by the residual variant, not by Case B — the user explicitly dispositioned each finding through the selection UI, which is an active close-out, not a cap failure.cycle_history[*].skipped_for_scope[]- Case B is reserved for an explicit cap-stop condition where the cycle could not run the user-selection UI to completion (e.g. the user interrupted mid-paging during an overflow batch, or the skill aborted before step 13). Normal 3-cycle completion with some user declines is Case A residual, not Case B.
Phase 2 — Termination
- Case A — normal termination. Compute the full residual set: scan
andcycle_history[*].user_declined[]
across all prior cycles. For each, compute a stable fingerprintcycle_history[*].skipped_for_scope[]
(same format as{normalized_title, file, line_start, scope_category}
's ledger fingerprint — reuse that rule). A residual is "carried" if no later cycle'sreview-scope-guard
contains an entry with a matching fingerprint. Matching on title alone is forbidden because generic adversarial titles collide across unrelated findings and could silently clear a residual. If the carried residual set is empty, printapplied_fixes[]
— the clean-termination variant. Otherwise printAll findings resolved after N cycle(s).
(never the "resolved" line) followed byReview cycle terminated after N cycle(s) with residuals carried forward.
andUser-declined valid findings carried to termination:
lists, with each entry showingOut-of-diff skipped findings carried to termination:
so the user can audit. Either way, also print the mandatory<title> (<file>:<line_start>, declined in cycle N)
warning and the per-cycle applied fixes summary.⚠️ No automated verification was run - Case B — cap reached. Print the template in §Termination Criteria Case B. Do not automatically start a fourth cycle. Tell the user they can re-invoke the skill to run another 3-cycle pass.
Review Context Format
Used only when
variant == adversarial-review. The block is appended to the focus text argument with a single blank line between the two sections:
<review_context cycle="N"> <intent><![CDATA[<one-sentence change intent from Phase 0 step 7 DoD item 1>]]></intent> <previous_fixes> <fix cycle="N-1"><![CDATA[<applied finding title + one-line change summary>]]></fix> </previous_fixes> <angle_request><![CDATA[<one sentence; present only when V=0 override fired in the previous cycle>]]></angle_request> <rejected_findings> <rejected cycle="N-1" reason="invalid: file not in diff"><![CDATA[<finding title>]]></rejected> <rejected cycle="N-1" reason="reject-out-of-scope: DoD explicit out-of-scope"><![CDATA[<finding title>]]></rejected> </rejected_findings> </review_context>
element (optional): present only when the previous cycle terminated at step 12 V=0 and the user chose<angle_request>
. Contains a single sentence asking codex to try a different angle. Omit the element entirely when absent.Run cycle N+1
Template note: this block never carries user-declined findings. A user decline is a deferral — codex should remain free to re-raise the same finding next cycle so the user can reconsider. If a template reader is tempted to add a
<rejected reason="user declined"> element, stop: that would let declined valid findings disappear from subsequent cycles and make Case A falsely claim resolution.
Rules:
- Cycle 1 carries
(populated from Phase 0 step 7 DoD item 1 pre-collection);<intent>
and<previous_fixes>
are empty.<rejected_findings>
is preceded by this literal instruction, on its own line:<review_context>Do not re-report findings in <rejected_findings> unless you have a materially different angle.- Every user-facing string inside
is quoted as-is. No JSON encoding. No HTML entity escaping. The CDATA wrapper keeps any<!-- CDATA -->
,<
,>
in codex output from terminating the block.& - This skill does not use a separate skip ledger.
is the only cross-cycle carry.<review_context>
window: the block carries only the immediately prior cycle (N-1), not a cumulative history. Cycle 3's<previous_fixes>
contains the 5 fixes from cycle 2; it does NOT also enumerate cycle 1's fixes. Each<review_context>
element uses the compact form<fix>
— summaries longer than 40 words are forbidden. Codex only needs the latest ground truth for cross-cycle suppression; older history would inflate the context block without improving review quality. V=0 exception: when<fix cycle="N-1" category="must-fix|minimal-hygiene"><![CDATA[<title>: <≤40 word summary>]]></fix>
(prior cycle was a V=0 override retry and emitted no fixes), cycle N'scycle_history[N-1].no_fix_cycle == true
skips the empty cycle N-1 and carries fixes from cycle N-2 instead. Without this exception, codex would lose context of cycle 1's applied fixes when cycle 2 was V=0 no-fix, causing re-surfacing of already-fixed findings in cycle 3.<previous_fixes>
sources: the block aggregates two kinds of prior-cycle rejections — (1) entries in the<rejected_findings>
returned byrejected_ledger
(scope-triage rejections:review-scope-guard
/reject-out-of-scope
), and (2)reject-noise
from the prior cycle's validity check. Each rejection renders as its ownclaude_invalid[]
element with the<rejected>
attribute carrying the original category and rationale (e.g.reason
,reason="reject-out-of-scope: DoD explicit out-of-scope"
). Ledger entries withreason="invalid: file not in diff"
render with an extra hint:count >= 2
so codex sees how persistent the complaint is. User-declined findings are NOT included — a decline is a deferral, not a rejection, and codex is free to re-raise the same finding in the next cycle so the user can reconsider it.reason="reject-noise: already-rejected (count=N)"
Validity Check Summary
Full details live in
references/validity-checklist.md. The six items are:
- File exists in the diff —
appears in the output offinding.file
(plusreview_target.diff_command
whengit ls-files --others --exclude-standard
).review_target.scope == working-tree - Line range exists —
is within the current file length; flag shifted ranges asfinding.line_start
.partially-valid - Premise matches artifact — Claude reads the cited lines and confirms codex's assertion.
- Scope —
overlaps a changed hunk in the scope-appropriate diff (line_start..line_end
for working-tree;git diff HEAD -- <file>
for branch / base-ref, using the frozen Phase-0 SHA), not unchanged code in a touched file.git diff <base_sha>...HEAD -- <file> - Recommendation concreteness — a specific failure mode is named, not a vague "consider…".
- Target-kind consistency — plan cycles reject detailed-design nitpicks on
/.md
/.markdown
files..txt
Outcome:
valid (all pass), partially-valid (items 2 or 5 returned partially-valid, no invalid), invalid (any of items 1, 3, 4, 6 returned invalid).
Summary Output Template
Language reinforcement: the template below uses English for readability of the SKILL.md spec itself. When rendering actual output, translate ALL non-verbatim elements to the user's language per §Language: section headers, column headers (except
Title (codex verbatim)), Claude's note content, Recommended action values, the recommendation block heading, stop-signal footer text, and termination messages. Only codex's title and recommendation fields stay in their original language (they are contractually verbatim).
Render after every cycle, before the user selection prompt:
### Cycle N review summary (variant: <review|adversarial-review>, target: <code|plan>) | ID | Severity | File:Line | Title (codex verbatim) | Validity | Scope | Claude's note | Recommended action | |----|----------|-----------|------------------------|----------|-------|---------------|--------------------| | F1 | high | src/auth/login.ts:42 | Missing null check on userId | valid | must-fix | DoD required features; core correctness | Apply fix | | F2 | medium | src/api/user.ts:88 | Consider adding retry logic | partially-valid | reject-noise | vague, no concrete failure mode | Skip | | F3 | low | docs/plan.md:15 | Rename process to handler | invalid | reject-noise | detailed-design on plan target | Skip | | F4 | medium | src/curl.rs:130 | --url-query value leaks to URL | valid | minimal-hygiene | value consume + warn; semantics NOT implemented | Apply 1-line hygiene | | F5 | medium | src/curl.rs:120 | Implement --json shorthand body | valid | reject-out-of-scope | DoD explicit out-of-scope: cURL 7.82+ new | Skip (ledger fwd) | **Recommendation (per finding)**: - **F1**: <codex recommendation verbatim> - **F2**: <codex recommendation verbatim> ... Quote every finding's `recommendation` field verbatim below the table. Do not skip quoting even when the title seems to imply the recommendation — the user needs the full recommendation text to make an informed fix/decline decision without reading the raw codex JSON. **Active stop signals** (footer rendered when ≥1 signal is `ADVISORY`/`ACTIVE`/`WARNING` **or** `not evaluated: metrics missing`; omit entirely only when all signals are truly `silent`. When only `not evaluated` rows exist, replace the full table with a compact one-liner `Not evaluated (metrics missing): <names>`): | Signal | Status | Evidence | |--------|--------|----------| | ... | ... | ... |
Format rules that protect finding intent
- The
column must contain codex'sTitle (codex verbatim)
field exactly. No paraphrase, no shortening, no translation.title - The
block must contain each finding's fullRecommendation (per finding)
field verbatim, regardless of length. Never truncate, summarize, or abbreviate — the user needs the complete remediation text to make an informed fix/decline decision.recommendation - Claude's interpretation lives only in the
column and theClaude's note
column. Do not edit any other column based on what Claude thinks the finding "really means".Recommended action - If Claude judges a finding
, the row still appears in the table with the original title and recommendation. Theinvalid
column then carriesClaude's note
.invalid because <reason> - If
classifies a finding asreview-scope-guard
orreject-out-of-scope
, the row still appears in the table for audit. Thereject-noise
column carries the category andScope
carries the triage rationale verbatim from the skill's output.Claude's note - Severity values come from codex. Do not upgrade or downgrade severity based on Claude's validity or scope verdict.
User Selection UI
Language reinforcement:
AskUserQuestion question, header, and option label/description fields must be in the user's language per §Language. Codex verbatim titles embedded in labels stay in their original language.
Use
AskUserQuestion with multiSelect: true. Only findings whose scope is must-fix or minimal-hygiene AND whose validity is valid or partially-valid appear as options. invalid, reject-out-of-scope, and reject-noise findings are never selectable — the user sees them in the summary table above for audit trail only.
minimal-hygiene options include a (hygiene) marker in the label so the user knows the expected fix is 1-line value consume + warn, not a full implementation.
Base layout. Token rule: each option's
description field must carry only the finding's file:line — nothing else. The label already encodes the title, severity, and scope; the summary table above already carries rationale and Claude's note. Repeating any of that in the description is wasted context.
question: "Which findings should I address in cycle N?" header: "Cycle N fixes" multiSelect: true options: - { label: "F1: Missing null check on userId (high, must-fix)", description: "src/auth/login.ts:42" } - { label: "F4: --url-query value leaks to URL (medium, hygiene)", description: "src/curl.rs:130" } - { label: "None — skip all, end cycle", description: "End this cycle" }
Overflow handling (more than 3 selectable findings per severity)
AskUserQuestion accepts maximum 4 options per question; reserve one for None — end cycle, leaving 3 finding slots per question. When a severity bucket has more than 3 selectable findings, issue multiple sequential AskUserQuestion calls (3 findings each) in severity order until every selectable finding has been surfaced. No finding may be silently deferred just because it did not fit on a page — the fix phase does not begin until every selectable finding has been shown to the user and either applied or declined.
Termination Criteria
Language reinforcement: the templates below are in English for spec readability. Actual output must be in the user's language per §Language. Translate all headings, messages, and labels; keep codex verbatim titles and technical identifiers (
must-fix, minimal-hygiene, file paths) as-is.
Case A — V == 0 (normal termination):
When the residual set (carried user-declined + carried out-of-diff skipped) is empty:
All findings resolved after N cycle(s). ⚠️ No automated verification was run. This skill never executes tests, lints, builds, or any verification command on behalf of the user. The "resolved" claim only means "codex returned zero selectable findings this cycle and no residuals were carried from prior cycles". Before shipping, review the applied diff and run your own verification (test suite, type check, lint, build, manual smoke) as appropriate for the change. Applied fixes by cycle: - Cycle 1: <list of finding titles or "none"> - Cycle 2: <list or "none"> - Cycle 3: <list or "none">
When any residuals exist (declined carry-overs, out-of-diff skips, or final-cycle declines), swap the opening line and list the residuals — do NOT print "All findings resolved":
Review cycle terminated after N cycle(s) with residuals carried forward. ⚠️ No automated verification was run. See the clean-termination variant above for rationale. Applied fixes by cycle: - Cycle 1: <list of finding titles or "none"> - Cycle 2: <list or "none"> - Cycle 3: <list or "none"> User-declined valid findings carried to termination: <titles from cycle_history[*].user_declined[] that never appear in a later cycle's applied_fixes[], or "none"> Out-of-diff skipped findings carried to termination: <titles from cycle_history[*].skipped_for_scope[] that never appear in a later cycle's applied_fixes[], or "none">
Case B — 3 cycles complete with unresolved valid findings:
## Review cycle terminated — cap reached - Cycles run: 3 / 3 - Findings applied: <count> - Findings still valid and unresolved at cap: <count> ⚠️ No automated verification was run on the applied fixes — see Case A for rationale. ### Unresolved valid findings <Summary Output Template table, filtered to valid/partially-valid findings that were never applied> ### Next steps - Re-run `codex-review-cycle` after further work, or - Address the unresolved findings manually, or - Explicitly accept them as known residuals.
The skill never advances to a fourth cycle. The user must invoke the skill again to continue.
-
Review assessment. After printing Case A or Case B output, render a concise review assessment block in the user's language (per §Language) to help the user decide whether to re-invoke the skill or move on:
## Review assessment **Trend**: <1 sentence — e.g. "converging (5 → 4 → 3, severity shift from high to medium)", "stable (structural gaps in each cycle)", "cascading (cycle N fixes created cycle N+1 findings)"> **Character**: <1 sentence — e.g. "mostly state-model gaps", "edge cases and design-philosophy arguments", "doc/wording consistency issues"> **Clusters** (optional — render only when ≥2 **rejected-ledger** entries share a `cluster_id`): `<cluster_id>`: <N> ledger entries across <M> cycle(s) (see ledger entries L<i>, L<j>, ...). Emit at most 3 cluster lines, sorted by finding count descending. If no cluster has ≥2 members, omit the line entirely. **Scope limitation**: cluster accounting is intentionally limited to rejected-ledger entries because only those carry `cluster_id` (see `review-scope-guard` Phase 3 step 9 assignment rule). Applied-fix findings do not participate in cluster summary; extending the carrier to applied fixes is deliberately deferred to avoid inconsistent partial counts. **Recommendation**: <"continue reviewing" | "stop and audit scope" | "move to next work" with 1-sentence rationale. Determined from recorded state only: - If any `must-fix` or `minimal-hygiene` residual was carried to termination → "address residuals before shipping" - If any stop signal is `ACTIVE` or `WARNING` → "stop and audit scope" (aligns with review-scope-guard's stop-signal contract: ACTIVE/WARNING means diminishing returns or scope drift, not a reason to run more cycles) - If clean termination (no residuals) AND finding count decreased across cycles AND no stop signal tripped → "move to next work" - Otherwise → "continue reviewing" (default-safe)> **Suggested next action**: <concrete 1-line action — e.g. "squash and merge to main", "run 1-cycle working-tree dogfood on the applied fixes", "address the 2 carried residuals manually before merging">This block is advisory — it does not gate any action. Keep each part to one sentence; do not re-list findings or repeat the termination summary.
-
Soft-reset temporary cycle commits (
/branch
only). During the review run, the user created one commit per cycle at Claude's request (step 14 manual-commit pause). These are intermediate review-cycle artifacts, not the user's intended final commit. To keep the applied code changes while removing the intermediate commit history:base-ref-
If
or no cycle commits were created, skip this step silently.review_target.scope == working-tree -
Terminal-cycle verification: before resetting, verify the final cycle's applied fixes were actually committed. Run
. If any files have uncommitted changes, printgit status --porcelain -- <final cycle's touched_files>
and skip the reset with a manual-squash fallback:⚠️ Final cycle has uncommitted applied fixes (<file list>). Soft-reset will NOT stage these — only committed changes become staged after reset. Commit them first, or they will be lost from the staged state.
.git reset --soft <pre_cycle_1_head> -
Retrieve
from Phase 0 step 7 and record the currentpre_cycle_1_head
asHEAD
.final_head -
Dirty-state audit (pre-preview): before preview, confirm no non-cycle-owned state would be staged by the reset. Compute
= union ofcycle_owned_files
, then:cycle_history[*].applied_fixes[*].touched_files[]- Part A (uncommitted): run
and inspect every entry:git status --porcelain- Entries that refer to files outside
are unrelated uncommitted state that would survivecycle_owned_files
.git reset --soft - Entries that refer to files inside
are also blocking unless they are the final cycle's applied-fix files that the Terminal-cycle verification above already cleared. Any staged/unstaged edit on an earlier-cycle cycle-owned file (or on a final-cycle file that the Terminal verification failed on) bypassescycle_owned_files
— soft-reset preserves the index and working tree but will NOT stage an unstaged edit, so the workflow's "all applied fixes are staged" claim becomes false. Surface every such entry in the abort output below, including staged vs unstaged status, so the user can commit/stash before re-running.git reset --soft
- Entries that refer to files outside
- Part B (committed-range ownership): run
and compare againstgit diff --name-only <pre_cycle_1_head>..<final_head>
. Any path in the committed delta that is NOT incycle_owned_files
is an unrelated file the user accidentally included in a cycle commit;cycle_owned_files
will stage it into the final squash without the preview flagging it (the preview only showsgit reset --soft
, which lists filenames but does not cross-check ownership). Part B catches what Part A cannot: unrelated work already committed into the cycle range. Paths present in--stat
(user-approved during cycle-N>1 ownership gate) are NOT treated as abort-worthy at Part B — they already got a user decision. Part B surfaces them in the preview output with acycle_history[*].unrelated_commit_paths[]
tag so the final squash commit accurately reflects what is being shipped.(user-approved unrelated) - If either Part A or Part B reports entries outside
, print the following and abort the soft-reset entirely:cycle_owned_files⚠️ State outside the cycle-owned files detected. `git reset --soft <pre_cycle_1_head>` would preserve this into the final staged index, mixing unrelated work into what looks like a "cycle-only" squash. State leaking into the soft-reset target: <list every entry with its source tag: [A-uncommitted] / [A-dirty-owned] / [B-committed] — one per line; if all three categories are empty this entire abort block is not printed> Resolve before continuing: - [A-uncommitted] paths: commit, stash, or clean them. - [A-dirty-owned] paths: commit or clean the staged/unstaged remnants on cycle-owned files. - [B-committed] paths: amend the offending cycle commit to drop the unrelated file, OR rewrite the commit range to exclude it, OR explicitly include them in a manual post-reset commit by running `git reset --soft <pre_cycle_1_head>` yourself after clearing [A-*]. Re-invoke the skill after the range is cycle-clean. - Stop the skill here (do NOT proceed to preview or reset) until the user fixes the state and re-invokes. The soft-reset preview gate gives false confidence if the preview shows only the cycle diff while the actual post-reset staged index would contain unrelated work (uncommitted OR committed).
- Part A (uncommitted): run
-
Soft-reset preview (confirmation gate): only reached when the dirty-state audit passes. Before running
, show the user what will be squashed. Rungit reset --soft
to list the cycle commits, andgit log --oneline <pre_cycle_1_head>..<final_head>
to show the cumulative change. Print:git diff --stat <pre_cycle_1_head>..<final_head>About to soft-reset <N> cycle commit(s) onto <pre_cycle_1_head>. Why squash: the cycle commits (`review cycle 1 fixes`, `review cycle 2 fixes`, ...) are intermediate artifacts of the review loop. Most users want a single final commit that represents the applied change; soft-reset preserves every line of every cycle fix in the staged index, so you can write the final commit message yourself. Declining keeps the cycle commits in place if you prefer their granular history. After reset, all changes below are staged in the index (working tree unchanged). You create your own commit message. Commits to be collapsed: <output of git log --oneline ...> Cumulative change (will be staged): <output of git diff --stat ...>Approved-unrelated paths notice (only when
is non-empty): before the main reset prompt, display an informational section:cycle_history[*].unrelated_commit_paths[]Approved unrelated paths from earlier cycles (user-tagged during cycle-N>1 ownership gate): - <path> (approved in cycle N) - ... These files are NOT in any applied fix's touched_files. They were bundled into cycle commits and will be preserved by the soft-reset into the final staged index. If you changed your mind about including them, decline the next prompt and amend the cycle commits manually.This is a display-only notice, not an AskUserQuestion — the user already tagged these paths as approved during the cycle-N>1 ownership gate. The main reset prompt below is the opportunity to back out.
Then issue the main reset
:AskUserQuestion
: "Collapse these N cycle commit(s) into a staged index, ready for your commit?"question- options:
— proceed to the next bullet.Yes, soft-reset now
— skip the reset; printNo, leave cycle commits as-is
git reset --soft <pre_cycle_1_head>Cycle commits left in place. Squash manually with
and end the skill.if desired.
-
Run
. This removes all intermediate cycle commits from HEAD but leaves the accumulated changes staged in the index. The user's working tree is unchanged.git reset --soft <pre_cycle_1_head> -
Print:
Soft-reset: <N> temporary cycle commit(s) (<pre_cycle_1_head>..<final_head>) removed. All applied fixes are staged in the index. Create your own commit: git commit -m "<your message>"
-
Preconditions Recap
- Git CLI available on
.PATH - Current working directory inside a git repository.
- The chosen review target produces a non-empty diff: either uncommitted changes exist (
), or HEAD is ahead of the auto-detected default branch (working-tree
), or the user-supplied ref exists andbranch
is non-empty (<ref>...HEAD
).base-ref - Codex plugin installed and
reports a ready state.Skill(codex:setup)
skill available (invoked at step 10a for scope triage and DoD collection).review-scope-guard- Both
andcodex-review-cycle
are registered with the Claude Code harness. If not (e.g. during local development before marketplace publication), follow the SKILL.md steps manually — every step is self-contained.review-scope-guard
Failure Modes
- Codex CLI missing or setup incomplete — stop in Phase 0 step 4. Tell the user to install the codex plugin or run
./codex:setup - Default branch not detected (scope =
) — stop in Phase 0 step 2 with guidance to re-run with scope =branch
and an explicit ref.base-ref - User-supplied ref not found (scope =
) — stop in Phase 0 step 2 withbase-refBase ref '<ref>' not found in this repository. - JSON parse failure (adversarial) — retry once; a second failure aborts the cycle with codex's raw stdout surfaced verbatim.
- File cited by codex no longer exists — item 1 of the validity check returns
. The finding is listed in the summary but not selectable.invalid: file not in diff - User has no working-tree diff after a cycle's fixes are applied (scope =
) — continue to the next cycle anyway (the next review will see the committed state). Do not silently skip cycles. Forworking-tree
/branch
scopes the diff is against a committed base, so in-cycle fixes never empty the diff.base-ref - User declines every finding across all 3 cycles — terminate in Case A with the user-declined message, not Case B. The cap did not fire; the user actively closed the loop.
- User declines the DoD interview in cycle 1 step 7 (adversarial) or step 10a (review) —
stays inside the 4-category invariant: fall-through findings still classify asreview-scope-guard
, and ledger/vague findings still classify asminimal-hygiene
. No 5threject-noise
bucket is created. The summary table footer printsunclassified
The user is the last line of defense in this degraded mode.⚠️ DoD not collected — scope triage degraded. Review each selectable finding manually before applying; the minimal-hygiene fall-through is weaker than a DoD-anchored classification. - Stop signal
orACTIVE
during cycles 1-2 — print the recommendation in the summary but do not auto-stop. The cycle cap still governs termination.WARNING - User chooses
from a V=0 state but codex again returns 0 selectable findings — the next V=0 offer is still issued per step 12 (adversarial-review variant only); the user can choose to terminate or burn another cycle. The cap still governs. Do not suppress the offer just because it fired before.Run cycle N+1 - V=0 fires under
— the override path is unavailable; skip directly to Phase 2 Case A as documented in step 12. The summary row for the cycle still renders, and the finalvariant == review
should note "V=0 under native review — override disabled, see step 12" so the user understands why no cycle N+1 offer appeared.Review assessment
entry is internally inconsistent — corruption is defined by same-entry contradiction, not by comparison with earlier cycles. A valid applied-then-V=0-retry sequence (no_fix_cycle: true
→cycle 1: applied_fixes non-empty
→cycle 2: no_fix_cycle=true, applied_fixes=[]
) must be honored — cycle 2 having a no-fix marker while cycle 1 had fixes is NORMAL. Treat the marker as corrupted ONLY when the same entry that carriescycle 3: uses cycle-2 marker to exempt preflight
also has non-emptyno_fix_cycle: true
,applied_fixes[]
, oruser_declined[]
. In that (truly contradictory) case, printskipped_for_scope[]
and run the full preflight ignoring the marker. This is defense-in-depth against a corrupted state writer; normal applied-then-V=0 flow is untouched.⚠️ Inconsistent no_fix_cycle marker on cycle N-1 (marker true but the same entry has applied/declined/skipped entries). Running full preflight.- Conversation context is lost mid-run (e.g. compaction, tab close, long idle) — the skill's state (cycle_history, rejected_ledger, review_target, dod) lives only in the active conversation. If context is truncated or the session resets, the in-flight run CANNOT be resumed automatically. Recovery steps: (1) if any cycle commits exist on
/branch
scope, the user may squash them manually withbase-ref
fromgit reset --soft <pre_cycle_1_head>
; (2) if applied fixes sit uncommitted ongit reflog
scope, they stay in place and the user commits normally; (3) restart the skill from Phase 0 on the current state — the new run does NOT know about prior cycles' rejected_ledger, so codex may re-raise findings that the earlier run rejected as noise. State persistence across session breaks is deferred to a separate plan; this bullet documents the current fallback.working-tree - User wants to cancel the skill mid-cycle — at any
prompt, the user can type a message indicating cancellation (e.g. "stop", "cancel", "abort"); Claude treats this as an early termination request. The current cycle's state is preserved as-is (no auto-rollback of applied fixes; no auto-commit). Claude prints a short summary: "Skill cancelled at cycle N step M. Applied fixes in this session: <list>. Remaining state: <working-tree dirty | N cycle commits on <branch>>. Manual cleanup may be needed depending on your preferences (git stash, git reset, amend, etc.)." The skill does NOT attempt any destructive cleanup on behalf of the user. Between-prompts cancellation (user Ctrl-C or tab close without an active prompt) falls under the "Conversation context is lost mid-run" bullet.AskUserQuestion
References
— target-kind detection and the canonical code/plan focus text.references/focus-text.md
— full details of the six validity items.references/validity-checklist.md
— 日本語で render する場合の summary table / stop signal footer / 終了メッセージ例。references/summary-samples.ja.md
— scope triage skill invoked at step 10a (DoD collection, 4-category triage, rejected ledger, stop signals).skills/review-scope-guard/SKILL.md