Claude-toolbox review-spec

install
source · Clone the upstream repo
git clone https://github.com/serpro69/claude-toolbox
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/serpro69/claude-toolbox "$T" && mkdir -p ~/.claude/skills && cp -r "$T/klaude-plugin/skills/review-spec" ~/.claude/skills/serpro69-claude-toolbox-review-spec && rm -rf "$T"
manifest: klaude-plugin/skills/review-spec/SKILL.md
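To confirm the install landed where Claude Code looks for it, check for SKILL.md in the copied directory. The sketch below is minimal POSIX sh. verify_skill_install is a hypothetical helper and is not part of claude-toolbox. It also catches one caveat: if ~/.claude/skills/serpro69-claude-toolbox-review-spec already exists, cp -r copies review-spec inside it rather than replacing it, so SKILL.md ends up one level too deep.

```shell
#!/bin/sh
# Hypothetical helper: confirm the copied skill directory contains its manifest.
# The default path mirrors the cp -r destination in the install command.
verify_skill_install() {
  skill_dir=${1:-"$HOME/.claude/skills/serpro69-claude-toolbox-review-spec"}
  if [ -f "$skill_dir/SKILL.md" ]; then
    echo "ok: $skill_dir/SKILL.md"
  else
    # A nested .../review-spec/SKILL.md (re-run install) also lands here.
    echo "missing: $skill_dir/SKILL.md" >&2
    return 1
  fi
}
```

Run verify_skill_install with no argument after installing, or pass a directory to check a different location.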

Implementation Review

Conventions

Read the capy knowledge base conventions in shared-capy-knowledge-protocol.md.

Overview

Systematically compare implemented code against a feature's design.md, implementation.md, and tasks.md in /docs/wip/[feature]/. Works both mid-implementation (reviewing completed tasks only) and post-implementation (full feature review).

Findings go in both directions: code that deviates from the spec, and spec that is wrong or outdated given the code.
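The document layout described in this overview can be sketched as a pre-flight check. This minimal POSIX sh sketch rests on two assumptions. First, the hypothetical review_scope helper treats /docs/wip/[feature]/ as relative to the repository root. Second, it assumes tasks.md tracks tasks as markdown checkboxes ("- [ ]" open, "- [x]" done). Neither assumption comes from the skill itself. The skill's own scope decision is part of the process in review-process.md.

```shell
#!/bin/sh
# Hypothetical helper: verify the three feature documents exist, then guess the
# review scope from tasks.md. ASSUMES markdown checkboxes ("- [ ]" = open task);
# the real tasks.md format may differ.
review_scope() {
  feature=$1
  root=${2:-.}
  dir="$root/docs/wip/$feature"
  for doc in design.md implementation.md tasks.md; do
    if [ ! -f "$dir/$doc" ]; then
      echo "missing: $dir/$doc" >&2
      return 1
    fi
  done
  # Count unchecked tasks; grep -c prints 0 and exits 1 on no match, hence || true.
  open=$(grep -c '^[[:space:]]*- \[ \]' "$dir/tasks.md" || true)
  if [ "$open" -gt 0 ]; then
    echo "mid-implementation"
  else
    echo "post-implementation"
  fi
}
```

For example, running review_scope auth from the repository root (auth is a made-up feature name) prints mid-implementation while any checkbox in docs/wip/auth/tasks.md is still open.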

Required Outputs

Before declaring the review complete, verify all outputs are delivered:

  • Review report presented to user
  • User-confirmed intentional SPEC_DEV/EXTRA_IMPL findings indexed as kk:arch-decisions (skip if none confirmed)
  • Next steps confirmation from user

Indexing is owned by this skill; callers (e.g., implement) do NOT duplicate it.

Review Modes

Standard Mode (/kk:review-spec)

Reviews spec conformance in the main conversation context, as a single pass using the workflow below.

Isolated Mode (/kk:review-spec:isolated)

Delegates detection to an independent spec-reviewer sub-agent that did not write the code, then annotates its findings with type-specific author context. Low-relevance types (MISSING_IMPL, DOC_INCON, OUTDATED_DOC, AMBIGUOUS) get brief annotations; high-relevance types (SPEC_DEV, EXTRA_IMPL) get detailed annotations with spec update suggestions.

  • Cost: Higher (sub-agent + annotation)
  • Isolation: True — reviewer has zero authorship bias or session context
  • Degradation: If the sub-agent fails, suggests falling back to standard mode
  • Best for: When extra rigor is worth the cost (post-implementation, pre-merge)

See review-isolated.md for the isolated workflow.
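The isolated-mode annotation rule reduces to a lookup on the finding type. The minimal POSIX sh sketch below uses annotation_depth, a hypothetical name. The skill applies this rule in prose, not through a script.

```shell
#!/bin/sh
# Hypothetical lookup: SPEC_DEV and EXTRA_IMPL get detailed annotations
# (with spec update suggestions); the other four types get brief ones.
annotation_depth() {
  case $1 in
    SPEC_DEV|EXTRA_IMPL) echo "detailed" ;;
    MISSING_IMPL|DOC_INCON|OUTDATED_DOC|AMBIGUOUS) echo "brief" ;;
    *) echo "unknown finding type: $1" >&2; return 1 ;;
  esac
}
```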

Finding Types

Each finding is classified by type (what kind of mismatch) and severity (how urgent).

  • Missing Implementation (MISSING_IMPL): Spec describes something that was not implemented. Example: Design says "rate limiting on /api/auth" but no rate limiter exists.
  • Extra Implementation (EXTRA_IMPL): Code implements something not in the spec. Example: A caching layer was added that design docs don't mention.
  • Spec Deviation (SPEC_DEV): Code implements the feature but differently than specified. Example: Design says "bcrypt cost 12" but code uses cost 10.
  • Doc Inconsistency (DOC_INCON): Documentation contradicts itself or is internally inconsistent. Example: design.md says JWT tokens, implementation.md says session cookies.
  • Outdated Doc (OUTDATED_DOC): Code is correct but docs haven't been updated to reflect reality. Example: Endpoint was renamed during implementation but docs still reference old name.
  • Ambiguous Spec (AMBIGUOUS): Spec is unclear enough that multiple interpretations are valid. Example: "Support pagination" without specifying cursor vs offset.

Severity Levels

Same P0–P3 scale as review-code, adapted for spec conformance:

  • P0 Critical: Missing core functionality, security spec violated, data model mismatch. Action: Must fix before merge
  • P1 High: Significant behavioral deviation from spec, missing error handling that spec requires. Action: Should fix before merge
  • P2 Medium: Minor deviation, doc inconsistency, partial implementation of a spec requirement. Action: Fix or create follow-up
  • P3 Low: Naming mismatch, doc typo, cosmetic deviation from spec. Action: Optional
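Each severity maps to exactly one action, so the scale reduces to a small lookup. This is a hypothetical POSIX sh sketch, and severity_action is an illustrative name that is not part of the skill.

```shell
#!/bin/sh
# Hypothetical lookup from severity level to the required action.
severity_action() {
  case $1 in
    P0) echo "Must fix before merge" ;;
    P1) echo "Should fix before merge" ;;
    P2) echo "Fix or create follow-up" ;;
    P3) echo "Optional" ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

A side benefit of the P0–P3 labels is that they sort lexically in urgency order. Piping report lines that start with the label through sort therefore lists P0 findings first.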

Confidence Levels

Each finding gets a confidence score (1–10) with mandatory reasoning explaining what was checked, what evidence supports the finding, and what uncertainty remains.

  • 9–10 Certain: direct, unambiguous contradiction between spec and code
  • 7–8 Strong: clear evidence but minor room for interpretation
  • 5–6 Moderate: likely issue but spec is somewhat vague or code has plausible alternative reading
  • 3–4 Uncertain: possible issue, needs human judgment
  • 1–2 Speculative: gut feeling, very ambiguous spec or indirect evidence
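This section sets two rules: each score falls between 1 and 10, and each score comes with reasoning. Both can be enforced together. In this minimal POSIX sh sketch, confidence_band is a hypothetical helper. It rejects out-of-range or non-numeric scores and empty reasoning, then prints the band name for the score.

```shell
#!/bin/sh
# Hypothetical check: score must be an integer 1-10 and reasoning is mandatory.
# Prints the band name (Certain, Strong, Moderate, Uncertain, Speculative).
confidence_band() {
  score=$1
  reasoning=$2
  case $score in
    ''|*[!0-9]*) echo "score must be an integer: $score" >&2; return 1 ;;
  esac
  if [ -z "$reasoning" ]; then
    echo "reasoning is mandatory" >&2
    return 1
  fi
  if [ "$score" -lt 1 ] || [ "$score" -gt 10 ]; then
    echo "score out of range 1-10: $score" >&2
    return 1
  elif [ "$score" -ge 9 ]; then echo "Certain"
  elif [ "$score" -ge 7 ]; then echo "Strong"
  elif [ "$score" -ge 5 ]; then echo "Moderate"
  elif [ "$score" -ge 3 ]; then echo "Uncertain"
  else echo "Speculative"
  fi
}
```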

Workflow

See review-process.md for the detailed step-by-step process.

Phases:

  1. Load feature documents
  2. Capy search: Search kk:arch-decisions for design rationale that may explain intentional spec deviations. Search kk:review-findings for known patterns from prior reviews.
  3. Determine review scope (mid-implementation vs post-implementation)
  4. Per-task verification against spec
  5. Cross-cutting concern check
  6. Self-check and confidence assessment
  7. Present findings
  8. Index confirmed deviations: index user-confirmed intentional SPEC_DEV/EXTRA_IMPL findings as kk:arch-decisions
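Phase 8 only ever indexes a narrow subset of findings. The filter can be sketched in POSIX sh. The sketch assumes a hypothetical input with one finding per line, in the form TYPE CONFIRMED TITLE, where CONFIRMED is yes or no. It only prints what would be indexed and does not call capy, whose interface isn't covered here. Empty output corresponds to the "skip if none confirmed" case.

```shell
#!/bin/sh
# Hypothetical filter: keep only SPEC_DEV / EXTRA_IMPL findings the user
# confirmed as intentional. Input format "TYPE CONFIRMED TITLE" is assumed.
findings_to_index() {
  while read -r type confirmed title; do
    case $type in
      SPEC_DEV|EXTRA_IMPL)
        if [ "$confirmed" = "yes" ]; then
          printf 'kk:arch-decisions\t%s\t%s\n' "$type" "$title"
        fi ;;
    esac
  done
}
```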

Invocation

Use the /review-spec [feature-name] command, or invoke the skill naturally when a user asks to verify an implementation against its docs.

For isolated mode with an independent sub-agent:

/kk:review-spec:isolated [feature-name]