Claude-toolbox review-spec
git clone https://github.com/serpro69/claude-toolbox
T=$(mktemp -d) && git clone --depth=1 https://github.com/serpro69/claude-toolbox "$T" && mkdir -p ~/.claude/skills && cp -r "$T/klaude-plugin/skills/review-spec" ~/.claude/skills/serpro69-claude-toolbox-review-spec && rm -rf "$T"
klaude-plugin/skills/review-spec/SKILL.md
Implementation Review
Conventions
Read capy knowledge base conventions at shared-capy-knowledge-protocol.md.
Overview
Systematically compare implemented code against a feature's design.md, implementation.md, and tasks.md in /docs/wip/[feature]/. Works both mid-implementation (reviewing completed tasks only) and post-implementation (full feature review).
Findings go in both directions — code that deviates from spec AND spec that is wrong or outdated given the code.
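As a minimal sketch of the document layout described above, the helper below resolves the three spec files for a feature and reports which are absent. The function names and the `repo_root` parameter are hypothetical and not part of the skill; only the file names and the `docs/wip/[feature]/` location come from the text.

```python
from pathlib import Path

# File names taken from the overview; everything else here is illustrative.
SPEC_FILES = ("design.md", "implementation.md", "tasks.md")


def feature_doc_paths(repo_root, feature):
    """Map each spec file name to its expected path under docs/wip/<feature>/."""
    base = Path(repo_root) / "docs" / "wip" / feature
    return {name: base / name for name in SPEC_FILES}


def missing_docs(repo_root, feature):
    """List the spec files that do not exist on disk, in SPEC_FILES order."""
    paths = feature_doc_paths(repo_root, feature)
    return [name for name, path in paths.items() if not path.is_file()]
```

A reviewer could call `missing_docs(".", "auth")` before starting and stop early if any of the three documents is absent.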
Required Outputs
Before declaring the review complete, verify all outputs are delivered:
- Review report presented to user
- User-confirmed intentional SPEC_DEV/EXTRA_IMPL findings indexed as kk:arch-decisions (skip if none confirmed)
- Next steps confirmation from user
Indexing is owned by this skill; callers (e.g., implement) do NOT duplicate it.
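The completion check above can be sketched as a small function that returns whatever is still outstanding. The function name, its parameters, and the output labels are assumptions made for illustration; the skill itself does not define such an API.

```python
def outstanding_outputs(report_presented, confirmed_intentional, indexed, next_steps_confirmed):
    """Return the required outputs that have not been delivered yet.

    confirmed_intentional: number of SPEC_DEV/EXTRA_IMPL findings the user
        confirmed as intentional (hypothetical counter).
    indexed: how many of those have been indexed as kk:arch-decisions.
    """
    pending = []
    if not report_presented:
        pending.append("review report")
    # Indexing is skipped entirely when nothing was confirmed as intentional.
    if confirmed_intentional > 0 and indexed < confirmed_intentional:
        pending.append("index confirmed findings")
    if not next_steps_confirmed:
        pending.append("next steps confirmation")
    return pending
```

An empty list means the review can be declared complete.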
Review Modes
Standard Mode (/kk:review-spec)
Reviews spec conformance in the main conversation context. Single-pass review using the workflow below.
Isolated Mode (/kk:review-spec:isolated)
Delegates detection to an independent spec-reviewer sub-agent that did not write the code, then annotates its findings with type-specific author context. Low-relevance types (MISSING_IMPL, DOC_INCON, OUTDATED_DOC, AMBIGUOUS) get brief annotations; high-relevance types (SPEC_DEV, EXTRA_IMPL) get detailed annotations with spec update suggestions.
- Cost: Higher (sub-agent + annotation)
- Isolation: True — reviewer has zero authorship bias or session context
- Degradation: If sub-agent fails, suggests standard mode fallback
- Best for: When extra rigor is worth the cost (post-implementation, pre-merge)
See review-isolated.md for the isolated workflow.
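The split between brief and detailed annotations in isolated mode can be illustrated with a lookup like the one below. The type codes come from the description above; the function name and the returned labels `"brief"` and `"detailed"` are hypothetical.

```python
# Grouping taken from the isolated-mode description; names are illustrative.
BRIEF_TYPES = {"MISSING_IMPL", "DOC_INCON", "OUTDATED_DOC", "AMBIGUOUS"}
DETAILED_TYPES = {"SPEC_DEV", "EXTRA_IMPL"}


def annotation_depth(type_code):
    """Return how much author context a finding of this type should receive."""
    if type_code in DETAILED_TYPES:
        # High-relevance: detailed annotation plus a spec update suggestion.
        return "detailed"
    if type_code in BRIEF_TYPES:
        return "brief"
    raise ValueError(f"unknown finding type: {type_code!r}")
```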
Finding Types
Each finding is classified by type (what kind of mismatch) and severity (how urgent).
| Type | Code | Description | Example |
|---|---|---|---|
| Missing Implementation | MISSING_IMPL | Spec describes something that was not implemented | Design says "rate limiting on /api/auth" but no rate limiter exists |
| Extra Implementation | EXTRA_IMPL | Code implements something not in the spec | A caching layer was added that design docs don't mention |
| Spec Deviation | SPEC_DEV | Code implements the feature but differently than specified | Design says "bcrypt cost 12" but code uses cost 10 |
| Doc Inconsistency | DOC_INCON | Documentation contradicts itself or is internally inconsistent | design.md says JWT tokens, implementation.md says session cookies |
| Outdated Doc | OUTDATED_DOC | Code is correct but docs haven't been updated to reflect reality | Endpoint was renamed during implementation but docs still reference old name |
| Ambiguous Spec | AMBIGUOUS | Spec is unclear enough that multiple interpretations are valid | "Support pagination" without specifying cursor vs offset |
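One way to picture a single finding is as a small record carrying its type, severity, and confidence. The sketch below is only an illustration: the `Finding` class and its field names are hypothetical, and the severity and confidence assigned to the example are assumed values, not ones stated by the skill. The type codes are the ones named in the isolated-mode description.

```python
from dataclasses import dataclass
from enum import Enum


class FindingType(Enum):
    MISSING_IMPL = "MISSING_IMPL"
    EXTRA_IMPL = "EXTRA_IMPL"
    SPEC_DEV = "SPEC_DEV"
    DOC_INCON = "DOC_INCON"
    OUTDATED_DOC = "OUTDATED_DOC"
    AMBIGUOUS = "AMBIGUOUS"


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape; the skill does not prescribe these fields.
    type: FindingType
    severity: str    # "P0" .. "P3"
    confidence: int  # 1 .. 10
    summary: str


# Example row from the table; P1 and 9 are assumed for illustration only.
example = Finding(
    type=FindingType.SPEC_DEV,
    severity="P1",
    confidence=9,
    summary='Design says "bcrypt cost 12" but code uses cost 10',
)
```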
Severity Levels
Same P0–P3 scale as review-code, adapted for spec conformance:
| Level | Name | Description | Action |
|---|---|---|---|
| P0 | Critical | Missing core functionality, security spec violated, data model mismatch | Must fix before merge |
| P1 | High | Significant behavioral deviation from spec, missing error handling that spec requires | Should fix before merge |
| P2 | Medium | Minor deviation, doc inconsistency, partial implementation of a spec requirement | Fix or create follow-up |
| P3 | Low | Naming mismatch, doc typo, cosmetic deviation from spec | Optional |
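To show how the action column might drive a decision, the sketch below picks the most urgent severity among a set of findings and returns its action. The function names and the "No action required" fallback for an empty review are assumptions; the four actions are copied from the table.

```python
# Ordered from most to least urgent; actions copied from the severity table.
SEVERITY_ACTIONS = {
    "P0": "Must fix before merge",
    "P1": "Should fix before merge",
    "P2": "Fix or create follow-up",
    "P3": "Optional",
}


def most_urgent(levels):
    """Return the most urgent level present, or None when there are no findings.

    Labels outside P0..P3 are ignored in this sketch.
    """
    seen = set(levels)
    for level in SEVERITY_ACTIONS:  # dicts preserve insertion order
        if level in seen:
            return level
    return None


def required_action(levels):
    """Return the action for the most urgent finding (fallback text is assumed)."""
    top = most_urgent(levels)
    return SEVERITY_ACTIONS[top] if top else "No action required"
```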
Confidence Levels
Each finding gets a confidence score (1–10) with mandatory reasoning explaining what was checked, what evidence supports the finding, and what uncertainty remains.
| Score | Meaning |
|---|---|
| 9–10 | Certain — direct, unambiguous contradiction between spec and code |
| 7–8 | Strong — clear evidence but minor room for interpretation |
| 5–6 | Moderate — likely issue but spec is somewhat vague or code has plausible alternative reading |
| 3–4 | Uncertain — possible issue, needs human judgment |
| 1–2 | Speculative — gut feeling, very ambiguous spec or indirect evidence |
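The score bands can be expressed as a simple lookup that also enforces the mandatory reasoning. This is a sketch only: the function name and error messages are hypothetical, while the band boundaries and labels follow the table.

```python
# (lower bound, label) pairs, checked from highest to lowest.
CONFIDENCE_BANDS = [
    (9, "Certain"),
    (7, "Strong"),
    (5, "Moderate"),
    (3, "Uncertain"),
    (1, "Speculative"),
]


def confidence_band(score, reasoning):
    """Return the band label for a 1-10 score; reject findings without reasoning."""
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("confidence score must be an integer from 1 to 10")
    if not reasoning.strip():
        raise ValueError("reasoning is mandatory for every finding")
    for lower_bound, label in CONFIDENCE_BANDS:
        if score >= lower_bound:
            return label
```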
Workflow
See review-process.md for the detailed step-by-step process.
Phases:
- Load feature documents
- Capy search: Search kk:arch-decisions for design rationale that may explain intentional spec deviations. Search kk:review-findings for known patterns from prior reviews.
- Determine review scope (mid-implementation vs post-implementation)
- Per-task verification against spec
- Cross-cutting concern check
- Self-check and confidence assessment
- Present findings
- Index confirmed deviations: index user-confirmed intentional SPEC_DEV/EXTRA_IMPL findings as kk:arch-decisions
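The scope-determination phase could look roughly like the sketch below, which assumes task completion is available as a mapping from task name to a done flag. That representation and the function name are hypothetical, since the format of tasks.md is not described here. Mid-implementation reviews cover completed tasks only, matching the overview.

```python
def review_scope(task_status):
    """Decide the review mode and which tasks to verify.

    task_status: dict of task name -> True if completed (assumed shape).
    Returns (mode, tasks_to_review).
    """
    done = [name for name, is_done in task_status.items() if is_done]
    all_done = bool(task_status) and len(done) == len(task_status)
    mode = "post-implementation" if all_done else "mid-implementation"
    # In both modes only completed tasks are verified; post-implementation
    # simply means that this is every task.
    return mode, done
```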
Invocation
Use the /review-spec [feature-name] command, or invoke naturally when a user asks to verify implementation against docs.
For isolated mode with an independent sub-agent:
/kk:review-spec:isolated [feature-name]