claude-skill-registry / agent-ops-critical-review
Deep, excruciating code review. Use anytime to analyze code for correctness, edge cases, security, performance, and design issues. Not tied to baseline—this is pure code analysis.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/agent-ops-critical-review" ~/.claude/skills/majiayu000-claude-skill-registry-agent-ops-critical-review && rm -rf "$T"
skills/data/agent-ops-critical-review/SKILL.md
Critical Code Review
Purpose
Perform a deep, thorough, excruciating review of code. This skill is about analyzing code quality, correctness, and design—NOT about running tests or comparing to baseline. Use it anytime you need to scrutinize code.
API Detection
Before starting review, check if project contains APIs:
```yaml
api_indicators:
  - OpenAPI/Swagger spec (openapi.yaml, swagger.json, openapi.json)
  - API framework patterns (FastAPI, Flask, Express, ASP.NET controllers)
  - Route decorators (@app.route, @router.get, [HttpGet], etc.)
  - API test files (test_api_*, *_api_test.*, api.spec.*)
```
If API detected:
- Note in review scope: "API endpoints detected"
- After the general review, invoke agent-ops-api-review for an API-specific audit
- Merge API review findings into the final report
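A minimal sketch of that detection step as a standalone Python helper, run from the project root. The spec file names and decorator strings mirror the indicators listed above; treat them as illustrative, not exhaustive:

```python
from pathlib import Path

# Indicators from the list above: spec files and route-decorator patterns.
SPEC_FILES = {"openapi.yaml", "openapi.json", "swagger.json"}
ROUTE_PATTERNS = ("@app.route", "@router.get", "@router.post", "[HttpGet]")
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".cs"}

def detect_api(root: str = ".") -> bool:
    """Return True if the project shows any of the API indicators."""
    for path in Path(root).rglob("*"):
        if path.name in SPEC_FILES:
            return True
        if path.suffix in SOURCE_SUFFIXES and path.is_file():
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            if any(pattern in text for pattern in ROUTE_PATTERNS):
                return True
    return False

if __name__ == "__main__":
    print("API endpoints detected" if detect_api() else "no API indicators found")
```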
When to Use
- During planning: Review existing code before modifying
- During implementation: Review your own changes
- Before completion: Final quality check
- On demand: User requests a code review
- After recovery: Verify recovery didn't introduce issues
Review Dimensions
1) Correctness
Logic Analysis:
- Does the code do what it claims to do?
- Are all code paths reachable and correct?
- Are conditionals exhaustive (all cases handled)?
- Are loop invariants maintained?
- Are recursive functions guaranteed to terminate?
Boundary Conditions:
- Empty inputs (null, undefined, empty string, empty array)
- Single element cases
- Maximum values / overflow potential
- Off-by-one errors
- Unicode / special characters
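A concrete illustration with a hypothetical `last_window` helper: the one-line version harbors a boundary bug that only appears at `size == 0`:

```python
def last_window(items: list, size: int) -> list:
    # ❌ Boundary bug: size == 0 gives items[-0:], which is items[0:],
    # i.e. the WHOLE list instead of an empty window. Negative sizes
    # also silently return a wrong slice instead of failing.
    return items[-size:]

def last_window_checked(items: list, size: int) -> list:
    # ✅ Boundaries handled explicitly: negative, zero, empty input.
    if size < 0:
        raise ValueError(f"size must be non-negative, got {size}")
    if size == 0 or not items:
        return []
    return items[-size:]
```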
State Management:
- Is state modified predictably?
- Are there race conditions?
- Is cleanup always performed (finally blocks, defer, etc.)?
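For instance, a lock released manually leaks on the error path, while a `with` block (or try/finally) does not. A minimal sketch with a hypothetical cache:

```python
import threading

_lock = threading.Lock()
_cache: dict = {}

def _validate(value: str) -> None:
    if not value:
        raise ValueError("empty value")

def update_cache_unsafe(key: str, value: str) -> None:
    # ❌ If _validate raises, release() never runs and every later
    # caller deadlocks waiting on the lock.
    _lock.acquire()
    _validate(value)
    _cache[key] = value
    _lock.release()

def update_cache(key: str, value: str) -> None:
    # ✅ The context manager releases the lock on every exit path,
    # including exceptions.
    with _lock:
        _validate(value)
        _cache[key] = value
```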
2) Error Handling
Questions to Ask:
- What can fail here?
- Is every failure case handled?
- Are errors propagated appropriately?
- Are error messages informative?
- Are there silent failures hiding bugs?
Checklist:
- All external calls wrapped in try/catch or equivalent
- Errors logged with context
- User-facing errors are actionable
- No swallowed exceptions
- Cleanup happens even on error
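As a sketch of the difference between swallowing and handling (the config loader is hypothetical):

```python
import json
import logging

log = logging.getLogger(__name__)

def load_config_silent(path: str) -> dict:
    # ❌ Swallowed exception: missing file, bad permissions, and corrupt
    # JSON all collapse into {} with no trace. Bugs hide here.
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}

def load_config(path: str) -> dict:
    # ✅ Each failure is logged with context; the error stays visible
    # and the user-facing message says what to do.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        log.error("config file not found: %s", path)
        raise
    except json.JSONDecodeError as exc:
        log.error("config %s is not valid JSON: %s", path, exc)
        raise ValueError(f"fix the JSON syntax in {path}") from exc
```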
3) Security
Input Validation:
- Are all external inputs validated?
- Is there SQL injection potential?
- Is there XSS potential?
- Is there command injection potential?
- Are paths sanitized (no traversal)?
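The SQL injection question in code form, using the stdlib sqlite3 driver (table and columns are hypothetical):

```python
import sqlite3

def search_users_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # ❌ String interpolation: name = "x' OR '1'='1" returns every row.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()

def search_users(conn: sqlite3.Connection, name: str) -> list:
    # ✅ Parameterized query: the driver treats `name` strictly as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

The same shape applies to command injection (prefer argument lists over shell strings) and path traversal (resolve the path and verify it stays under the intended root).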
Authentication/Authorization:
- Is auth checked before sensitive operations?
- Are tokens/secrets handled securely?
- Is sensitive data logged? (it shouldn't be)
Data Handling:
- Is PII protected?
- Are passwords hashed, not encrypted?
- Is data sanitized before display?
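On the password point, a stdlib-only sketch. PBKDF2 is the weakest acceptable choice here; prefer bcrypt or argon2 when a third-party dependency is allowed:

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # tune to your hardware; slower is also slower for attackers

def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per password defeats precomputed rainbow tables.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest is constant-time, so the comparison leaks no timing info.
    return hmac.compare_digest(candidate, expected)
```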
4) Performance
Algorithmic:
- What's the time complexity?
- What's the space complexity?
- Are there N+1 query problems?
- Are there unnecessary iterations?
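The N+1 pattern is easiest to spot side by side; a sketch with hypothetical users/orders tables:

```python
import sqlite3

def order_totals_n_plus_one(conn: sqlite3.Connection, user_ids: list) -> dict:
    # ❌ N+1: one round trip per user. 1,000 users means 1,000 queries.
    return {
        uid: conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
            (uid,),
        ).fetchone()[0]
        for uid in user_ids
    }

def order_totals(conn: sqlite3.Connection, user_ids: list) -> dict:
    # ✅ One query, aggregated server-side.
    if not user_ids:
        return {}
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, SUM(amount) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids,
    ).fetchall()
    return dict(rows)
```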
Resource Management:
- Are connections/handles closed?
- Are large objects released?
- Is there potential for memory leaks?
- Are there blocking calls that should be async?
Caching/Optimization:
- Is expensive computation cached when appropriate?
- Are there premature optimizations obscuring clarity?
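When caching is appropriate, the stdlib often suffices; a sketch with a stand-in computation:

```python
from functools import lru_cache

def _compute_rate(origin: str, destination: str) -> float:
    # Stand-in for an expensive, deterministic lookup.
    return float(len(origin) + len(destination))

@lru_cache(maxsize=1024)
def shipping_rate(origin: str, destination: str) -> float:
    # Safe to memoize: a pure function of hashable arguments. Caching would
    # be WRONG if the underlying rate table could change mid-process.
    return _compute_rate(origin, destination)
```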
5) Design & Maintainability
Single Responsibility:
- Does each function do one thing?
- Are there functions that are too long?
- Is there duplicated logic that should be extracted?
Naming & Clarity:
- Do names accurately describe purpose?
- Are abbreviations clear or confusing?
- Would a new team member understand this?
Dependencies:
- Are dependencies appropriate?
- Is there tight coupling that could be loosened?
- Are there circular dependencies?
Testability:
- Can this code be unit tested?
- Are dependencies injectable?
- Is there global state that complicates testing?
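A small example of the injection question: hard-coding the clock makes expiry logic untestable, while injecting it does not (the class and names are illustrative):

```python
from datetime import datetime, timezone
from typing import Callable

class TokenValidator:
    # ✅ The clock is injectable: production uses the default,
    # tests pass a lambda returning a fixed datetime.
    def __init__(self, clock: Callable[[], datetime] | None = None):
        self._clock = clock or (lambda: datetime.now(timezone.utc))

    def is_expired(self, expires_at: datetime) -> bool:
        return self._clock() >= expires_at

# In a test:
#   fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
#   validator = TokenValidator(clock=lambda: fixed)
```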
6) Consistency
- Does it follow project conventions?
- Does it match surrounding code style?
- Are patterns used consistently?
7) SOLID Principles (for OO/class-based code)
When to apply: Review SOLID for class-based code, services, and complex modules. Skip for simple scripts, utilities, or purely functional code.
Single Responsibility (SRP):
- Does each class have one clear purpose?
- Would changes to one feature require modifying this class for unrelated reasons?
- Could this class be split into smaller, focused classes?
Open/Closed (OCP):
- Can new features be added by extension rather than modification?
- Are there long if/elif/switch chains that should use polymorphism?
- Is the Strategy, Factory, or Template pattern appropriate?
Liskov Substitution (LSP):
- Do subclasses honor the contracts of their parent classes?
- Are there overridden methods that change expected behavior?
- Does substituting a subclass for its parent break anything?
Interface Segregation (ISP):
- Are interfaces/ABCs focused and cohesive?
- Do implementations actually need every method they are forced to implement?
- Should large interfaces be split into smaller, role-specific ones?
Dependency Inversion (DIP):
- Do high-level modules depend on abstractions, not concretions?
- Are dependencies injected rather than created internally?
- Can this code be tested without real implementations of dependencies?
SOLID Checklist (for class-based code):
- Classes have single, clear responsibilities
- New behavior can be added via extension
- Subclasses are substitutable for parents
- Interfaces are focused and minimal
- Dependencies are on abstractions, not concretions
Common Python SOLID Violations:
```python
from abc import ABC, abstractmethod

# ❌ SRP Violation — class does too much
class UserService:
    def create_user(self): ...
    def send_email(self): ...
    def generate_report(self): ...

# ✅ SRP Fixed — separate concerns
class UserRepository: ...
class EmailService: ...
class ReportGenerator: ...

# ❌ OCP Violation — must modify to add new types
def process(item):
    if item.type == "A": ...
    elif item.type == "B": ...
    # Must add an elif for each new type

# ✅ OCP Fixed — extend via new classes
class Processor(ABC):
    @abstractmethod
    def process(self): ...

# ❌ DIP Violation — depends on concrete class
class OrderService:
    def __init__(self):
        self.repo = SqliteRepository()  # Concrete!

# ✅ DIP Fixed — depends on abstraction
class OrderService:
    def __init__(self, repo: Repository):  # Abstract!
        self.repo = repo
```
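The block above skips LSP and ISP; minimal sketches of those two (class names are illustrative):

```python
from abc import ABC, abstractmethod

# ❌ LSP Violation — subclass breaks the parent's contract
class Storage:
    def delete(self, key: str) -> None:
        print(f"deleted {key}")

class ReadOnlyStorage(Storage):
    def delete(self, key: str) -> None:
        raise RuntimeError("read-only")  # Code written against Storage now blows up

# ✅ LSP/ISP Fixed — split the wide interface into role-specific ones,
# so "read-only" is a type, not a runtime surprise
class Reader(ABC):
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class Writer(Reader):
    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...
```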
Review Procedure
For Changed Code
- Read the diff thoroughly — understand every line changed
- Trace data flow — follow inputs through to outputs
- Question assumptions — what if assumptions are wrong?
- Check edges — what happens at boundaries?
- Verify error paths — what happens when things fail?
- Consider security — can this be exploited?
- Assess performance — will this scale?
For New Code
All of the above, plus:
- Is this the right abstraction?
- Is there existing code that does this?
- Does this integrate cleanly?
For Existing Code (Understanding)
Focus on:
- What does this code actually do?
- What are the invariants?
- What are the failure modes?
- Where are the risks?
Review Output Format
```
## Critical Review: [component/file]

### Summary
- Scope: [what was reviewed]
- Severity: [critical/high/medium/low findings]
- Verdict: [APPROVE / CHANGES NEEDED / BLOCK]

### Critical Issues (must fix)
1. **[Issue title]**
   - Location: file.ts#L10-20
   - Problem: [description]
   - Risk: [what could go wrong]
   - Suggestion: [how to fix]

### High Priority (should fix)
1. ...

### Medium Priority (consider fixing)
1. ...

### Low Priority (optional improvements)
1. ...

### Positive Observations
- [What's done well — reinforce good patterns]
```
Review Mindset
Be Thorough
- Don't skim — read every line
- Don't assume — verify
- Don't trust happy path — test failure cases mentally
Be Skeptical
- What if this input is malicious?
- What if this external service fails?
- What if this runs concurrently?
Be Constructive
- Explain WHY something is a problem
- Suggest alternatives, don't just criticize
- Acknowledge good work
Be Honest
- If you don't understand something, say so
- If you're uncertain about an issue, rate your confidence
- Don't rubber-stamp — that defeats the purpose
Integration with Workflow
This skill is independent of baseline comparison. It's pure code analysis.
- Use with baseline: After critical review, separately run validation/baseline comparison
- Use alone: Review code quality without running any tests
- Use during planning: Analyze existing code before changes
- Use for learning: Understand unfamiliar codebase
Output
Update .agent/focus.md:

```
## Just did
- Critical review of [component]
- Findings: X critical, Y high, Z medium
- Verdict: [APPROVE / CHANGES NEEDED / BLOCK]
```
Issue Discovery After Review
After review, invoke the agent-ops-tasks discovery procedure:

1. Collect all findings from the review output:
   - Critical issues → BUG or SEC (critical/high)
   - High priority → BUG, SEC, or PERF (high)
   - Medium priority → various types (medium)
   - Low priority → CHORE, REFAC, or DOCS (low)
2. Present to user:

```
📋 Review found {N} issues:

Critical:
- [SEC] SQL injection vulnerability in UserController#search
- [BUG] Race condition in cache invalidation

High:
- [PERF] N+1 query in OrderService#list
- [BUG] Unhandled null in payment callback

Medium:
- [CHORE] Error messages not user-friendly
- [TEST] Missing edge case coverage

Low:
- [REFAC] Extract duplicate validation logic
- [DOCS] Add JSDoc to public methods

Create issues for these? [A]ll / [S]elect / [N]one
```
3. After creating issues:

```
Created {N} issues. What's next?
1. Start fixing highest priority (SEC-0001@abc123 - SQL injection)
2. Create more issues
3. Continue with original task
4. Review issue list
```
4. If user declines:
   - Log a findings summary in focus.md
   - Note "Review findings not converted to issues (user declined)"