claude-skill-registry / agent-ops-critical-review
Deep, excruciating code review. Use anytime to analyze code for correctness, edge cases, security, performance, and design issues. Not tied to baseline—this is pure code analysis.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/agent-ops-critical-review" ~/.claude/skills/majiayu000-claude-skill-registry-agent-ops-critical-review && rm -rf "$T"
skills/data/agent-ops-critical-review/SKILL.md
Critical Code Review
Purpose
Perform a deep, thorough, excruciating review of code. This skill is about analyzing code quality, correctness, and design—NOT about running tests or comparing to baseline. Use it anytime you need to scrutinize code.
API Detection
Before starting review, check if project contains APIs:
```yaml
api_indicators:
  - OpenAPI/Swagger spec (openapi.yaml, swagger.json, openapi.json)
  - API framework patterns (FastAPI, Flask, Express, ASP.NET controllers)
  - Route decorators (@app.route, @router.get, [HttpGet], etc.)
  - API test files (test_api_*, *_api_test.*, api.spec.*)
```
If API detected:
- Note in review scope: "API endpoints detected"
- After the general review, invoke agent-ops-api-review for an API-specific audit
- Merge API review findings into the final report
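A minimal sketch of that detection step as a standalone Python helper, run from the project root. The spec file names and decorator strings mirror the indicators listed above; treat them as illustrative, not exhaustive:

```python
from pathlib import Path

# Indicators from the list above: spec files and route-decorator patterns.
SPEC_FILES = {"openapi.yaml", "openapi.json", "swagger.json"}
ROUTE_PATTERNS = ("@app.route", "@router.get", "@router.post", "[HttpGet]")
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".cs"}

def detect_api(root: str = ".") -> bool:
    """Return True if the project shows any of the API indicators."""
    for path in Path(root).rglob("*"):
        if path.name in SPEC_FILES:
            return True
        if path.suffix in SOURCE_SUFFIXES and path.is_file():
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            if any(pattern in text for pattern in ROUTE_PATTERNS):
                return True
    return False

if __name__ == "__main__":
    print("API endpoints detected" if detect_api() else "no API indicators found")
```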
When to Use
- During planning: Review existing code before modifying
- During implementation: Review your own changes
- Before completion: Final quality check
- On demand: User requests a code review
- After recovery: Verify recovery didn't introduce issues
Review Dimensions
1) Correctness
Logic Analysis:
- Does the code do what it claims to do?
- Are all code paths reachable and correct?
- Are conditionals exhaustive (all cases handled)?
- Are loop invariants maintained?
- Are recursive functions guaranteed to terminate?
Boundary Conditions:
- Empty inputs (null, undefined, empty string, empty array)
- Single element cases
- Maximum values / overflow potential
- Off-by-one errors
- Unicode / special characters
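A concrete illustration with a hypothetical `last_window` helper: the one-line version harbors a boundary bug that only appears at `size == 0`:

```python
def last_window(items: list, size: int) -> list:
    # ❌ Boundary bug: size == 0 gives items[-0:], which is items[0:],
    # i.e. the WHOLE list instead of an empty window. Negative sizes
    # also silently return a wrong slice instead of failing.
    return items[-size:]

def last_window_checked(items: list, size: int) -> list:
    # ✅ Boundaries handled explicitly: negative, zero, empty input.
    if size < 0:
        raise ValueError(f"size must be non-negative, got {size}")
    if size == 0 or not items:
        return []
    return items[-size:]
```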
State Management:
- Is state modified predictably?
- Are there race conditions?
- Is cleanup always performed (finally blocks, defer, etc.)?
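For instance, a lock released manually leaks on the error path, while a `with` block (or try/finally) does not. A minimal sketch with a hypothetical cache:

```python
import threading

_lock = threading.Lock()
_cache: dict = {}

def _validate(value: str) -> None:
    if not value:
        raise ValueError("empty value")

def update_cache_unsafe(key: str, value: str) -> None:
    # ❌ If _validate raises, release() never runs and every later
    # caller deadlocks waiting on the lock.
    _lock.acquire()
    _validate(value)
    _cache[key] = value
    _lock.release()

def update_cache(key: str, value: str) -> None:
    # ✅ The context manager releases the lock on every exit path,
    # including exceptions.
    with _lock:
        _validate(value)
        _cache[key] = value
```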
2) Error Handling
Questions to Ask:
- What can fail here?
- Is every failure case handled?
- Are errors propagated appropriately?
- Are error messages informative?
- Are there silent failures hiding bugs?
Checklist:
- All external calls wrapped in try/catch or equivalent
- Errors logged with context
- User-facing errors are actionable
- No swallowed exceptions
- Cleanup happens even on error
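As a sketch of the difference between swallowing and handling (the config loader is hypothetical):

```python
import json
import logging

log = logging.getLogger(__name__)

def load_config_silent(path: str) -> dict:
    # ❌ Swallowed exception: missing file, bad permissions, and corrupt
    # JSON all collapse into {} with no trace. Bugs hide here.
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}

def load_config(path: str) -> dict:
    # ✅ Each failure is logged with context; the error stays visible
    # and the user-facing message says what to do.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        log.error("config file not found: %s", path)
        raise
    except json.JSONDecodeError as exc:
        log.error("config %s is not valid JSON: %s", path, exc)
        raise ValueError(f"fix the JSON syntax in {path}") from exc
```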
3) Security
Input Validation:
- Are all external inputs validated?
- Is there SQL injection potential?
- Is there XSS potential?
- Is there command injection potential?
- Are paths sanitized (no traversal)?
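The SQL injection question in code form, using the stdlib sqlite3 driver (table and columns are hypothetical):

```python
import sqlite3

def search_users_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # ❌ String interpolation: name = "x' OR '1'='1" returns every row.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()

def search_users(conn: sqlite3.Connection, name: str) -> list:
    # ✅ Parameterized query: the driver treats `name` strictly as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

The same shape applies to command injection (prefer argument lists over shell strings) and path traversal (resolve the path and verify it stays under the intended root).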
Authentication/Authorization:
- Is auth checked before sensitive operations?
- Are tokens/secrets handled securely?
- Is sensitive data logged? (it shouldn't be)
Data Handling:
- Is PII protected?
- Are passwords hashed, not encrypted?
- Is data sanitized before display?
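On the password point, a stdlib-only sketch. PBKDF2 is the weakest acceptable choice here; prefer bcrypt or argon2 when a third-party dependency is allowed:

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # tune to your hardware; slower is also slower for attackers

def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per password defeats precomputed rainbow tables.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest is constant-time, so the comparison leaks no timing info.
    return hmac.compare_digest(candidate, expected)
```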
4) Performance
Algorithmic:
- What's the time complexity?
- What's the space complexity?
- Are there N+1 query problems?
- Are there unnecessary iterations?
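The N+1 pattern is easiest to spot side by side; a sketch with hypothetical users/orders tables:

```python
import sqlite3

def order_totals_n_plus_one(conn: sqlite3.Connection, user_ids: list) -> dict:
    # ❌ N+1: one round trip per user. 1,000 users means 1,000 queries.
    return {
        uid: conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
            (uid,),
        ).fetchone()[0]
        for uid in user_ids
    }

def order_totals(conn: sqlite3.Connection, user_ids: list) -> dict:
    # ✅ One query, aggregated server-side.
    if not user_ids:
        return {}
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, SUM(amount) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids,
    ).fetchall()
    return dict(rows)
```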
Resource Management:
- Are connections/handles closed?
- Are large objects released?
- Is there potential for memory leaks?
- Are there blocking calls that should be async?
Caching/Optimization:
- Is expensive computation cached when appropriate?
- Are there premature optimizations obscuring clarity?
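When caching is appropriate, the stdlib often suffices; a sketch with a stand-in computation:

```python
from functools import lru_cache

def _compute_rate(origin: str, destination: str) -> float:
    # Stand-in for an expensive, deterministic lookup.
    return float(len(origin) + len(destination))

@lru_cache(maxsize=1024)
def shipping_rate(origin: str, destination: str) -> float:
    # Safe to memoize: a pure function of hashable arguments. Caching would
    # be WRONG if the underlying rate table could change mid-process.
    return _compute_rate(origin, destination)
```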
5) Design & Maintainability
Single Responsibility:
- Does each function do one thing?
- Are there functions that are too long?
- Is there duplicated logic that should be extracted?
Naming & Clarity:
- Do names accurately describe purpose?
- Are abbreviations clear or confusing?
- Would a new team member understand this?
Dependencies:
- Are dependencies appropriate?
- Is there tight coupling that could be loosened?
- Are there circular dependencies?
Testability:
- Can this code be unit tested?
- Are dependencies injectable?
- Is there global state that complicates testing?
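A small example of the injection question: hard-coding the clock makes expiry logic untestable, while injecting it does not (the class and names are illustrative):

```python
from datetime import datetime, timezone
from typing import Callable

class TokenValidator:
    # ✅ The clock is injectable: production uses the default,
    # tests pass a lambda returning a fixed datetime.
    def __init__(self, clock: Callable[[], datetime] | None = None):
        self._clock = clock or (lambda: datetime.now(timezone.utc))

    def is_expired(self, expires_at: datetime) -> bool:
        return self._clock() >= expires_at

# In a test:
#   fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
#   validator = TokenValidator(clock=lambda: fixed)
```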
6) Consistency
- Does it follow project conventions?
- Does it match surrounding code style?
- Are patterns used consistently?
7) SOLID Principles (for OO/class-based code)
When to apply: Review SOLID for class-based code, services, and complex modules. Skip for simple scripts, utilities, or purely functional code.
Single Responsibility (SRP):
- Does each class have one clear purpose?
- Would changes to one feature require modifying this class for unrelated reasons?
- Could this class be split into smaller, focused classes?
Open/Closed (OCP):
- Can new features be added by extension rather than modification?
- Are there long if/elif/switch chains that should use polymorphism?
- Is the Strategy, Factory, or Template pattern appropriate?
Liskov Substitution (LSP):
- Do subclasses honor the contracts of their parent classes?
- Are there overridden methods that change expected behavior?
- Does substituting a subclass for its parent break anything?
Interface Segregation (ISP):
- Are interfaces/ABCs focused and cohesive?
- Do implementations actually need every method they are forced to implement?
- Should large interfaces be split into smaller, role-specific ones?
Dependency Inversion (DIP):
- Do high-level modules depend on abstractions, not concretions?
- Are dependencies injected rather than created internally?
- Can this code be tested without real implementations of dependencies?
SOLID Checklist (for class-based code):
- Classes have single, clear responsibilities
- New behavior can be added via extension
- Subclasses are substitutable for parents
- Interfaces are focused and minimal
- Dependencies are on abstractions, not concretions
Common Python SOLID Violations:
```python
from abc import ABC, abstractmethod

# ❌ SRP Violation — class does too much
class UserService:
    def create_user(self): ...
    def send_email(self): ...
    def generate_report(self): ...

# ✅ SRP Fixed — separate concerns
class UserRepository: ...
class EmailService: ...
class ReportGenerator: ...

# ❌ OCP Violation — must modify to add new types
def process(item):
    if item.type == "A": ...
    elif item.type == "B": ...
    # Must add an elif for each new type

# ✅ OCP Fixed — extend via new classes
class Processor(ABC):
    @abstractmethod
    def process(self): ...

# ❌ DIP Violation — depends on concrete class
class OrderService:
    def __init__(self):
        self.repo = SqliteRepository()  # Concrete!

# ✅ DIP Fixed — depends on abstraction
class OrderService:
    def __init__(self, repo: Repository):  # Abstract!
        self.repo = repo
```
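The block above skips LSP and ISP; minimal sketches of those two (class names are illustrative):

```python
from abc import ABC, abstractmethod

# ❌ LSP Violation — subclass breaks the parent's contract
class Storage:
    def delete(self, key: str) -> None:
        print(f"deleted {key}")

class ReadOnlyStorage(Storage):
    def delete(self, key: str) -> None:
        raise RuntimeError("read-only")  # Code written against Storage now blows up

# ✅ LSP/ISP Fixed — split the wide interface into role-specific ones,
# so "read-only" is a type, not a runtime surprise
class Reader(ABC):
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class Writer(Reader):
    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...
```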
Review Procedure
For Changed Code
- Read the diff thoroughly — understand every line changed
- Trace data flow — follow inputs through to outputs
- Question assumptions — what if assumptions are wrong?
- Check edges — what happens at boundaries?
- Verify error paths — what happens when things fail?
- Consider security — can this be exploited?
- Assess performance — will this scale?
For New Code
All of the above, plus:
- Is this the right abstraction?
- Is there existing code that does this?
- Does this integrate cleanly?
For Existing Code (Understanding)
Focus on:
- What does this code actually do?
- What are the invariants?
- What are the failure modes?
- Where are the risks?
Review Output Format
```
## Critical Review: [component/file]

### Summary
- Scope: [what was reviewed]
- Severity: [critical/high/medium/low findings]
- Verdict: [APPROVE / CHANGES NEEDED / BLOCK]

### Critical Issues (must fix)
1. **[Issue title]**
   - Location: file.ts#L10-20
   - Problem: [description]
   - Risk: [what could go wrong]
   - Suggestion: [how to fix]

### High Priority (should fix)
1. ...

### Medium Priority (consider fixing)
1. ...

### Low Priority (optional improvements)
1. ...

### Positive Observations
- [What's done well — reinforce good patterns]
```
Review Mindset
Be Thorough
- Don't skim — read every line
- Don't assume — verify
- Don't trust happy path — test failure cases mentally
Be Skeptical
- What if this input is malicious?
- What if this external service fails?
- What if this runs concurrently?
Be Constructive
- Explain WHY something is a problem
- Suggest alternatives, don't just criticize
- Acknowledge good work
Be Honest
- If you don't understand something, say so
- If you're uncertain about an issue, rate your confidence
- Don't rubber-stamp — that defeats the purpose
Integration with Workflow
This skill is independent of baseline comparison. It's pure code analysis.
- Use with baseline: After critical review, separately run validation/baseline comparison
- Use alone: Review code quality without running any tests
- Use during planning: Analyze existing code before changes
- Use for learning: Understand unfamiliar codebase
Output
Update .agent/focus.md:

```
## Just did
- Critical review of [component]
- Findings: X critical, Y high, Z medium
- Verdict: [APPROVE / CHANGES NEEDED / BLOCK]
```
Issue Discovery After Review
After review, invoke the agent-ops-tasks discovery procedure:

1. Collect all findings from the review output:
   - Critical issues → BUG or SEC (critical/high)
   - High priority → BUG, SEC, or PERF (high)
   - Medium priority → various types (medium)
   - Low priority → CHORE, REFAC, or DOCS (low)
2. Present to user:

```
📋 Review found {N} issues:

Critical:
- [SEC] SQL injection vulnerability in UserController#search
- [BUG] Race condition in cache invalidation

High:
- [PERF] N+1 query in OrderService#list
- [BUG] Unhandled null in payment callback

Medium:
- [CHORE] Error messages not user-friendly
- [TEST] Missing edge case coverage

Low:
- [REFAC] Extract duplicate validation logic
- [DOCS] Add JSDoc to public methods

Create issues for these? [A]ll / [S]elect / [N]one
```
3. After creating issues:

```
Created {N} issues. What's next?
1. Start fixing highest priority (SEC-0001@abc123 - SQL injection)
2. Create more issues
3. Continue with original task
4. Review issue list
```
4. If user declines:
   - Log a findings summary in focus.md
   - Note "Review findings not converted to issues (user declined)"