Agentops review
Review incoming PRs, agent-generated changes, or diffs. Structured review with security, correctness, performance, and maintainability checks. Triggers: "review", "review PR", "review changes", "code review", "review this PR", "review agent output", "check this diff".
git clone https://github.com/boshu2/agentops
T=$(mktemp -d) && git clone --depth=1 https://github.com/boshu2/agentops "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills-codex/review" ~/.claude/skills/boshu2-agentops-review && rm -rf "$T"
skills-codex/review/SKILL.mdReview Skill
Quick Ref:
reviews a PR,$review <PR>reviews local changes,$review --diffreviews agent output with extra scrutiny.$review --agent <path>
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
This skill is for reviewing OTHER people's or agents' changes. For validating your own code quality, use
$vibe instead.
Modes
$review 42 # PR mode — review PR #42 $review https://github.com/o/r/pull/42 # PR mode — review by URL $review --diff # Diff mode — review unstaged/staged changes $review --diff --staged # Diff mode — staged only $review --agent .agents/crank/ # Agent mode — review agent-generated output $review --agent ./output.patch # Agent mode — review a patch file $review --deep 42 # Deep mode — spawns council for second opinion
Execution Steps
Step 0: Detect Review Target and Load Standards
Determine the review mode from arguments:
- PR mode (default): argument is a number or GitHub PR URL.
- Diff mode:
flag present.--diff - Agent mode:
flag present.--agent <path>
Load language-specific conventions from
$standards based on file extensions in the diff. If ao is available, pull prior review context:
ao lookup --query "code review patterns $(basename "$PWD")" --limit 3 2>/dev/null || true
Apply retrieved knowledge (mandatory when results returned):
If learnings are returned, do NOT just load them as passive context. For each returned item:
- Check: does this learning apply to the code under review? (answer yes/no)
- If yes: include it as a
— state the pattern, what to look for, and whether the diff exhibits itknown_risk - Cite the learning by filename in your review output when it influences a finding
After applying, record the citation:
ao metrics cite "<learning-path>" --type applied 2>/dev/null || true
Skip silently if ao is unavailable or returns no results.
Step 0.5: Apply Behavioral Discipline
Load the behavioral discipline standard from
$standards before reviewing the diff. Use it to answer four questions:
- What assumptions does this change make, and were they surfaced or silently chosen?
- Could the same outcome be achieved with a smaller or more local change?
- Does every changed line trace back to the stated goal?
- Does the verification prove the claimed behavior, or only that the code builds?
If any answer is weak, record the problem as a finding. Hidden assumptions, speculative abstractions, drive-by edits, and weak verification are review defects, not style preferences.
Step 1: Fetch the Diff
PR Mode
gh pr view "$PR_REF" --json title,body,author,baseRefName,headRefName,labels,reviewDecision,commits gh pr diff "$PR_REF" gh pr diff "$PR_REF" --name-only
If the PR has more than 500 changed lines, prioritize: security-sensitive files, high-complexity changes, new files, then test files.
Diff Mode
git diff HEAD # unstaged + staged git diff --cached # staged only (with --staged flag) git diff HEAD --name-only # changed file list
Agent Mode
# Directory: find all generated files find "$AGENT_PATH" -type f \( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.sh' -o -name '*.md' \) # Patch file: inspect stats git apply --stat "$AGENT_PATH"
Step 2: Context Gathering
Understand the intent behind the changes before reviewing the code:
- PR Mode: Read PR title/body, check linked issues (
,fixes #
), read commit messages.closes # - Diff Mode: Check
, branch name, open issues viagit log --oneline -5
.bd list --status open - Agent Mode: Read execution logs in output directory, check
artifacts..agents/rpi/
Output a one-line intent summary before proceeding:
INTENT: <what the change is trying to accomplish>
If intent is unclear, flag it: "PR description does not explain the purpose of this change."
Step 3: Systematic Review Pass (SCORED)
Review every changed file against the SCORED checklist. For each category, actively look for problems. Do not skim -- read each changed line.
S -- Security
- No hardcoded secrets, API keys, tokens, or passwords
- Input validation on all external data (user input, API responses, file reads)
- SQL/command injection: parameterized queries, no string interpolation in commands
- Auth/authz checks present where needed (not just authn)
- Sensitive data not logged or exposed in error messages
- Dependencies: no known-vulnerable versions added
- File operations: path traversal prevention, safe temp file handling
C -- Correctness
- Logic errors: off-by-one, wrong operator, inverted condition
- Edge cases: nil/null handling, empty collections, boundary values
- Error handling: errors checked, not swallowed, wrapped with context
- Race conditions: shared mutable state, concurrent access patterns
- Resource leaks: unclosed files, connections, goroutines, channels
- Type safety: unchecked casts, implicit conversions, overflow potential
- Contract compliance: does the change match the stated intent?
O -- Observability
- Errors include enough context for debugging (what failed, with what input)
- New features have appropriate logging at correct levels
- Metrics or health indicators added for new failure modes
- Error messages are actionable (not just "something went wrong")
R -- Readability
- Names are descriptive and consistent with codebase conventions
- Functions are focused (single responsibility, not doing too much)
- Complex logic has comments explaining WHY (not WHAT)
- No dead code, commented-out code, or leftover debug statements
- Consistent formatting with the rest of the codebase
E -- Efficiency
- No unnecessary allocations in hot paths
- N+1 query patterns (database calls in loops)
- Unbounded growth: maps/slices that grow without limits
- Appropriate use of caching, batching, or pagination
- No blocking operations in async/concurrent contexts
D -- Design
- Abstraction level is appropriate (not over-engineered, not under-abstracted)
- API surface is minimal and consistent with existing patterns
- Changes are cohesive (single concern per PR, not mixing refactoring with features)
- Ambiguity was surfaced instead of silently assumed away
- No speculative flexibility or abstractions beyond the stated need
- Every changed line traces to the requested outcome or required cleanup
- Dependencies flow in the right direction (no circular imports)
- Test coverage: new code has tests, tests verify behavior (not just coverage)
- Breaking changes are documented and intentional
Step 4: Agent-Specific Checks (--agent mode only)
When reviewing agent-generated code, apply additional scrutiny for common agent failure modes:
Hallucinated References
- All imports exist (no invented packages or modules)
- All called functions exist in the codebase or dependencies
- Referenced files and paths actually exist
- API endpoints and URLs are real
Over-Engineering
- No unnecessary abstractions (interfaces with one implementation, factory for one type)
- No premature generalization (generic solution where specific was asked)
- No gold-plating (features not requested)
- Reasonable LOC for the task complexity
Missing Fundamentals
- Error handling is present (agents frequently skip error paths)
- Edge cases are handled (agents often only handle the happy path)
- Cleanup/teardown logic exists (defer, finally, context cancellation)
- Concurrency safety if applicable
Test Quality
- Tests actually assert meaningful behavior (not just
or!= nil
)!= "" - Test names describe the scenario being tested
- Tests cover error paths, not just happy paths
- No
naming pattern (coverage-padding anti-pattern)cov*_test.go - Mocks are realistic (not returning hardcoded success for everything)
Codebase Consistency
- Follows existing naming conventions (check 3+ similar files for patterns)
- Uses existing helpers/utilities instead of reimplementing
- Error handling style matches the codebase
- File organization follows project structure
Step 5: Generate Structured Review Output
Create a review artifact:
REVIEW_DIR=".agents/review" mkdir -p "$REVIEW_DIR" REVIEW_FILE="$REVIEW_DIR/$(date +%Y-%m-%d)-review-$(echo "$PR_REF" | tr '/' '-').md"
Review Document Structure
# Review: <PR title or change description> **Date:** YYYY-MM-DD | **Verdict:** APPROVE | REQUEST_CHANGES | COMMENT **Target:** PR #N / local diff / agent output at <path> ## Intent <one-line summary> ## SCORED Assessment | Category | Rating | Notes | |----------|--------|-------| | Security | pass/warn/fail | ... | | Correctness | pass/warn/fail | ... | | Observability | pass/warn/fail | ... | | Readability | pass/warn/fail | ... | | Efficiency | pass/warn/fail | ... | | Design | pass/warn/fail | ... | ## Findings ### Critical (must fix) - **[file:line]** Issue. Suggested fix: ... ### Warning (should fix) - **[file:line]** Issue. Suggested fix: ... ### Suggestion / Nit - **[file:line]** Description. ## Missing <expected but absent: tests, docs, error handling, migration>
Verdict Rules
- APPROVE: No critical or warning findings. All SCORED categories pass.
- REQUEST_CHANGES: Any critical finding, OR 3+ warnings, OR any SCORED category rated "fail".
- COMMENT: 1-2 warnings with no critical findings. Worth discussing but not blocking.
PR Mode: Post Comments
If reviewing a PR and the verdict is REQUEST_CHANGES or COMMENT, offer to post the review:
# Post review comment on the PR gh pr review "$PR_REF" --comment --body "$(cat "$REVIEW_FILE")" # Or for blocking review gh pr review "$PR_REF" --request-changes --body "$(cat "$REVIEW_FILE")"
Only post if the user confirms. Never auto-post a review without explicit approval.
Deep Mode (--deep)
When
--deep is specified, after the initial SCORED pass, spawn a council for a second opinion:
$council validate "Review these changes for issues I might have missed: <summary of changes>"
Merge council findings into the review document under a "## Council Findings" section.
Integration with Other Skills
| Skill | Relationship |
|---|---|
| Self-review (your own code). is for others' code. |
| Optional second opinion via flag. |
| Auto-loaded for language-specific rules. |
| does a structured pass; does deep investigation of suspected bugs. |
| PR-specific validation (isolation, scope creep). Complementary to . |
See Also
- vibe — Self-review and code quality validation
- council — Multi-model consensus council
- standards — Language-specific coding conventions
- bug-hunt — Deep bug investigation
- pr-validate — PR scope and isolation checks