agent-design-language · repo-code-review

Review an entire repository or large repo slice for bugs, regressions, security risks, correctness issues, maintainability risks, misleading diagnostics, and missing tests across all severity levels. Use when the user asks for a code review of a whole repo, wants repo-wide risk assessment before merge or release, or needs findings-first review output across multiple files.

Install

Source · clone the upstream repo:
git clone https://github.com/danielbaustin/agent-design-language

Claude Code · install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/danielbaustin/agent-design-language "$T" && mkdir -p ~/.claude/skills && cp -r "$T/adl/tools/skills/repo-code-review" ~/.claude/skills/danielbaustin-agent-design-language-repo-code-review && rm -rf "$T"

Manifest: adl/tools/skills/repo-code-review/SKILL.md

Source content

Repo Code Review

Review repositories with a code-review mindset first, not an implementation mindset. Optimize for finding correctness, regression, security, operability, and maintainability issues across the full severity range before suggesting polish.

Bias the review toward the actual executable codebase first. In mixed repos, treat docs as supporting context unless the user explicitly asks for doc review or the docs define critical invariants that the code claims to implement.

Always include build, dependency, and package configuration in the review surface. Do not treat files such as Cargo.toml, package.json, pyproject.toml, go.mod, workspace manifests, toolchain files, Dockerfiles, CI workflow files, or lockfiles as incidental just because they are not executable source.

Quick Start

  1. Confirm the review target:
    • current repo
    • specific path
    • branch or diff if one is provided
  2. Build a deterministic inventory before reading deeply:
    • run scripts/repo_inventory.py against the repo root (see the sketch after this list)
    • identify likely product code, manifests, dependency/build config, tests, generated code, and vendor code
    • identify the largest code files and largest files overall
  3. Prioritize high-signal surfaces:
    • top-level manifests, workspace manifests, dependency declarations, and build/toolchain config before deep code reads
    • executable source files before docs
    • entrypoints and routing
    • auth, permissions, secrets, signing, or trust boundaries
    • persistence and migrations
    • concurrency, retries, cancellation, and state transitions
    • serialization, parsing, and external I/O
    • tests covering critical paths
  4. After the high-risk pass, perform a second sweep for lower-severity but real issues:
    • misleading error classification
    • unsafe or leaky artifact paths
    • resume/recovery inconsistencies
    • weak diagnostics or observability gaps
    • portability and privacy footguns
    • stale or contradictory tests
    • overlarge files, modules, or test files that create maintenance risk
  5. Run targeted tests when they are repo-local, reasonably bounded, and directly validate reviewed behavior.
  6. Emit findings first, ordered by severity, with file references and concrete reasoning.
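
A minimal sketch of the inventory step (step 2 above), assuming a Python 3 interpreter and that scripts/repo_inventory.py is resolved relative to this skill's directory; the output format is whatever the script prints:

  # Build the inventory once and keep a copy for prioritization
  python3 scripts/repo_inventory.py /path/to/repo | tee /tmp/repo-inventory.txt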

Review Standard

Default to a production code review standard:

  • look for behavioral bugs
  • look for security or trust-boundary mistakes
  • look for failure-mode gaps
  • look for missing validation or missing tests around risky code
  • look for regressions caused by partial refactors or drift between code and docs
  • look for lower-severity but still real issues such as misleading diagnostics, privacy leaks, portability hazards, and path-handling drift

Do not spend most of the review on style nits unless the user explicitly asks for style feedback.

Workflow

1. Scope the Review

Determine whether the user wants:

  • the whole repository
  • a subtree
  • a branch or PR-style review
  • a release-readiness scan

If no narrower scope is provided, review the repository root as a whole.

2. Build the Repo Inventory

Run scripts/repo_inventory.py <repo-root> to get a stable summary of:

  • dominant file types
  • likely application roots
  • likely code roots
  • test directories
  • docs-only areas
  • ignored/generated/vendor-heavy areas
  • largest files overall
  • largest code files by line count

Use the inventory to avoid spending too much time in:

  • node_modules
  • lockfiles
  • vendored dependencies
  • generated artifacts
  • build outputs

Use the inventory to focus first on:

  • top-level manifests and workspace/package config
  • largest_code_files
  • likely code roots
  • entrypoint-bearing modules
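
The inventory output drives prioritization, but a few bounded, read-only commands can cross-check it. A sketch, assuming a git checkout and common manifest names; the patterns are illustrative, not exhaustive:

  # Top-level manifests and workspace/build config, without descending into vendor trees
  find . -maxdepth 2 \( -name 'Cargo.toml' -o -name 'package.json' -o -name 'pyproject.toml' -o -name 'go.mod' -o -name 'Dockerfile' \) -not -path './node_modules/*'
  # Largest tracked source files by line count, to spot oversized modules early
  git ls-files '*.rs' '*.py' '*.ts' '*.go' | xargs wc -l | sort -rn | head -20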

3. Choose Reading Order

Prefer this order when the repo is large:

  1. top-level manifests and executable entrypoints
  2. dependency, build, toolchain, packaging, and CI configuration
  3. core runtime modules
  4. largest code files and most central modules
  5. stateful/storage/integration code
  6. security-sensitive code
  7. tests for the above
  8. lower-risk support code
  9. artifact, logging, export, and recovery surfaces that often hide P3-P5 issues
  10. docs only as support for intended invariants

If the repo has architecture docs or a security doc, skim those early to understand intended invariants.
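
Entrypoints and trust-boundary code can usually be located early with keyword sweeps; the patterns below are heuristics and will need adjusting per language and framework:

  # Likely entrypoints and routing
  rg -l 'fn main\(|def main\(|func main\(|app\.listen|createServer'
  # Security-sensitive surfaces worth reading before support code
  rg -l -i 'password|secret|token|signing|permission' --glob '!*test*'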

4. Evaluate by Risk

For each important area, ask:

  • What invariant is this code trying to preserve?
  • What input or state transition can violate that invariant?
  • What happens on malformed input, retries, cancellation, partial failure, or restart?
  • Is there a missing test for the risky path?
  • Does the implementation still match the documented behavior?

For manifest and config files, also ask:

  • Do dependency versions, feature flags, and workspace wiring match the code's assumptions?
  • Are build, release, CI, and toolchain settings likely to change runtime behavior or security posture?
  • Are there risky defaults, missing hardening flags, surprising optional features, or stale dependency declarations?
  • Does the lockfile or resolved dependency story contradict the intended dependency policy?
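
Two quick, illustrative checks for these questions, assuming Rust or Node projects with the matching toolchain installed:

  # Wildcard or loosely pinned dependency versions (heuristic patterns)
  rg -n 'version = "\*"|"\*"' Cargo.toml package.json
  # Compare the declared dependency policy with what actually resolves
  cargo tree --duplicates   # Rust: duplicate versions pulled into the graph
  npm ls --all              # Node: the full resolved dependency tree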

Then ask a second-pass question:

  • Even if this is not a release-blocking bug, does it create misleading behavior, poor diagnostics, portability risk, privacy leakage, confusing policy behavior, or future maintenance risk?

Then ask a maintainability question:

  • Is this file or module large enough that size alone is increasing review risk, coupling, or change hazard?

5. Produce Findings

Each finding should include:

  • priority level
  • affected file and line
  • what is wrong
  • why it matters in behavior terms
  • what scenario triggers it
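
A hypothetical finding shaped to these fields (the path and details are invented for illustration; use the output contract in references/output-contract.md for the structured form):

  • P1 · src/storage/resume.rs:142
    What: resume logic trusts the on-disk cursor before its checksum is verified
    Why: a truncated cursor file silently skips or replays work after a crash
    Trigger: process killed mid-write, then restarted in resume mode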

Prefer concrete findings such as:

  • incorrect conditionals
  • stale assumptions after refactor
  • state-machine holes
  • missing error handling
  • inconsistent validation
  • risky manifest or feature-flag configuration
  • dependency or build configuration drift
  • unsafe path or shell handling
  • race-prone logic
  • incorrect serialization or parsing
  • missing rollback or cleanup

Treat these as valid lower-severity findings when they are concrete:

  • incorrect or misleading error taxonomy
  • unnecessary retries on deterministic failure
  • host-path leakage in durable artifacts
  • unsafe or insufficient identifier/path normalization
  • recovery/resume behavior that trusts mutable state too early
  • documentation or test cases that encode contradictory behavior
  • overlarge source files or tests with enough breadth that they impair safe review and change isolation

Use the full priority range when warranted:

  • P0: critical security/data-loss/correctness breakage
  • P1: high-impact bug or policy failure
  • P2: meaningful correctness, safety, or operational issue
  • P3: moderate reliability, observability, or maintainability issue
  • P4: low-severity but real issue with concrete downside
  • P5: very low-severity but still actionable issue; use sparingly and only when non-speculative

Do not invent P4/P5 filler. Report them only when they are real, concrete, and still worth fixing.

File-size findings are valid when the size creates a real engineering downside, for example:

  • too large to review safely in one pass
  • mixes unrelated responsibilities
  • encourages hidden coupling
  • makes targeted testing difficult
  • repeatedly acts as a dumping ground for unrelated logic

6. Validate with Tests When Feasible

Prefer running a small, targeted validation step when all of the following are true:

  • the repo has a clear local test command
  • the command is scoped to reviewed behavior
  • runtime cost is reasonable
  • no extra approvals or external systems are required

Examples:

  • one Rust test module related to the reviewed subsystem
  • one package test target
  • one focused shell test script

If tests are too broad, too slow, flaky, unavailable, or dependent on external services, do not fake coverage. State that they were not run and why.
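
Illustrative targeted invocations, assuming the relevant toolchain is installed; crate, package, and test names are placeholders:

  cargo test -p storage-core resume::    # one Rust test module in one crate
  pytest tests/test_parser.py -q         # one focused Python test file
  npm test --workspace=api               # one package's test target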

7. State Residual Risk

If the review surfaces no significant findings, say so explicitly and mention residual review limits such as:

  • not all paths were executed
  • no runtime validation was performed
  • generated code was skipped
  • integration behavior remains unverified

Output Format

Default output order:

  1. findings
  2. open questions or assumptions
  3. short summary of review coverage
  4. validation performed

Use the output contract in references/output-contract.md when ADL expects a structured artifact.

Artifact Path

When the review should be written to disk, store it at:

  • .adl/reviews/<timestamp>-repo-review.md

Use a filesystem-safe timestamp such as YYYYMMDD-HHMMSS.

If ADL provides a more specific output target, follow that target instead.
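
A minimal sketch of constructing the default artifact path:

  mkdir -p .adl/reviews
  REVIEW_PATH=".adl/reviews/$(date +%Y%m%d-%H%M%S)-repo-review.md"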

Boundaries

This skill may:

  • inspect repository files
  • run bounded read-only discovery commands
  • run the bundled inventory script
  • compare code, tests, and documentation
  • compare manifests, lockfiles, CI, and build configuration against code assumptions
  • run targeted repo-local tests when they are relevant and bounded
  • produce findings-first review output

This skill must not:

  • silently rewrite code during review
  • broaden into implementation unless the user asks
  • claim execution coverage that was not actually performed
  • treat generated or vendored code as first-class review targets unless the user asks
  • spend most of the review budget on docs when executable code is present

ADL Compatibility

This skill is Codex-compatible through name and description in frontmatter.

For stricter ADL execution, also use:

  • adl-skill.yaml for machine-readable admission and output policy
  • references/review-playbook.md for the expanded review procedure
  • references/output-contract.md for structured result shape

Resources

  • Inventory helper: scripts/repo_inventory.py
  • Detailed procedure: references/review-playbook.md
  • Structured output contract: references/output-contract.md