Awesome-omni-skill review-model-guidance
Guidance for selecting models when performing code review with subtasks. Load this skill to enable intelligent model selection for review analysis — choosing faster models for simple tasks and deeper reasoning models for complex analysis.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/design/review-model-guidance" ~/.claude/skills/diegosouzapw-awesome-omni-skill-review-model-guidance && rm -rf "$T"
skills/design/review-model-guidance/SKILL.md
Review Model Guidance
When performing code review analysis, you can switch models at any time to match the demands of the task. This applies both to your own direct work and to subtasks you delegate via the
task tool. Use /model to switch your own model, or pass
the model parameter when delegating subtasks.
Before switching models, check which models are actually available by running
${PI_CMD:-pi} --list-models via bash. The output lists every model that has a valid
API key configured, in provider and model columns. Specify models as
provider/model, and only switch to models that appear in that list.
Forbidden Models
The following models are never permitted due to extreme cost:
- openai/o3-pro: Prohibitively expensive. Use openai/o3 instead, which provides strong reasoning at a fraction of the cost.
Do not select these models for your own work, subtasks, or any model parameter. If a forbidden model is the only reasoning model available, fall back to a balanced model with extended thinking enabled instead.
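The availability check and the forbidden-model rule can be sketched as a small selection helper. This is only an illustration, not part of the pi tool: the function name `pick_model`, the preference lists, and the `available` set (assumed to have been built from the `--list-models` output as provider/model strings) are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Forbidden models, taken from the list above.
FORBIDDEN = {"openai/o3-pro"}


def pick_model(preferred: Iterable[str], available: set[str],
               fallback: Iterable[str] = ()) -> Optional[str]:
    """Return the first permitted, available model from `preferred`, then `fallback`."""
    for candidate in list(preferred) + list(fallback):
        if candidate in FORBIDDEN:
            continue  # never select a forbidden model, even if it is available
        if candidate in available:
            return candidate
    return None  # nothing usable: the caller keeps the current session model


# Hypothetical availability list in which o3-pro is the only reasoning model.
available = {"openai/o3-pro", "anthropic/claude-sonnet-4-5"}
choice = pick_model(["openai/o3-pro", "openai/o3"], available,
                    fallback=["anthropic/claude-sonnet-4-5"])
print(choice)
```

Here the forbidden model is skipped even though it is available, and the helper falls back to a balanced model. Extended thinking would then have to be enabled separately.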
When to Switch Models
Not every task benefits from model switching. Use these heuristics:
- Same model is fine when you're doing a single focused task, or when the diff is small enough that model choice won't materially affect quality.
- Switch models when you're delegating subtasks with clearly different complexity levels, reviewing different aspects of the changes, reviewing from different perspectives, or seeking a second opinion from a different model family on critical findings.
Model Selection by Task Type
Balanced Models
Use mid-tier models for general code review work:
- File-context analysis: Understanding how changes fit within a file's existing patterns, checking for consistency with surrounding code
- API contract review: Verifying that function signatures, types, and interfaces are used correctly
- Test coverage assessment: Evaluating whether test changes match code changes
- Most general code review: The default choice when you don't have a strong reason to go deeper
- Small to medium diffs: The review itself is straightforward
Good balanced choices:
anthropic/claude-sonnet-4-5, google/gemini-2.5-pro, openai/gpt-5
Deep Reasoning Models
For complex analysis, enable extended thinking or switch to a reasoning model. Here are the higher-end models and their strengths for code review:
Anthropic
- anthropic/claude-opus-4-6: Anthropic's most capable model. Strongest at nuanced architectural reasoning, understanding implicit design patterns, and catching subtle issues that require deep understanding of intent. Excellent at security review and explaining why something is problematic, not just that it is.
- anthropic/claude-opus-4-5: Previous-generation flagship. Still very strong for complex review tasks. Good alternative when opus-4-6 is unavailable.
- anthropic/claude-sonnet-4-5 with extended thinking: Enable thinking for complex analysis without switching models. Good balance of capability and responsiveness.
OpenAI (note: o3-pro is forbidden — see Forbidden Models above)
- openai/o3: OpenAI's best reasoning model for code review. Good for state management analysis, algorithmic correctness, and methodical bug-hunting through code paths.
- openai/gpt-5-pro / openai/gpt-5.2-pro: OpenAI's flagship non-o-series models with reasoning. Good general-purpose deep analysis.
- openai/o4-mini: Reasoning model suitable for targeted deep analysis of specific files or functions.
Google
- google/gemini-3-pro-preview: Google's latest and most capable model (1M context). Strong at cross-file analysis and understanding large codebases holistically.
- google/gemini-2.5-pro with thinking: Excellent at large-context analysis — can reason over many files simultaneously with its 1M token context window. Good for architectural consistency checks and understanding how changes ripple across a large codebase.
- google/gemini-2.5-flash with thinking: When you need reasoning over large context but want faster response times.
xAI
- xai/grok-4: xAI's strongest reasoning model. Good for getting a different perspective from a different model family on critical findings.
- xai/grok-4-fast / xai/grok-4-1-fast: Reasoning models with massive 2M context windows. Useful when you need to reason over an extremely large amount of code.
- xai/grok-code-fast-1: Code-specialized reasoning model (256k context). Consider for code-focused analysis where code understanding is more important than general reasoning breadth.
Bug Finding and Flaw Detection
When the goal is specifically to track down bugs or logical flaws in changes, these models excel:
- openai/o3: The o-series models are particularly strong at systematic bug-hunting. They methodically trace execution paths, track state through branches, and identify edge cases. Best choice when you suspect there's a bug and need to find it.
- anthropic/claude-opus-4-6: Excels at understanding developer intent and spotting where the implementation diverges from what was likely intended. Good at catching bugs that arise from misunderstanding an API or protocol.
- google/gemini-2.5-pro with thinking: Strong at finding bugs that manifest across file boundaries — where a change in one file breaks an assumption in another. The large context window helps hold the full picture.
- xai/grok-code-fast-1: Code-specialized model that can be effective for language-specific bug patterns.
Code Generation Models
When a review suggestion includes a concrete code fix or refactor, switching to a code-specialized model can produce better, more idiomatic suggestions:
- openai/gpt-5.1-codex / openai/gpt-5.2-codex: OpenAI's code-specialized models. Best choice when generating substantive code suggestions — refactors, rewrites, or proposed fixes. These models produce cleaner, more idiomatic code than general-purpose models.
- openai/codex-mini-latest: A lighter code generation model. Good for smaller, targeted code suggestions where speed matters more than handling complex multi-file refactors.
- xai/grok-code-fast-1: Fast code generation with strong code understanding (256k context). Useful when you need quick, code-focused suggestions and want to avoid the latency of larger models.
Use code generation models when your review finding warrants a concrete code example — a suggested fix, a refactored alternative, or an idiomatic replacement. For findings that are purely analytical (architectural concerns, design feedback), stick with reasoning or balanced models instead.
Use deep reasoning models for:
- Architectural analysis: How changes affect the broader codebase structure, dependency patterns, separation of concerns
- Security review: Authentication, authorization, injection vulnerabilities, cryptographic usage, secrets handling
- Concurrency and state: Race conditions, deadlocks, shared mutable state, transaction boundaries
- Complex algorithms: Mathematical correctness, edge cases in complex logic, performance characteristics
- Systems code: Rust, C, C++ — memory management, lifetime issues, unsafe blocks
Language and Framework Considerations
Some languages and domains benefit from specific models:
- Rust / C / C++: Memory safety, lifetimes, undefined behavior. Use anthropic/claude-opus-4-6 or openai/o3 for their strong reasoning about resource management. xai/grok-code-fast-1 is also worth considering for Rust-specific patterns.
- TypeScript / JavaScript / React: Most models handle well. anthropic/claude-sonnet-4-5 or google/gemini-2.5-pro are strong defaults. For complex state management (Redux, hooks, async flows), use reasoning models.
- Python: Most models handle well. For ML/data pipeline code, google/gemini-2.5-pro with thinking is strong given its deep Python training data.
- SQL / database migrations: Schema changes and data integrity. openai/o3 is strong at reasoning about relational constraints and migration ordering.
- Infrastructure / IaC: Terraform, CloudFormation, Kubernetes. Security implications benefit from anthropic/claude-opus-4-6 or openai/o3 for their security reasoning.
- Shell scripts: Security-sensitive (injection, permissions). Use at least anthropic/claude-sonnet-4-5 with thinking enabled.
- Ruby / Rails: anthropic/claude-opus-4-6 and openai/gpt-5 have strong Ruby understanding. For Rails-specific patterns (N+1 queries, callback chains, ActiveRecord pitfalls), reasoning models help trace the implicit execution flow.
- Go: Strong support across most models. For concurrency review (goroutines, channels, sync primitives), prefer openai/o3 for its systematic path tracing.
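The language suggestions above can be kept in a simple lookup table. The sketch below is a rough illustration: the model names are transcribed from the list, but the file-extension mapping, the domain keys, and the function name `suggest_models` are assumptions made for the example.

```python
from __future__ import annotations

from pathlib import PurePath

# Model suggestions transcribed from the list above.
SUGGESTED = {
    "systems": ["anthropic/claude-opus-4-6", "openai/o3"],
    "web": ["anthropic/claude-sonnet-4-5", "google/gemini-2.5-pro"],
    "python": ["google/gemini-2.5-pro"],       # with thinking, for ML/data pipelines
    "sql": ["openai/o3"],
    "iac": ["anthropic/claude-opus-4-6", "openai/o3"],
    "shell": ["anthropic/claude-sonnet-4-5"],  # with thinking enabled
    "ruby": ["anthropic/claude-opus-4-6", "openai/gpt-5"],
    "go": ["openai/o3"],                       # for concurrency review
}

# Hypothetical mapping from file extension to a domain key in SUGGESTED.
EXTENSIONS = {
    ".rs": "systems", ".c": "systems", ".cpp": "systems",
    ".ts": "web", ".tsx": "web", ".js": "web", ".jsx": "web",
    ".py": "python", ".sql": "sql", ".tf": "iac",
    ".sh": "shell", ".rb": "ruby", ".go": "go",
}


def suggest_models(path: str) -> list[str]:
    """Return suggested models for a changed file, or [] when there is no specific advice."""
    domain = EXTENSIONS.get(PurePath(path).suffix.lower())
    return SUGGESTED.get(domain, [])


print(suggest_models("src/main.rs"))
print(suggest_models("README.md"))
```

An empty result means the guidance has no language-specific preference, so the default balanced model applies.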
Subtask Strategy
The task tool supports parallel execution with per-task model selection. This is
the key unlock for code review: run multiple review perspectives simultaneously,
each with a model suited to the task.
Parallel with per-task models
Each task in the tasks array can specify its own model:
{
  "tasks": [
    { "task": "Review changed lines for bugs, logic errors, and edge cases.", "model": "openai/o3" },
    { "task": "Analyze security implications of these changes.", "model": "anthropic/claude-opus-4-6" },
    { "task": "Check architectural consistency with the broader codebase.", "model": "google/gemini-2.5-pro" }
  ]
}
Tasks without a model inherit the top-level model parameter, or the current
session model if neither is set.
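The inheritance order described above (per-task model, then top-level model, then session model) can be illustrated with a short sketch. The task tool performs this resolution itself; the function `resolve_models` and the sample payload below are hypothetical and only show the precedence.

```python
from __future__ import annotations

from typing import Any


def resolve_models(payload: dict[str, Any], session_model: str) -> list[str]:
    """Return the effective model for each task, in task order."""
    top_level = payload.get("model")
    return [
        # Precedence: the task's own model, then the top-level model, then the session model.
        task.get("model") or top_level or session_model
        for task in payload["tasks"]
    ]


# Hypothetical payload mixing a per-task override with a top-level default.
payload = {
    "tasks": [
        {"task": "Review changed lines for bugs.", "model": "openai/o3"},
        {"task": "Check test coverage for the changed code."},
    ],
    "model": "anthropic/claude-sonnet-4-5",
}
print(resolve_models(payload, session_model="google/gemini-2.5-pro"))
```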
Parallel with shared model
When all subtasks can use the same model, you can set a single top-level model:
{
  "tasks": [
    { "task": "Review the changed lines in isolation for bugs and issues." },
    { "task": "Read the full files and check consistency with existing patterns." },
    { "task": "Check test coverage for the changed code." }
  ],
  "model": "anthropic/claude-sonnet-4-5"
}
Switching your own model
You don't always need subtasks to use a different model. You can switch your own model mid-review using
/model and continue working directly. This is useful when
you need specific expertise, or when you want to bring deeper reasoning to a
specific part of your analysis without the overhead of spawning a subtask.
When NOT to Use Subtasks
Before reaching for the
task tool, ask: "Can I do this with read, bash, or
other built-in tools directly?" If yes, do it directly. Subtasks are for
multi-step, context-heavy work — not for simple operations.
Never use subtasks for:
- Reading files: Use the read tool directly. A subtask spawns a full pi process just to call read, adding seconds of overhead and failure risk for something that takes milliseconds.
- Running basic commands: bash with git diff, rg, find, etc. is instant. Don't wrap these in subtasks.
- Gathering context before review: Read the files you need, run the commands you need, then do your analysis. This is normal tool use, not subtask work.
- Any single-tool operation: If the task boils down to one read or bash call, it doesn't need a subtask.
Do use subtasks for:
- Running multiple independent review analyses in parallel, each requiring many tool calls and producing substantial output
- Work that would consume significant context in the parent session (e.g., reading and analyzing 20+ files)
- Getting a different model's perspective on complex findings
Anti-pattern to avoid: Don't dispatch 5 parallel subtasks to read 5 files. Instead, read the 5 files yourself with 5
read calls (which can't fail due to
process spawn issues), then use subtasks only if you need parallel analysis of
the content.
When NOT to Switch Models
- If the user has explicitly requested a specific model, respect that choice
- If the diff is very small (under ~100 lines total), model switching adds overhead without meaningful benefit — a single balanced model handles it fine
- Don't switch to a weaker/faster model for trivial operations — if the operation is trivial enough for a weaker model, it's trivial enough to do directly without a subtask at all
- Don't use model overrides as a default — only specify a model when you have a clear reason that a different model would produce better results for that specific subtask
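These rules can be condensed into a single check. The sketch below is one possible reading of them, not a definitive policy: the function name `should_switch_model`, its parameters, and the exact threshold constant are assumptions (the guidance only says "under ~100 lines").

```python
# Approximate threshold from the guidance above ("under ~100 lines total").
SMALL_DIFF_LINES = 100


def should_switch_model(diff_lines: int, user_requested_model: bool,
                        has_clear_reason: bool) -> bool:
    """Return True only when overriding the current model is justified."""
    if user_requested_model:
        return False  # respect an explicit user choice
    if diff_lines < SMALL_DIFF_LINES:
        return False  # small diff: a single balanced model handles it
    # Overrides are not the default; require a concrete reason for this subtask.
    return has_clear_reason


print(should_switch_model(40, user_requested_model=False, has_clear_reason=True))
print(should_switch_model(800, user_requested_model=False, has_clear_reason=True))
```

The rule about trivial operations is not encoded here, because it concerns whether to spawn a subtask at all rather than which model to use.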