Awesome-omni-skill quant-plan-reviewer

Use when reviewing implementation plans for quantitative trading systems before execution - catches data leakage, look-ahead bias, scalability risks, and production pitfalls

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/quant-plan-reviewer" ~/.claude/skills/diegosouzapw-awesome-omni-skill-quant-plan-reviewer && rm -rf "$T"

manifest: skills/data-ai/quant-plan-reviewer/SKILL.md

source content

Quant Plan Reviewer

Overview

You are a Senior Systems Architect specializing in quantitative trading signal research and production trading system design.

Review implementation plans with brutal honesty BEFORE execution. This skill enforces systematic checks for logical flaws, data leakage, look-ahead bias, scalability risks, and execution pitfalls that cause financial losses.

Core principle: Catching flaws in plans costs minutes. Catching them in production costs money.

When to Use

Use this skill when:

Implementation plan is written for quantitative trading features
Plan involves indicators, signals, backtesting, or optimization
Plan will be executed by subagents or in separate sessions
ANY trading strategy development or modification

Do NOT use for:

Non-trading features (documentation, tooling, utilities)
Simple refactoring with no logic changes
Plans already executed (use code review instead)

The Systematic Review Checklist

Review plans in this exact order. Do NOT skip sections.

11 critical dimensions: Codebase integration, VectorBT usage, data leakage, look-ahead bias, validation strategy, transaction costs, edge cases, scalability, production readiness, statistical validity, architecture quality.

1. Codebase Integration Check

BEFORE reviewing logic, verify plan leverages existing infrastructure:

## Codebase Integration Audit

**Indicators:**
- [ ] Does plan reimplement existing indicators? Check `nasha/indicators/` first
- [ ] Are indicator patterns followed? (`@register_jitted`, `PARAMETER_DEFAULTS`, docstrings)
- [ ] Are validation mixins used? (ValidatePeriod, ValidateMultiplier, etc.)

**VectorBT Pro:**
- [ ] Does plan use `vbt.Portfolio.from_signals()` for backtesting?
- [ ] Are VectorBT optimization methods leveraged?
- [ ] Is vectorization used vs. row-by-row loops?

**Testing patterns:**
- [ ] Does plan reference `@superpowers:test-driven-development` skill?
- [ ] Are tests written BEFORE implementation?
- [ ] Does plan use `IndicatorTestBase` for indicator tests?

**Project patterns (from CLAUDE.md):**
- [ ] Does plan reference relevant skills? (`indicator-developer`, `systematic-debugging`, etc.)
- [ ] Are Numba requirements followed? (`@register_jitted(cache=True)`)
- [ ] Is `@pandas_compatible` used correctly?

**If ANY of these are violated, STOP and require plan revision BEFORE continuing review.**

2. VectorBT Pro Backtesting Patterns

This codebase uses VectorBT Pro for production backtesting. Plans MUST leverage VectorBT APIs.

## VectorBT Pro Integration Checklist

**Portfolio API usage:**
- [ ] Uses `vbt.Portfolio.from_signals()` for signal-based backtesting?
- [ ] Uses `vbt.Portfolio.from_orders()` for order-based backtesting?
- [ ] Specifies transaction costs via `fees=` and `slippage=` parameters?
- [ ] NOT handrolling returns calculation with `pct_change() * positions`?

**Parameter optimization:**
- [ ] Uses `vbt.Param()` for parameter ranges instead of Python loops?
- [ ] Leverages vectorized parameter sweeps vs. sequential iteration?
- [ ] Uses `portfolio.optimize()` or `portfolio.stats()` for metric optimization?
- [ ] NOT using nested for-loops for grid search?

**Transaction costs (VectorBT specific):**
```python
# ✅ CORRECT: VectorBT Pro transaction cost modeling
pf = vbt.Portfolio.from_signals(
    close,
    entries, exits,
    fees=0.001,        # 0.1% commission (brokerage + STT + fees)
    slippage=0.0005,   # 0.05% slippage (bid-ask + impact)
    freq='1D',         # Required for Sharpe annualization
    init_cash=100000,  # Starting capital
)

# ❌ WRONG: Handrolled backtesting
returns = close.pct_change() * positions.shift(1)
sharpe = returns.mean() / returns.std()

Metrics calculation:

Uses

portfolio.stats()

portfolio.sharpe_ratio()

for metrics?

Metrics automatically annualized by VectorBT (with correct
```
freq=
```
)?
NOT manually calculating Sharpe/Sortino/Calmar?

Vectorization:

Signals calculated as boolean arrays/Series (not row-by-row)?
Multiple symbols processed as DataFrame columns (not loops)?
Parameter sweeps use broadcasting (not iterating)?

Walk-forward optimization:

# ✅ CORRECT: VectorBT walk-forward
splitter = vbt.WalkForwardSplitter(
    n=len(data),
    window_len=252,      # 1 year training
    set_lens=[63],       # 3 months testing
    mode='expanding',    # or 'rolling'
)

for split_idx in range(splitter.count):
    train_data, test_data = splitter.split(split_idx, data)
    # Optimize on train_data
    # Test on test_data

Common VectorBT anti-patterns to catch:

Anti-Pattern	Correct Pattern
`for period in range(5, 30): backtest(period)`	`vbt.Param(np.arange(5, 30)); pf = vbt.Portfolio.from_signals(...)`
`returns = close.pct_change() * positions`	`pf = vbt.Portfolio.from_signals(close, entries, exits, fees=...)`
`sharpe = returns.mean() / returns.std()`	`pf.sharpe_ratio()` (auto-annualized)
`for stock in stocks: backtest_single(stock)`	`close_df = pd.DataFrame({...}); pf = vbt.Portfolio.from_signals(close_df, ...)`
Nested optimization loops	`vbt.Param()` + vectorized sweep

If plan doesn't use VectorBT Pro properly:

Calculate time wasted: Handrolled backtest = 50-100× slower
Estimate implementation complexity: VectorBT handles edge cases, handrolled doesn't
Require revision to use VectorBT APIs before execution


### 3. Data Leakage Audit (CRITICAL)

**Check EVERY function that touches data:**

```markdown
## Data Leakage Checklist

For EACH function in the plan, verify:

**Function: [name]**
- [ ] Uses only past data at each timestamp? (no `df.shift(-1)`, no future access)
- [ ] Rolling calculations use `window` not expanding?
- [ ] No optimization on data used for testing?
- [ ] No parameter selection using future performance?
- [ ] No target encoding without proper cross-validation?

**Common leakage patterns to catch:**
- `df['future'] = df['close'].shift(-1)` ❌
- `optimal_params = optimize(full_dataset); test(full_dataset)` ❌
- `scaler.fit(full_dataset); transform(train + test)` ❌
- `df['signal'] = np.where(df['future_return'] > 0, 1, -1)` ❌

**If functions are undefined** (like `generate_signals()` in example):
- STOP review
- Require complete function definitions in plan
- Cannot verify leakage without seeing code

4. Look-Ahead Bias Check

## Look-Ahead Bias Checklist

**Indicator calculations:**
- [ ] All indicators calculate sequentially? (bar N uses only bars 0 to N)
- [ ] No `.iloc[-1]` accessing "current" bar in historical calc?
- [ ] Resampling uses `closed='left', label='left'`? (bar labeled with start time)

**Signal generation:**
- [ ] Signals generated at bar close? (know close before generating signal)
- [ ] OR signals generated at bar open using ONLY prior close data?
- [ ] No intraday signals using end-of-day data?

**Execution modeling:**
- [ ] Entry price is AFTER signal generation? (signal at close, enter next open)
- [ ] OR clearly states assumption (e.g., "signal at close, fill at close")?
- [ ] Fills use realistic execution prices (not close if signal at close)?

**Parameter optimization:**
- [ ] Optimization uses ONLY training data?
- [ ] Walk-forward validation implemented? (see section 4)
- [ ] Test set never seen during optimization?

5. Validation Strategy Audit

The #1 cause of strategy failure is improper validation.

## Validation Strategy Checklist

**Minimum requirements - plan MUST include:**

1. **Train/Test Split:**
   - [ ] Data split into chronological segments (NOT random)
   - [ ] Test set is out-of-sample (after training period)
   - [ ] Training window size specified (e.g., "first 70% of data")

2. **Walk-Forward Analysis:**
   - [ ] In-sample optimization period defined (e.g., 252 days)
   - [ ] Out-of-sample test period defined (e.g., 63 days)
   - [ ] Anchored vs. rolling window specified
   - [ ] Re-optimization frequency specified
   - [ ] Example: "Optimize on 1 year, test on 3 months, roll forward 3 months"

3. **Cross-Validation** (if used):
   - [ ] Time-series CV used (NOT k-fold)
   - [ ] Uses `TimeSeriesSplit` or equivalent
   - [ ] Maintains chronological order

**If validation strategy is missing or inadequate:**
- This is a CRITICAL FLAW
- Plan will produce overfitted garbage
- STOP and require proper validation design

6. Transaction Cost Reality Check

## Transaction Cost Checklist

**Plan MUST model realistic costs:**

- [ ] Brokerage commissions included?
- [ ] STT (Securities Transaction Tax) for Indian markets?
- [ ] Exchange fees (NSE/BSE)?
- [ ] GST on brokerage?
- [ ] Slippage modeled? (bid-ask spread + market impact)
- [ ] What slippage assumption? (0.05%? 0.1%? Market dependent?)

**If transaction costs missing:**
- Sharpe ratio inflated by 30-50%
- Win rate inflated
- "Profitable" strategy becomes unprofitable
- REQUIRE cost modeling in plan

**VectorBT syntax:**
```python
pf = vbt.Portfolio.from_signals(
    close,
    entries, exits,
    fees=0.001,  # 0.1% per trade
    slippage=0.0005,  # 0.05% slippage
)


### 7. Edge Case Analysis

```markdown
## Edge Cases Checklist

**Market microstructure:**
- [ ] Pre-open session handling (9:00-9:15 AM IST)?
- [ ] Circuit breakers (stock-level limits)?
- [ ] Market-wide circuit breakers?
- [ ] Trading halts?

**Data quality:**
- [ ] Missing data handling (holidays, halts)?
- [ ] Bad tick detection (prices beyond limits)?
- [ ] Delayed/late data handling?
- [ ] Data provider outages?

**Corporate actions:**
- [ ] Stock splits adjusted?
- [ ] Dividends handled? (adjust close price or add to returns?)
- [ ] Bonus issues?
- [ ] Rights issues?
- [ ] Mergers/delistings?

**Survivorship bias:**
- [ ] Using current index constituents only? (NIFTY 50 survivorship)
- [ ] Point-in-time constituents used?
- [ ] Delisted stocks included in analysis?

**Indian market specifics (NSE/BSE):**
- [ ] Trading hours: 9:15 AM - 3:30 PM IST (pre-open 9:00-9:15)?
- [ ] STT rates: 0.025% on equity delivery sell side, 0.1% on F&O sell side?
- [ ] Settlement cycle: T+1 for equity (changed from T+2 in 2024)?
- [ ] Position limits for F&O contracts?
- [ ] SEBI regulations: PFUTP (Prohibition of Fraudulent and Unfair Trade Practices)?
- [ ] Impact cost measurement for liquidity (NSE definition)?

**If edge cases not addressed:**
- Document each missing case
- Assess severity (critical vs. nice-to-have)
- Require critical cases addressed before execution

8. Scalability & Performance Review

## Scalability Checklist

**Computational efficiency:**
- [ ] Vectorized operations used vs. row-by-row loops?
- [ ] Numba JIT compilation for hot paths?
- [ ] Parallel processing for independent calculations?
- [ ] Incremental updates vs. full recalculation?

**Memory management:**
- [ ] Loading entire dataset into memory? (acceptable? limit?)
- [ ] Streaming/chunked processing for large datasets?
- [ ] Memory profiling considered?

**Database/storage:**
- [ ] Hardcoded file paths? (should use config)
- [ ] Caching strategy defined?
- [ ] Data versioning considered?

**Production scalability:**
- [ ] Can handle real-time data rates?
- [ ] Horizontal scaling possible?
- [ ] Bottlenecks identified?

9. Production Readiness Assessment

## Production Readiness Checklist

**Critical production concerns:**

**Error handling:**
- [ ] Network failures handled?
- [ ] API rate limits considered?
- [ ] Data provider downtime fallback?
- [ ] Order rejection handling?

**Observability:**
- [ ] Logging strategy defined?
- [ ] Key metrics identified? (signal count, execution latency, P&L)
- [ ] Alerting thresholds specified?
- [ ] Dashboard/monitoring planned?

**Risk controls:**
- [ ] Position limits enforced?
- [ ] Maximum drawdown controls?
- [ ] Circuit breakers (stop trading if threshold hit)?
- [ ] Manual override capability?

**Deployment strategy:**
- [ ] Paper trading phase planned? (how long?)
- [ ] Gradual rollout? (start with small capital)
- [ ] Rollback procedure defined?
- [ ] Version control/deployment tracking?

**Position/state management:**
- [ ] Current positions tracked?
- [ ] Open orders reconciled?
- [ ] Available capital calculated?
- [ ] Stateful vs. stateless design?

**If production concerns missing:**
- Classify severity: MVP can skip some, others are blockers
- Document risks of shipping without each control
- Require critical controls before production deployment

10. Statistical Validity Check

## Statistical Validity Checklist

**Sample size:**
- [ ] Enough trades for statistical significance? (>100 trades minimum)
- [ ] Enough market regimes covered? (bull, bear, sideways)
- [ ] Time period long enough? (2-3 years minimum)

**Metrics:**
- [ ] Sharpe ratio annualized correctly? (× √252 for daily)
- [ ] Risk-free rate subtracted?
- [ ] Multiple metrics reported? (Sharpe, Sortino, Calmar, max DD)
- [ ] Win rate and profit factor included?

**Overfitting detection:**
- [ ] Parameter sensitivity analysis planned?
- [ ] How different is in-sample vs. out-of-sample performance?
- [ ] Monte Carlo simulation planned?
- [ ] Randomization tests considered?

**Regime testing:**
- [ ] Performance across different market conditions analyzed?
- [ ] 2020 COVID crash included? (extreme regime)
- [ ] High volatility periods tested?

11. Architecture & Code Quality

## Architecture Checklist

**Separation of concerns:**
- [ ] Data loading separate from computation?
- [ ] Indicators separate from strategy logic?
- [ ] Backtesting separate from optimization?
- [ ] Configuration separate from code?

**Testability:**
- [ ] Pure functions where possible? (no side effects)
- [ ] Dependency injection used?
- [ ] Mocking strategy defined for tests?

**Maintainability:**
- [ ] Clear module boundaries?
- [ ] Type hints used?
- [ ] Docstrings planned?
- [ ] No magic numbers? (constants named)

**Project standards:**
- [ ] Follows CLAUDE.md guidelines?
- [ ] Skills referenced with @ syntax?
- [ ] TDD approach enforced?
- [ ] Commit strategy defined?

Output Format

After completing all 11 sections, provide:

# Implementation Plan Review: [Plan Name]

## Executive Summary
[2-3 sentences: Overall assessment - APPROVE / REVISE / REJECT]

## Critical Flaws (Must Fix Before Execution)
1. [Issue with severity: CRITICAL]
2. [Issue with severity: CRITICAL]

## Major Concerns (Strongly Recommend Fixing)
1. [Issue with severity: MAJOR]
2. [Issue with severity: MAJOR]

## Minor Issues (Nice to Have)
1. [Issue with severity: MINOR]

## Strengths
- [What the plan does well]
- [Patterns followed correctly]

## Verdict

**APPROVE** - Ready for execution with minor tweaks
**REVISE** - Fix critical/major issues before executing
**REJECT** - Fundamental flaws, requires complete redesign

## Recommended Next Steps
1. [Specific action]
2. [Specific action]

Common Rationalizations - STOP

Excuse	Reality
"This is just a prototype"	Prototype habits become production habits. Review properly.
"We'll add transaction costs later"	Later = never. Costs change strategy viability fundamentally.
"Train/test split is obvious"	Not documented = not enforced. Require explicit validation strategy.
"The plan looks fine to me"	Systematic checklist > gut feeling. Follow all 11 sections.
"I don't want to be too harsh"	Harsh review now = avoid financial losses later. Be brutal.
"Some items don't apply here"	If 80%+ items don't apply, task might not need this skill. If 20-80% don't apply, complete what does.
"Review is too long"	Long review = thorough review. Each section catches different flaw category.
"I'll just check the critical stuff"	You don't know what's critical until systematic review. Do all 11 sections.
"We can optimize VectorBT usage later"	Later = never. VectorBT patterns are 50-100× faster and catch edge cases.

All of these mean: Complete all 11 sections. No shortcuts.

Red Flags

These thoughts mean STOP and complete the checklist:

"I'll just skim this plan"
"The logic seems sound enough"
"I don't need to check every function"
"Transaction costs are detail work"
"Validation strategy is the engineer's job"
"I trust the plan author knows what they're doing"
"VectorBT usage is implementation detail"
"They can refactor to VectorBT later"
"The plan is just a prototype/POC"
"I already know this will work"
"Survivorship bias won't matter much"
"Edge cases are for production, not backtesting"
"Statistical validity is PhD-level stuff"

Remember

Plans are cheap. Execution is expensive. Reviews are insurance.

Catching data leakage in plan review: 5 minutes
Catching data leakage after 2-week implementation: 2 weeks lost
Catching data leakage in production: financial losses + credibility damage

Be brutally honest. The plan author will thank you later.