Cc-skills backtesting-py-oracle
backtesting.py configuration for SQL oracle validation and range bar pattern backtesting. Use when running backtesting.py against ClickHouse SQL results, configuring Backtest() constructor, handling overlapping trades, multi-position mode, rolling quantile NaN handling, trade sorting, or oracle gate validation. TRIGGERS - backtesting.py, Backtest(), hedging, exclusive_orders, multi-position, overlapping trades, oracle validation, SQL vs Python, trade comparison, entry price mismatch, signal count mismatch, rolling quantile NaN, ExitTime sort, stats._trades, gen600_strategy, champion_strategy, gen300_strategy, barrier setup.
git clone https://github.com/terrylica/cc-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/terrylica/cc-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/quant-research/skills/backtesting-py-oracle" ~/.claude/skills/terrylica-cc-skills-backtesting-py-oracle && rm -rf "$T"
plugins/quant-research/skills/backtesting-py-oracle/SKILL.mdbacktesting.py Oracle Validation for Range Bar Patterns
Configuration and anti-patterns for using backtesting.py to validate ClickHouse SQL sweep results. Ensures bit-atomic replicability between SQL and Python trade evaluation.
Companion skills:
clickhouse-antipatterns (SQL correctness, AP-16) | sweep-methodology (sweep design) | rangebar-eval-metrics (evaluation metrics)
Validated: Gen600 oracle verification (2026-02-12) — 3 assets, 5 gates, ALL PASS.
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.
Critical Configuration (NEVER omit)
from backtesting import Backtest bt = Backtest( df, Strategy, cash=100_000, commission=0, hedging=True, # REQUIRED: Multiple concurrent positions exclusive_orders=False, # REQUIRED: Don't auto-close on new signal )
Why: SQL evaluates each signal independently (overlapping trades allowed). Without
hedging=True, backtesting.py skips signals while a position is open, producing fewer trades than SQL. This was discovered when SOLUSDT produced 105 Python trades vs 121 SQL trades — 16 signals were silently skipped.
Anti-Patterns (Ordered by Severity)
BP-01: Missing Multi-Position Mode (CRITICAL)
Symptom: Python produces fewer trades than SQL. Gate 1 (signal count) fails.
Root Cause: Default
exclusive_orders=True prevents opening new positions while one is active.
Fix: Always use
hedging=True, exclusive_orders=False.
BP-02: ExitTime Sort Order (CRITICAL)
Symptom: Entry prices appear mismatched (Gate 3 fails) even though both SQL and Python use the same price source.
Root Cause:
stats._trades is sorted by ExitTime, not EntryTime. When overlapping trades exit in a different order than they entered, trade[i] no longer maps to signal[i].
Fix:
trades = stats._trades.sort_values("EntryTime").reset_index(drop=True)
BP-03: NaN Poisoning in Rolling Quantile (CRITICAL)
Symptom: Cross-asset tests fail with far fewer Python trades. Feature quantile becomes NaN and propagates forward indefinitely.
Root Cause:
np.percentile with NaN inputs returns NaN. If even one NaN feature value enters the rolling window, all subsequent quantiles become NaN, making all subsequent filter comparisons fail.
Fix: Skip NaN values when building the signal window:
def _rolling_quantile_on_signals(feature_arr, is_signal_arr, quantile_pct, window=1000): result = np.full(len(feature_arr), np.nan) signal_values = [] for i in range(len(feature_arr)): if is_signal_arr[i]: if len(signal_values) > 0: window_data = signal_values[-window:] result[i] = np.percentile(window_data, quantile_pct * 100) # Only append non-NaN values (matches SQL quantileExactExclusive NULL handling) if not np.isnan(feature_arr[i]): signal_values.append(feature_arr[i]) return result
BP-04: Data Range Mismatch (MODERATE)
Symptom: Different signal counts between SQL and Python for assets with early data (BNB, XRP).
Root Cause:
load_range_bars() defaults to start='2020-01-01' but SQL has no lower bound.
Fix: Always pass
start='2017-01-01' to cover all available data.
BP-05: Margin Exhaustion with Overlapping Positions (MODERATE)
Symptom: Orders canceled with insufficient margin. Fewer trades than expected.
Root Cause: With
hedging=True and default full-equity sizing, overlapping positions exhaust available margin.
Fix: Use fixed fractional sizing:
self.buy(size=0.01) # 1% equity per trade
BP-06: Signal Timestamp vs Entry Timestamp (LOW)
Symptom: Gate 2 (timestamp match) fails because SQL uses signal bar timestamps while Python uses entry bar timestamps.
Root Cause: SQL outputs the signal detection bar's
timestamp_ms. Python's EntryTime is the fill bar (next bar after signal). These differ by 1 bar.
Fix: Record signal bar timestamps in the strategy's
next() method:
# Before calling self.buy() self._signal_timestamps.append(int(self.data.index[-1].timestamp() * 1000))
5-Gate Oracle Validation Framework
| Gate | Metric | Threshold | What it catches |
|---|---|---|---|
| 1 | Signal Count | <5% diff | Missing signals, filter misalignment |
| 2 | Timestamp Match | >95% | Timing offset, warmup differences |
| 3 | Entry Price | >95% | Price source mismatch, sort ordering |
| 4 | Exit Type | >90% | Barrier logic differences |
| 5 | Kelly Fraction | <0.02 | Aggregate outcome alignment |
Expected residual: 1-2 exit type mismatches per asset at TIME barrier boundary (bar 50). SQL uses
fwd_closes[max_bars], backtesting.py closes at current bar price. Impact on Kelly < 0.006.
Strategy Architecture: Single vs Multi-Position
| Mode | Constructor | Use Case | Position Sizing |
|---|---|---|---|
| Single-position | (default) | Champion 1-bar hold | Full equity |
| Multi-position | | SQL oracle validation | Fixed fractional () |
Multi-Position Strategy Template
class Gen600Strategy(Strategy): def next(self): current_bar = len(self.data) - 1 # 1. Register newly filled trades and set barriers for trade in self.trades: tid = id(trade) if tid not in self._known_trades: self._known_trades.add(tid) self._trade_entry_bar[tid] = current_bar actual_entry = trade.entry_price if self.tp_mult > 0: trade.tp = actual_entry * (1.0 + self.tp_mult * self.threshold_pct) if self.sl_mult > 0: trade.sl = actual_entry * (1.0 - self.sl_mult * self.threshold_pct) # 2. Check time barrier for each open trade for trade in list(self.trades): tid = id(trade) entry_bar = self._trade_entry_bar.get(tid, current_bar) if self.max_bars > 0 and (current_bar - entry_bar) >= self.max_bars: trade.close() self._trade_entry_bar.pop(tid, None) # 3. Check for new signal (no position guard — overlapping allowed) if self._is_signal[current_bar]: self.buy(size=0.01)
Data Loading
from data_loader import load_range_bars df = load_range_bars( symbol="SOLUSDT", threshold=1000, start="2017-01-01", # Cover all available data end="2025-02-05", # Match SQL cutoff extra_columns=["volume_per_trade", "lookback_price_range"], # Gen600 features )
HTML Equity Plot: Y-Axis Auto-Fit on Zoom (BP-07 to BP-10)
backtesting.py generates Bokeh HTML plots via
bt.plot(). By default, the y-axis is fixed — zooming on the x-axis does NOT rescale the y-axis to fit visible data. This makes it impossible to inspect zoomed-in regions of equity curves that span 4+ orders of magnitude.
Reference implementation:
scripts/gen800/plotting.py in opendeviationbar-patterns.
BP-07: NEVER Use LogScale for Equity (CRITICAL)
Symptom: Equity panel y-axis doesn't auto-fit on zoom. JS callbacks fail silently.
Root Cause: Bokeh's
LogScale breaks CustomJS y-range callbacks in multi-panel linked x_range layouts. The js_on_change callback fires but y_range.start/y_range.end assignments are ignored by the LogScale renderer.
Proven via POC: LogScale + JS callback works in single-panel mode but fails when 5 panels share a linked x_range.
Fix: Transform equity data to
log10() in Python, display on linear scale with a CustomJSTickFormatter that shows 10^tick as readable values:
import numpy as np from bokeh.models import CustomJSTickFormatter, Range1d # Transform data raw = np.asarray(src.data["equity"], dtype=float) raw = np.where(raw > 0, raw, 1e-10) src.data["equity"] = np.log10(raw).tolist() # Custom tick formatter (shows 1%, 100%, 10K%, 1M%) child.yaxis[0].formatter = CustomJSTickFormatter(code=""" const v = Math.pow(10, tick); if (v >= 1e6) return (v/1e6).toFixed(1) + 'M%'; if (v >= 1e3) return (v/1e3).toFixed(0) + 'K%'; if (v >= 1) return v.toFixed(0) + '%'; return v.toFixed(2) + '%'; """) # MUST set initial y_range explicitly (backtesting.py leaves it as NaN) valid = eq[np.isfinite(eq)] pad = (valid.max() - valid.min()) * 0.05 child.y_range = Range1d(start=valid.min() - pad, end=valid.max() + pad)
BP-08: Panel-Aware Column Matching (CRITICAL)
Symptom: Drawdown/P&L panels go blank when zooming. Shows millions of percent.
Root Cause: Multiple panels share the same
ColumnDataSource (backtesting.py optimization). The Equity panel's Line renderer and the Drawdown panel's Line renderer both reference a source containing equity, drawdown, High, Low, etc. A naive "find first y column" approach picks equity for the Drawdown panel.
Fix: Match by
panel_label FIRST, then by column name:
panel_label = child.yaxis[0].axis_label or "" # Label-first matching (panels share data sources!) if "Drawdown" in panel_label and "drawdown" in d: hi_col = lo_col = "drawdown" elif "Equity" in panel_label and "equity" in d: hi_col = lo_col = "equity" elif "High" in d and "Low" in d: # OHLC hi_col, lo_col = "High", "Low"
BP-09: JS Callback Pattern for Y-Axis Auto-Fit (PROVEN)
Attach a
CustomJS callback to x_range.js_on_change for each panel. The callback scans visible data and sets y_range directly:
from bokeh.models import CustomJS cb = CustomJS( args={"source": src, "yr": child.y_range, "x_col": "index", "y_col": "equity"}, code=""" const xs = source.data[x_col]; const ys = source.data[y_col]; const x0 = cb_obj.start, x1 = cb_obj.end; let lo = Infinity, hi = -Infinity; for (let i = 0; i < xs.length; i++) { if (xs[i] >= x0 && xs[i] <= x1 && isFinite(ys[i])) { if (ys[i] < lo) lo = ys[i]; if (ys[i] > hi) hi = ys[i]; } } if (isFinite(lo) && isFinite(hi) && lo < hi) { const pad = (hi - lo) * 0.05 || 0.001; yr.start = lo - pad; yr.end = hi + pad; } """, ) child.x_range.js_on_change("start", cb) child.x_range.js_on_change("end", cb)
BP-10: DataRange1d(only_visible=True) Is Unreliable
Symptom:
DataRange1d(only_visible=True) works for simple cases but fails with:
- LogScale panels (y-range computed in wrong space)
- VBar renderers (candlesticks — bounds not tracked correctly)
- Shared data sources across panels
Fix: Always use the JS callback pattern (BP-09) instead of
DataRange1d(only_visible=True).
Panel Data Source Reference (backtesting.py internals)
| Panel | Label | Renderer | Y Column | Source |
|---|---|---|---|---|
| Equity | | Patch (), Line () | | Shared OHLC source |
| Drawdown | | Line (), Scatter (peak) | | Shared OHLC source |
| P/L | | Scatter (), MultiLine | or | Separate trade source |
| OHLC | (no label) | Segment, VBar | / | Shared OHLC source |
| Volume | | VBar | (top) | Shared OHLC source |
Project Artifacts (rangebar-patterns repo)
| Artifact | Path |
|---|---|
| Oracle comparison script | |
| Gen600 strategy (reference) | |
| SQL oracle query template | |
| Oracle validation findings | |
| Backtest CLAUDE.md | |
| ClickHouse AP-16 | |
| Fork source | |
Post-Execution Reflection
After this skill completes, check before closing:
- Did the command succeed? — If not, fix the instruction or error table that caused the failure.
- Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
- Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.
Only update if the issue is real and reproducible — not speculative.